So you want to build AI agent group chat?
On Nov 13th, OpenAI announced the pilot of group chats in ChatGPT. This post looks at the existing patterns for interacting with models, and at how those patterns make it hard to build similar features.
Disclaimer: I work for Ably, so I’m intimately familiar with the tech I mention here. Opinions are my own, etc.
Existing patterns
If you look at the dominant AI and agentic frameworks, you’ll see the same pattern. The Vercel AI SDK, Google ADK, LangChain, Mastra, and the Anthropic and OpenAI SDKs all use the same approach for delivering model responses to clients.
The pattern relies on streaming responses as Server-Sent Events (SSE) over long-lived HTTP connections: HTTP post-and-wait.
- A client POSTs a prompt to the server, agent, or model and leaves the connection open.
- The model does some work and generates a response, which, with the streaming APIs, typically arrives as tokens/fragments/deltas.
- The server delivers each of these response fragments over the open HTTP connection.
- The client constructs the full streamed response by combining the fragments.
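For concreteness, here’s roughly what that looks like with the OpenAI Node SDK. The model name, handler shape, and SSE framing are illustrative, not a recipe:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Handler for a client's POST: relay the model's deltas as SSE
// over the same connection the prompt arrived on.
export async function handleChat(
  prompt: string,
  res: { write(chunk: string): void; end(): void } // e.g. a Node/Express response
) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    // Each fragment goes out as a Server-Sent Event. If the client's
    // connection has dropped, this is where the response dies.
    res.write(`data: ${JSON.stringify({ delta })}\n\n`);
  }
  res.end();
}
```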
But actually building with this pattern sucks. If the client’s connection drops, the model doesn’t have anywhere to deliver the response. All the work the model did dies with the connection.
If you try to build an agentic app on a stateless HTTP backend, you run into all sorts of problems:
- How do you provide context to the model without asking the client to send it all with the prompt?
- How do you locate the process in your backend hosting the agent when the load balancer could route that request to any of your server replicas?
- Do you have to store and fetch all the context and conversation history in your database and query it each time?
- How do you fan-out the model response if you have multiple clients connected in the same chat?
- How do you share each participant’s prompts with all the other participants?
Building OpenAI group chats
Now imagine building a clone of the OpenAI group chats feature using HTTP post-and-wait. You hit all the same problems, but as well as delivering responses over the HTTP connection that prompted the model, you also need to deliver them to everyone else in the chat.
Finding the connections to those other chat participants and delivering the responses to them is hard. Building it on a stateless HTTP backend design is going to push you towards having your clients poll for updates, and querying the database to fulfil those polls. And that’s a big ol’ stink. It’s 2025, yo’: we should be realtime!
Realtime Pub/Sub platforms ace this problem space
If you’re not familiar with Pub/Sub channels, they separate messages into different topics or conversations. You can publish messages to a specific channel, and you can subscribe to messages on a specific channel.
A channel can have multiple publishers, and multiple subscribers. Publishers and subscribers could be either the AI agent, or a human user. If you’re trying to build OpenAI group chats, a single chat would be a single channel.
Each participant (both human and LLM) would publish their messages to the channel, and each participant would see all the other participants’ messages by subscribing to that channel.
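Here’s a minimal sketch with the Ably JavaScript SDK; the channel name, event name, and message shape are my own choices:

```ts
import * as Ably from "ably";

const client = new Ably.Realtime({ key: process.env.ABLY_API_KEY! });

// One group chat == one channel. The name is illustrative.
const chat = client.channels.get("groupchat:dinner-plans");

// Every participant, human or agent, subscribes once...
await chat.subscribe("message", (msg) => {
  console.log(`${msg.data.from}: ${msg.data.text}`);
});

// ...and publishes to the same channel. The channel fans the message
// out to every other subscriber; your backend doesn't track connections.
await chat.publish("message", { from: "alice", text: "Where shall we eat?" });
```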
But a lot more than just message delivery is handled for you:
**Want all messages and responses to be automatically delivered to all the participants on the channel?**
The Pub/Sub channel automatically does that for you.
**Want late-joining or disconnected clients to have access to the conversation history?**
The channel history does that. You can fill in all the parts of the conversation you missed, or resume from the message you last received.
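Continuing the sketch above, a client backfilling on connect might look like this (`render` is a hypothetical UI function):

```ts
// On (re)connect, fetch recent messages; pages come back newest-first.
const page = await chat.history({ limit: 100 });
for (const msg of [...page.items].reverse()) {
  render(msg.data); // hypothetical UI function
}

// Long conversations can be walked page by page.
if (page.hasNext()) {
  const older = await page.next();
  // ...render older.items too
}
```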
**Want the model or agent to have access to all the context of the group chat?**
You can use channel history for that too.
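The same history can become the model’s context. Reusing `chat` and `openai` from the sketches above; the role mapping is my own convention:

```ts
// Replay the channel history as chat context before prompting the model.
const history = await chat.history({ limit: 50 });
const messages = history.items
  .reverse() // history is newest-first; the model wants oldest-first
  .map((msg) => ({
    role: msg.data.from === "agent" ? ("assistant" as const) : ("user" as const),
    content: `${msg.data.from}: ${msg.data.text}`,
  }));

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini", // illustrative
  messages,
});
```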
**Want to know which participants are active in the chat?**
Presence does that for you.
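Again continuing the sketch:

```ts
// Each participant announces itself on the channel...
await chat.presence.enter({ name: "alice" });

// ...anyone can ask who's currently here...
const members = await chat.presence.get();
console.log(members.map((m) => m.data.name));

// ...and you can react as participants come and go.
await chat.presence.subscribe("enter", (member) => {
  console.log(`${member.data.name} joined`);
});
```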
**Want to replicate some of the examples in the OpenAI blog post: finding a restaurant everyone likes, designing a garden, or settling a debate?**
You can store and collaborate on that state directly on the channel using LiveObjects.
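LiveObjects is newer than the rest of the API, so treat this sketch as approximate: it follows my reading of the docs, and the plugin import, channel modes, and map operations may differ by SDK version.

```ts
import * as Ably from "ably";
import Objects from "ably/objects";

const client = new Ably.Realtime({
  key: process.env.ABLY_API_KEY!,
  plugins: { Objects },
});

// LiveObjects needs the object modes enabled on the channel.
const chat = client.channels.get("groupchat:dinner-plans", {
  modes: ["OBJECT_SUBSCRIBE", "OBJECT_PUBLISH"],
});
await chat.attach();

// The root object is a shared map that every participant sees.
const root = await chat.objects.getRoot();

// Any participant, or the agent, can propose a restaurant...
await root.set("restaurant", "Luigi's");

// ...and everyone is notified when the shared state changes.
root.subscribe(() => {
  console.log("current pick:", root.get("restaurant"));
});
```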