/dev/knill

All your agents are going async

Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.

For most of the time LLMs have been around, you use them by opening a chat-style window and typing a prompt. The LLM streams the response back token-by-token. It’s how ChatGPT, claude.ai, and Claude Code work. It’s also how the demos work for basically every AI SDK or AI Library. It’s easy to think that LLM chatbots are the ‘art of the possible’ for AI right now. But that’s not the case.

Instead, all your agents are going async. Agents are getting crons, webhook support, whatsapp integrations, ‘remote control’ from your phone, scheduled tasks and routines. Agents are becoming something that runs in the background, working while you work, and reporting back results async. Agents are getting workflows in Temporal, Vercel WDK, Relay.app, etc. A human sitting at a terminal or webchat is just one mode now, and increasingly it’s not the interesting one. The interesting thing is what agents can do while not being synchronously supervised by a human.

The problem is that chatbots are primarily built on HTTP. An HTTP request with the prompt, and a SSE stream of LLM generated tokens back on the HTTP response. But this doesn’t work when the agent is running async. There’s no HTTP connection to stream the response back.

OpenClaw’s async step

OpenClaw took a big step towards async agents, by showing people that an agent could live in your WhatsApp chat. The agent could travel around with you, and could work on stuff in the background. OpenClaw showed that you didn’t have to be glued to your browser or terminal to get AI to do work for you.

Anthropic’s direct response to the OpenClaw model is Channels, which is MCP based and allows you to push messages async from an external chat system into a Claude Code session. But they also have /loop and /schedule slash commands, as well as Routines, both allowing you to schedule and run agents in the background. Anthropic also has Remote Control, which lets you continue a Claude Code session from your phone or another browser.

ChatGPT has scheduled tasks which trigger agents async, that can reach out to you if needed.

Cursor has background agents that run in the background in the cloud.

All of these features are about breaking the coupling between a human sitting at a terminal or chat window and interacting turn-by-turn with the agent. They make interactions with agents continuous, remote, long-running, and async.

The transport mismatch

All these new async features share the same property; the lifetime of an agent’s work is decoupled from the lifetime of a single HTTP connection. In chatbot demo apps, the agent is only processing while the HTTP connection is open. The LLM is doing inference in response to an HTTP request and streaming the tokens back on the HTTP response as an SSE stream. I’ve said before that a chatbot’s worst enemy is page refresh, and this is entirely because of the transport mismatch. HTTP request-response can’t survive a page refresh, and it can’t serve async agents.

There are four scenarios that the old transport based on HTTP can’t handle cleanly:

  1. Agent outlives the caller: A routine fires from a cron, or the agent takes a long time to do its work. Five minutes later the agent has a result, but no one’s listening anymore. Where do the results go? Right now, they go in a database and you have to poll for them with some session URL (which, y’know, sucks).
  2. Agent wants to push unprompted: The agent finished a nightly backlog review and has three PRs for you to review. Or your async workflow hits a human-approval step and needs you to say yes before it can keep going. There’s no connection back to you. Right now, they email you or send a slack message.
  3. Caller changes: You started a task at your desk, went to lunch, and want to check on it from your phone. Anthropic’s Remote Control handles this, but only by building custom backend session storage and management. It’s not a first-class feature of the HTTP transport.
  4. Multiple humans in one session: You have a team of five people working on a task together, and an agent is helping you. The agent needs to be able to push updates to all five of you, and take input from any of you.

Part of the reason that folks found OpenClaw so awesome is that it handles all of these scenarios for you. OpenClaw’s model separates the lifetime of the agent’s work from the lifetime of the connection to the human. The agent can do work async, and then use WhatsApp, iMessage, Telegram, Discord or whatever async chat system to push the results back to you when it’s done.

This just isn’t possible with HTTP request-response.

So how are folks solving this?

Looking across the industry, there are a bunch of different solutions to this. Clearly there’s the OpenClaw model where all the interaction is through some external chat provider. This chat provider also provides the conversation history to the agent, so the agent can have context on the conversation even after restarts. But this is just an extension on the chat-based model. I don’t think it’s the most interesting solution.

Most folks are pulling more and more of the session state into a centralised and hosted environment. Anthropic is doing this with Routines and Remote Control. More of the session state, conversation history, and agent inference is running in the hosted Anthropic platform. They are consolidating more of the agent lifecycle and agent connection state into their own platform, rather than just being an LLM inference API.

Cloudflare are getting involved too with their own Agents platform built on their workers platform. The Cloudflare Sessions API provides the session and conversation storage for agents, accessible over HTTP. And to fix the async notification problem, Cloudflare has launched their Email for Agents product.

These solutions solve only one half

The problem actually splits into two halves. The first half is durable state. Where does the agent’s state live, how does the agent have access to that state on restart or when processing async tasks, and where does the agent store its output? The second half is durable transport. How do the bytes of the response get between the agent and the humans or other agents, how does the connection survive disconnect, device switch, fan-out, server-initiated push, etc?

The Anthropic and Cloudflare solutions are really focused on the first half of the problem. They are building durable state storage and management for agents. Their solution to getting the bytes of the response is still polling, or HTTP requests. Cloudflare does have websocket support, but it doesn’t survive disconnections for streaming LLM responses. Anthropic and Cloudflare’s solution is based on the idea that if they store all the data required, then clients can always HTTP GET that data later. It half works, but it’s not ‘art of the possible’.

Durable transport, durable state

Right now, the session and the transport are all wrapped up in a single HTTP request-response. Cloudflare and Anthropic’s hosted features go some way to making the session state durable, but the don’t fix the transport problem. You’re still stuck with HTTP gets, or polling, in order to find out something new has happened.

A ‘session’ with an AI should be a thing that humans and agents can connect to, and disconnect from at any time. They should be able to come and go, the session should survive wifi issues, or your phone disconnecting when you go into a train tunnel without the human or the agent needing to care about it. The conversation state should be accessible through the durable session, and humans and agents should be able to notify each other through the session. The session should be a first class primitive for building async agents.

I work for Ably and we are currently building a durable transport for AI agents built around the idea of a session. We’re building on top of our existing realtime messaging platform. It’s frustrating to see folks fighting the similar problems over and over again, all because they picked the wrong transport to start with: HTTP request-response.

Because we’re building on our existing realtime messaging platform, we’re approaching the same problem that Cloudflare and Anthropic are approaching, but we’ve already got a bi-directional, durable, realtime messaging transport, which already supports multi-device and multi-user. We’re building session state and conversation history onto that existing platform to solve both halves of the problem; durable transport and durable state.