So you want to build Miro and Figma style collaboration?

Oct 5, 2023

Miro and Figma have a bunch of collaboration features. In this post I’m going to break down two of those features and look at what you’d have to think about when building them into your own apps.

Disclaimer: I work for a company in this product space, which is why I care about these problems.

Let’s start with...

Collaborative cursors

Collaborative cursors allow multiple users to interact on the same page of a website, and for each participant to see where the other participants are pointing or moving their cursors.

Each cursor has a location on the screen, modelled as an X and Y coordinate. The frontend can render the cursors on each participant’s screen, and move those cursors as the coordinates change.
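Here’s a rough sketch of that model. The `CursorPosition` type and `applyCursorUpdate` helper are names I’ve made up for illustration, not part of any library. The frontend keeps the latest coordinate per user and overwrites it whenever an update arrives:

```typescript
// Hypothetical shape of one cursor update: who moved, and where to.
interface CursorPosition {
  user: string;
  x: number;
  y: number;
}

// Latest known position per user; a renderer would draw one cursor per entry.
type CursorState = Map<string, { x: number; y: number }>;

// Apply one update: only the newest position for a user matters.
function applyCursorUpdate(state: CursorState, update: CursorPosition): void {
  state.set(update.user, { x: update.x, y: update.y });
}

const cursorState: CursorState = new Map();
applyCursorUpdate(cursorState, { user: "Erik", x: 0, y: 5 });
applyCursorUpdate(cursorState, { user: "Kate", x: 3, y: 20 });
applyCursorUpdate(cursorState, { user: "Erik", x: 1, y: 6 }); // Erik moved again
```

Old positions are simply overwritten, which is a hint that nobody needs the history.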

So the main problem becomes:

How does each participant get access to the coordinates of the other cursors?

Via the database?

In a regular client-server application with a database, the way to share information between clients would be to save it in the database. Our servers are probably horizontally scaled, and store all their state in the database. A client would be load-balanced and connect to any of the server replicas.

All the servers can access the cursor information stored in the database, and share it with the clients that are connected to that server.

We could have a table called cursors with an entry for each user, and the location of that user’s cursor.

user	location (x,y coords)
Erik	0,5
Kate	3,20
Bea	2,7
Jamie	15,9

So we have our cursor locations in the database, but we still have two big problems:

1. How do we get these cursor locations from the database to the clients?
2. We have to write values to the database each time a cursor moves. This increases our database load.

So, how do we get the locations from the database to the clients? This is hard because most apps use an HTTP request/response model for communication between the client and the server. This means that the server doesn’t have contact with the client until the client makes a request to the server. So the server has no way to notify the clients that the cursor locations are changing without the clients asking for them.

Using an HTTP request/response model, our clients have to poll (or long poll) the server (and by extension the database) for the updated cursor positions. The clients will always be polling, even if the cursor locations haven’t changed, because the clients can’t know whether the cursor locations have changed without asking.
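To make the wasted work concrete, here’s a small simulation. It’s a sketch only: `PolledCursorStore` is a made-up in-memory stand-in for the server plus database, and the version counter is my assumption about how a client might detect change. The client polls ten times, but the cursor only moves once:

```typescript
// Hypothetical in-memory stand-in for "server + database".
class PolledCursorStore {
  private version = 0;
  private positions = new Map<string, { x: number; y: number }>();

  moveCursor(user: string, x: number, y: number): void {
    this.positions.set(user, { x, y });
    this.version += 1;
  }

  // What a polling endpoint might return. Every poll costs a read,
  // whether or not anything changed.
  poll(): { version: number; count: number } {
    return { version: this.version, count: this.positions.size };
  }
}

// Simulate a client polling on a fixed interval and count the polls
// that came back with nothing new.
function countWastedPolls(store: PolledCursorStore, polls: number, moveOnPoll: number): number {
  let lastSeen = -1;
  let wasted = 0;
  for (let i = 0; i < polls; i++) {
    if (i === moveOnPoll) store.moveCursor("Erik", i, i);
    const snapshot = store.poll();
    if (snapshot.version === lastSeen) wasted += 1;
    lastSeen = snapshot.version;
  }
  return wasted;
}

// Ten polls, one cursor move: eight of the polls return nothing new.
const wastedPolls = countWastedPolls(new PolledCursorStore(), 10, 4);
```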

Problem 2: Load on the database

We’re adding write load to the database every time the cursor locations change, and read load every time a client polls. And for the clients to be most up to date, they want to poll regularly. The more regularly the clients poll, the more database load we generate.
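A back-of-the-envelope estimate shows how quickly this adds up. All the numbers here are illustrative assumptions, not measurements, and the sketch assumes every poll results in a database read (no caching in between):

```typescript
// All values are illustrative assumptions.
interface LoadAssumptions {
  users: number;          // participants in one document
  movesPerSecond: number; // cursor updates each user sends per second
  pollsPerSecond: number; // polls each client makes per second
}

function estimateDatabaseLoad(a: LoadAssumptions): { writesPerSecond: number; readsPerSecond: number } {
  return {
    writesPerSecond: a.users * a.movesPerSecond, // one write per cursor move
    readsPerSecond: a.users * a.pollsPerSecond,  // one read per poll
  };
}

// 20 users -> 200 writes/s and 200 reads/s, for a single document.
const estimatedLoad = estimateDatabaseLoad({ users: 20, movesPerSecond: 10, pollsPerSecond: 10 });
```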

And on top of it all, the data we are sharing between the clients is ephemeral. As soon as the client leaves, or moves the cursor, the old data becomes useless.

Websockets

To fix the problem of polling, we can use websockets. By connecting the server and client with a websocket, the server can send data to the client without the client having requested it. We can use the same websocket connection for the client to send the cursor updates to the server – double win!

When the cursors change, the server can push those cursor updates to the clients. This gives us half a plan for solving the clients polling the server for updates.
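On a single server, the push side could look roughly like this. It’s a sketch with the network stripped out: `ClientSocket` is a hypothetical stand-in for a websocket connection (only `send` is modelled), and `fakeSocket` just records what would have gone over the wire:

```typescript
// Stand-in for one websocket connection; only the send side is modelled.
interface ClientSocket {
  user: string;
  send(message: string): void;
}

class CursorServer {
  private sockets: ClientSocket[] = [];

  connect(socket: ClientSocket): void {
    this.sockets.push(socket);
  }

  // Called when a cursor update arrives over one client's socket:
  // push it to every other connected client, no polling needed.
  onCursorUpdate(fromUser: string, x: number, y: number): void {
    const message = JSON.stringify({ user: fromUser, x, y });
    this.sockets
      .filter((socket) => socket.user !== fromUser)
      .forEach((socket) => socket.send(message));
  }
}

// Test helper: a fake socket that records what it receives.
function fakeSocket(user: string, inbox: string[]): ClientSocket {
  return { user, send: (message) => { inbox.push(message); } };
}

const erikInbox: string[] = [];
const kateInbox: string[] = [];
const singleServer = new CursorServer();
singleServer.connect(fakeSocket("Erik", erikInbox));
singleServer.connect(fakeSocket("Kate", kateInbox));
singleServer.onCursorUpdate("Erik", 0, 5); // Kate gets the update, Erik doesn't
```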

So why only half a plan? Well, remember that our server is probably a horizontally scaled stateless application running a number of replicas. Our websocket connection exists between one client and one of the server replicas.

We find that we still need to store the locations of the cursors in the database, so they can be shared between the server replicas. Each server replica needs to know when another replica has updated a cursor position, so that it can push the new position to the websocket connections it hosts.

For the server replicas to know that some row in the database has changed, we either need some kind of database trigger for the database to tell the server that the data has changed, or the servers will need to poll the database for changes. We’ve not really improved the polling problem. Instead of the clients polling the server and database, we’ve moved the polling down the stack to the server polling the database.
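Here’s a toy version of that situation, assuming two replicas sharing one table. `Replica`, `cursorTable` and `pollDatabase` are hypothetical names for the sketch. Erik is connected to replica A, and the clients on replica B only hear about his move once B polls:

```typescript
type Point2D = { x: number; y: number };

// Shared database table: user -> latest position.
const cursorTable = new Map<string, Point2D>();

class Replica {
  // Messages pushed to the clients connected to this replica.
  readonly pushed: string[] = [];
  private lastSeen = new Map<string, string>();

  // A client connected to this replica moved its cursor.
  handleUpdate(user: string, pos: Point2D): void {
    cursorTable.set(user, pos);
    this.pushIfChanged(user, pos);
  }

  // Other replicas only learn about the write by reading the table.
  pollDatabase(): void {
    cursorTable.forEach((pos, user) => this.pushIfChanged(user, pos));
  }

  private pushIfChanged(user: string, pos: Point2D): void {
    const encoded = `${user}:${pos.x},${pos.y}`;
    if (this.lastSeen.get(user) !== encoded) {
      this.lastSeen.set(user, encoded);
      this.pushed.push(encoded);
    }
  }
}

const replicaA = new Replica();
const replicaB = new Replica();
replicaA.handleUpdate("Erik", { x: 0, y: 5 });
const beforePoll = replicaB.pushed.length; // 0: B's clients have seen nothing yet
replicaB.pollDatabase();
const afterPoll = replicaB.pushed.length;  // 1: B caught up only by polling
```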

The cursor data is ephemeral

I touched on this before, but ultimately the cursor data is ephemeral. It’s not really application state that we want to keep persisted in our database. We just want the cursor data to be shared among clients.

This is where a client-side pub/sub product helps a lot. We can push the cursor data through a pub/sub channel to the other clients without needing to route it through our database.
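The shape of the idea, as a minimal in-memory sketch: `CursorChannel` is a made-up class, not any vendor’s API, and a hosted pub/sub product would play this role across the network. Publishers and subscribers only know about the channel, and nothing is persisted:

```typescript
type CursorMessage = { user: string; x: number; y: number };
type Subscriber = (message: CursorMessage) => void;

// Minimal in-memory channel: fan each published message out to all subscribers.
class CursorChannel {
  private subscribers: Subscriber[] = [];

  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
  }

  publish(message: CursorMessage): void {
    this.subscribers.forEach((subscriber) => subscriber(message));
  }
}

const cursorChannel = new CursorChannel();
const seenByKate: CursorMessage[] = [];
const seenByBea: CursorMessage[] = [];
cursorChannel.subscribe((m) => { seenByKate.push(m); });
cursorChannel.subscribe((m) => { seenByBea.push(m); });
cursorChannel.publish({ user: "Erik", x: 0, y: 5 }); // no database write involved
```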

Active collaborators

Another feature of Miro and Figma style collaboration is being able to see who the other active collaborators are. This is super similar to cursor location, but is often shown as a little avatar rather than a cursor and a location. The avatars are shown when that user is active in the document, and the avatar disappears when the user goes away.

We want to share this information between the clients too, and all the same problems that we had with sharing cursor data through the database apply here. But with active collaborators, there’s an extra complication around entering and leaving the page that users are collaborating on. We want to be able to detect that someone has gone away. The user might have closed the browser tab, so we can’t rely on receiving some data that indicates a ‘leave’ event.

The naive solution is for the clients to heartbeat, and for some component to compute the recency of heartbeats, and calculate which clients are active or inactive.
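The recency calculation itself is simple. Here’s a sketch, with `recordHeartbeat`, `activeUsers` and the 5-second timeout all being my own illustrative choices; timestamps are passed in explicitly so the example is deterministic:

```typescript
// Last heartbeat timestamp (in milliseconds) per user.
type HeartbeatLog = Map<string, number>;

function recordHeartbeat(log: HeartbeatLog, user: string, nowMs: number): void {
  log.set(user, nowMs);
}

// A user counts as active if their last heartbeat is within timeoutMs of now.
function activeUsers(log: HeartbeatLog, nowMs: number, timeoutMs: number): string[] {
  const active: string[] = [];
  log.forEach((lastSeenMs, user) => {
    if (nowMs - lastSeenMs <= timeoutMs) active.push(user);
  });
  return active.sort();
}

const heartbeatLog: HeartbeatLog = new Map();
recordHeartbeat(heartbeatLog, "Erik", 0);
recordHeartbeat(heartbeatLog, "Kate", 0);
recordHeartbeat(heartbeatLog, "Kate", 9000); // Kate is still sending heartbeats
// Erik closed his tab after t=0, so at t=12s only Kate is inside the 5s timeout.
const stillActive = activeUsers(heartbeatLog, 12000, 5000);
```

Nobody ever sends a ‘leave’ event here; Erik just ages out.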

So which component does this calculation? We’ve got another choice between two bad options. Either we have some server component storing the recency data in a database and computing which clients are active or inactive (which sucks because we have the same problem as with cursors: database read/write load for ephemeral data), or the heartbeats are sent over a pub/sub channel and every client calculates the recency of the other clients based on the heartbeats it has received (which sucks because it’s an n-squared problem: every client has to do the computation for every other client).

We either have to do the same work on all the clients, or we have to centralise the recency data so that some centralised component can calculate the active state.
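To put rough numbers on the trade-off, here’s a quick count of recency checks per heartbeat interval. It’s a simplification that assumes one check per tracked client and ignores message fan-out:

```typescript
// Checks per interval when every client tracks every other client.
function clientSideChecks(clients: number): number {
  return clients * (clients - 1);
}

// Checks per interval when one central component tracks all clients.
function centralisedChecks(clients: number): number {
  return clients;
}

const checkComparison = [5, 50, 500].map((n) => ({
  clients: n,
  clientSide: clientSideChecks(n),   // 20, 2450, 249500
  centralised: centralisedChecks(n), // 5, 50, 500
}));
```

The client-side total is spread across all the clients, so each one only does n-1 checks, but the overall work still grows quadratically.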

Collaboration is hard to build well

Collaboration features are hard to build well in a scalable way, which is why there’s a long tail of startups offering products in the realtime and collaboration space. It’s nice to see products being offered that make collaboration features easier to implement in apps.

If you think collaboration is interesting and want to see the products we are building at Ably, check out Spaces.