Aggregates: one piece of code design

Oct 10, 2022

Most tips for code design exist to keep a lid on the exploding complexity of our systems.

Aggregates

I won’t explain all the details of an Aggregate here, but I will say this…

An Aggregate is:

a concept from Domain Driven Design (DDD).
a cluster of associated objects that we treat as a unit for the purposes of data changes.
a mechanism for simplifying the inter-object relationships and complexity.
a container that enforces the invariants on the data it holds.
globally identified only by its root.
opaque: all actions must go through the root of the Aggregate.

Aggregates are hierarchical

Aggregates are an important mechanism to tame the complexity inside a system by simplifying and limiting the traversal and number of relationships between objects.

Aggregates are inherently hierarchical. There’s a ROOT, and this root is the global identity of the Aggregate. The Aggregate can hold relationships to other objects. Those objects may be simple value-objects, other entities, or other Aggregates.

If other objects want to hold a reference to the Aggregate, that reference must be to the Aggregate’s root.

Aggregates have a root, and hold other Aggregates. You can only access an Aggregate via its root. This makes Aggregates hierarchical.
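As a minimal sketch of the idea, assume a document is the root and its comments live inside it. The names here (Document, Comment, add_comment, MAX_COMMENTS_PER_DOCUMENT) and the invariants themselves are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass

# Hypothetical invariant, chosen only for illustration.
MAX_COMMENTS_PER_DOCUMENT = 2


@dataclass(frozen=True)
class Comment:
    """An object inside the Aggregate; it has no global identity of its own."""
    comment_id: str
    text: str


class Document:
    """The root of the Aggregate: every change to its comments goes through here."""

    def __init__(self, document_id: str, title: str) -> None:
        self.document_id = document_id
        self.title = title
        self._comments: dict[str, Comment] = {}

    def add_comment(self, comment_id: str, text: str) -> None:
        # The root enforces the invariants on the data it holds.
        if not text.strip():
            raise ValueError("a comment must not be empty")
        if len(self._comments) >= MAX_COMMENTS_PER_DOCUMENT:
            raise ValueError("too many comments on this document")
        self._comments[comment_id] = Comment(comment_id, text)

    def comment_texts(self) -> list[str]:
        # Callers get copies of the data, not references to the inner objects.
        return [comment.text for comment in self._comments.values()]
```

Because callers never hold a Comment directly, there is only one place where the rules about comments need to be checked.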

Our design patterns and tools are hierarchical

Many of the tools and design patterns that we use are hierarchical.

A good example of this is a REST API. A typical SaaS REST API could be designed like this:

/workspace/{workspace_id}/members/{member_id}/documents/{document_id}/comments

Our REST API expresses the hierarchical design of our data.

workspaces contain members
members have documents
documents have comments

This API is for comments, but it leverages the hierarchy of our data to simplify the actions and operations.
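One way to picture this is a handler that resolves the path by walking down from the root. The sketch below assumes an in-memory nested dict as the data store; the resolve function and the sample ids (w1, m1, d1) are hypothetical:

```python
# Hypothetical in-memory data, shaped like the URL: workspace -> members -> documents -> comments.
workspaces = {
    "w1": {
        "members": {
            "m1": {
                "documents": {
                    "d1": {"comments": ["Looks good", "Ship it"]},
                },
            },
        },
    },
}


def resolve(path: str):
    """Walk a path such as /workspace/w1/members/m1/documents/d1/comments from the root."""
    parts = [part for part in path.split("/") if part]
    if len(parts) < 2 or parts[0] != "workspace":
        raise KeyError(path)
    # Every lookup starts at the root; nothing below it is reachable on its own.
    node = workspaces[parts[1]]
    for key in parts[2:]:
        node = node[key]
    return node
```

A request for comments under a workspace that does not exist fails at the first step, so the context of every comment is checked for free.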

Each of the entities in the API is likely an Aggregate, and each holds other Aggregates. Our ROOT Aggregate is workspace. We know that if our workspace is deleted, we should delete all the entities contained in that workspace.

Aggregate boundary litmus test

Modelling the DELETE is a good litmus-test for the correct Aggregate boundaries. If you delete the ROOT, you must delete everything inside the Aggregate. And if you want to delete data outside the Aggregate as well, ask yourself: should that data actually be inside the boundary of the Aggregate?

We could flatten our API into:

/comments?document_id={document_id}

This might seem more convenient for the comments API, but by doing so we lose the hierarchy of the data, and the context within which the comment exists. With this design for our comments API, we’d be tempted to follow a similar design for workspaces, members and documents.

/workspaces/{id}
/members?workspace_id={workspace_id}
/documents?member_id={member_id}

We’re pretending that all our entities (workspaces, members, documents) are of similar value/importance. We’re pretending they should all live at the root of our hierarchy.

As we start to add requirements, we find that every entity needs to hold references to most other entities above it in the hierarchy.

action	requirement
List `documents` for `workspace`	the `document` must know the parent `workspace`
List `members` for `workspace`	the `member` must know the parent `workspace`
List `documents` for `member`	the `document` must know the parent `member`
List `comments` for `document`	the `comment` must know the parent `document`
List `comments` for `member`	the `comment` must know the parent `member`

So, instead of a nice hierarchical data structure of workspace -> members -> documents -> comments, we have:

comments (workspace, member, document)
documents (workspace, member)
members (workspace)
workspaces

By flattening our API, we’ve fully inverted the data hierarchy, making all the lower entities know about the upper ones. I wrote about this exact problem in the post: Data modelling in SaaS apps.

Now, when we revisit our example of “delete the workspace”, we find that enforcing the data invariants is much harder. We have to find every individual entity that belongs to the workspace and delete it. And once the workspace itself is gone, who deletes the members, documents, and comments, and how?
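To make the cost concrete, here is a small sketch of the flattened model, assuming each entity type lives in its own store and carries its parent ids. The stores, ids and the delete_workspace function are all hypothetical:

```python
# Hypothetical flat stores: every row has to carry the ids of its parents.
workspaces = {"w1": {}, "w2": {}}
members = {
    "m1": {"workspace_id": "w1"},
    "m2": {"workspace_id": "w2"},
}
documents = {
    "d1": {"workspace_id": "w1", "member_id": "m1"},
    "d2": {"workspace_id": "w2", "member_id": "m2"},
}
comments = {
    "c1": {"workspace_id": "w1", "member_id": "m1", "document_id": "d1"},
    "c2": {"workspace_id": "w2", "member_id": "m2", "document_id": "d2"},
}


def delete_workspace(workspace_id: str) -> None:
    """Deleting a workspace now means sweeping every store by hand."""
    workspaces.pop(workspace_id, None)
    # Forget one store in this tuple and its rows are silently orphaned.
    for store in (comments, documents, members):
        doomed = [key for key, row in store.items() if row["workspace_id"] == workspace_id]
        for key in doomed:
            del store[key]
```

Every new entity type added under a workspace has to be remembered in this function, and nothing in the data model reminds you to do it.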

By flattening our API we’re also tempted into splitting these entities across different ‘micro’ services. This splitting of data into entity-CRUD services makes enforcing the data invariants, and propagating the operations like delete, much harder.

Designing good Aggregates is high leverage

Leverage is getting a large effect from a small amount of effort. If you only have a small amount of time or effort to spend on design, please spend it designing Aggregates.

Many of the features, issues, and problems you’ll encounter later become easier when you have good Aggregate boundaries and a hierarchical data design.

We’ll find that to use our tools and design patterns well – like REST APIs – we have to design and enforce good Aggregates and boundaries. Successful design compounds to make our jobs easier, and bad design compounds to make our jobs harder. The ‘root’ of good design lies in Aggregates.