Space Travel and Infinitely Scalable Solutions

Recently, I’ve been re-reading Pat Helland’s paper on infinitely scalable solutions, “Life beyond Distributed Transactions: an Apostate’s Opinion”, while finishing A Second Chance at Eden, a companion to the Night’s Dawn trilogy by Peter F. Hamilton. Something clicked, and sci-fi concepts of space travel became firmly linked in my mind with scalable architectures. Suddenly, cloud enterprise development and the challenges of infinitely scalable solutions became much simpler to understand.

I strongly recommend at least glancing through Helland’s paper (just 10 pages of extremely useful information and thoughts) before proceeding with this journal post. And if you are a big fan of Science Fiction, then Night’s Dawn is another recommendation (thanks, Joannes!).

The concept is simple: when building scalable systems, we treat aggregate roots (ARs) as space ships. ARs are similar to the entities in Helland’s paper.

In the context of the repository pattern: aggregate roots are the only objects your client code loads from the repository.
In Domain-Driven Design: the root is the only member of the AGGREGATE that outside objects are allowed to hold references to.
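To make the two definitions above concrete, here is a minimal sketch of an aggregate whose root is the only entry point. The `Order`/`OrderLine` names, the credit-limit rule and the in-memory repository are all hypothetical illustrations, not taken from the post: client code loads the root by id, and every change to the inner lines goes through a root method that enforces the invariant.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OrderLine:
    # Hypothetical member of the Order aggregate; only reached through its root.
    product_id: str
    quantity: int
    unit_price: int  # in cents


@dataclass
class Order:
    # Hypothetical aggregate root: the only object the repository hands out.
    order_id: str
    credit_limit: int  # in cents
    _lines: list[OrderLine] = field(default_factory=list, init=False, repr=False)

    def total(self) -> int:
        return sum(line.quantity * line.unit_price for line in self._lines)

    def add_line(self, product_id: str, quantity: int, unit_price: int) -> None:
        # The invariant (total never exceeds the credit limit) is enforced by the
        # root, so every change to the aggregate passes through this check.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.total() + quantity * unit_price > self.credit_limit:
            raise ValueError("credit limit exceeded")
        self._lines.append(OrderLine(product_id, quantity, unit_price))


class InMemoryOrderRepository:
    # Loads and saves whole aggregates by root id; there is no way to load a line.
    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def load(self, order_id: str) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order


repo = InMemoryOrderRepository()
repo.save(Order("order-1", credit_limit=10_000))
order = repo.load("order-1")
order.add_line("widget", quantity=2, unit_price=3_000)
repo.save(order)
print(order.total())  # 6000
```

Because the whole aggregate is loaded and saved as one unit, the invariant check and the change it guards can always sit in a single local transaction.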

There are a few rules to start with:

The human race always starts in the Solar System and expands outward.
The universe might have multiple star systems.
A space ship is obviously smaller than a star system and always fits into one.
As the human race expands across the galaxy, it builds more and more ships.
Although a single ship can always fit into and be sustained by a star system (even if the ship is as big as the Death Star), the entire human fleet might not.

The same rules apply to aggregates (entities):

A scalable application usually starts on a single machine.
A cloud fabric or data center might have multiple machines available for the app.
An aggregate root (entity) always fits into a single machine (or a small cluster).
As the application grows, it handles more and more aggregates; they are redistributed to new machines as needed.
Although a single AR fits on one machine (or a small cluster), the entire application might not (hence the need to expand).
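The redistribution rule above needs some way of deciding which machine owns which aggregate. Neither the post nor Helland’s paper prescribes a scheme; consistent hashing is one common technique, swapped in here purely for illustration. The machine names, the `ConsistentHashRing` class and the number of ring points per machine are assumptions of this sketch. The property it shows: when a machine is added, only some aggregates move, and they all move to the new machine.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest instead.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical router from aggregate ids to machines."""

    def __init__(self, machines, points_per_machine: int = 100) -> None:
        self._points_per_machine = points_per_machine
        self._points = []  # sorted hash points on the ring
        self._owner = {}  # hash point -> machine name
        for machine in machines:
            self.add_machine(machine)

    def add_machine(self, machine: str) -> None:
        for i in range(self._points_per_machine):
            point = stable_hash(f"{machine}#{i}")
            if point not in self._owner:  # ignore the (very unlikely) collision
                bisect.insort(self._points, point)
                self._owner[point] = machine

    def machine_for(self, aggregate_id: str) -> str:
        # Take the first ring point at or after the id's hash, wrapping around.
        index = bisect.bisect_left(self._points, stable_hash(aggregate_id))
        return self._owner[self._points[index % len(self._points)]]


ring = ConsistentHashRing(["machine-a", "machine-b"])
ids = [f"customer-{n}" for n in range(1000)]
before = {i: ring.machine_for(i) for i in ids}
ring.add_machine("machine-c")  # the application grows onto a new machine
after = {i: ring.machine_for(i) for i in ids}
moved = [i for i in ids if before[i] != after[i]]
print(f"{len(moved)} of {len(ids)} aggregates moved")
```

This only works because each aggregate is self-contained: routing is by aggregate id alone, so an AR can be moved without dragging other ARs along with it.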

So far, so good. Let’s explore the universe.

Ships are small and relatively safe. When you need to talk to a crew member, you call. It takes milliseconds for the connection to be made.
Space is a large and unpredictable place. When calling from one ship to another, you never know how far away the recipient is going to be. It could be a few light seconds away, a few light minutes away, or it could have traveled to the other side of the galaxy.
It is usually possible to send a message from ship to ship via hyper-space relays, but you never know when you’ll get the response. The message might even need to chase the ship for a while.
Because space is so large and unpredictable, hyper-space relays need to be redundant, sending each message via several routes. This guarantees that it will eventually get through. A ship might receive a few copies, but this is not a big deal: it is trivial to look up the correspondence with the sender and discard the duplicates.
There are always exceptions. When it is really needed and one has credits at hand, it is possible to buy a quantum entanglement channel between ships. It is fast and reliable but extremely expensive. Besides, it locks the ships together (it’s hard to have a reasonable conversation when one ship is in normal space and the other is accelerating toward the speed of light).

Let’s see how these future principles apply to the modern world of distributed systems:

An aggregate is a natural consistency boundary. Since it fits into the memory of a single machine, you can always ensure that all of its changes are rolled into a single transaction.
Events can be propagated within the aggregate root instantly and reliably. Each event that goes outside will take an unknown amount of time to arrive.
Queues usually do not guarantee that a message will be delivered exactly once or in order (although they try their best). This is the case with Azure Queues, for example. It is the responsibility of the recipient entity to detect and handle duplicates, restoring proper ordering where it matters. Activities (sagas), which manage entity-to-entity partnerships, are usually responsible for such operations.
There are always exceptions. If it is really needed and one has development resources at hand, some sort of direct messaging can be established (e.g., based on TCP), but it is rather unusual and expensive. The same goes for transactions: although they are usually best kept within the AR, it is possible to bend the rules if it is worth the effort and the increased complexity.
We usually can’t be sure about the exact delivery time or the state of the recipient until a reply or some notification comes in (which might take a while). During this interval, the recipient’s state is uncertain to us. We need to account for this fact and design around it.
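Handling duplicates and restoring order on the recipient side, as described above, can be sketched as follows. The post leaves the mechanism open; this sketch assumes each sender numbers its messages 1, 2, 3, ... so the recipient can drop what it has already seen and hold back what arrives early. The `Message` and `Recipient` types and the ship names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender_id: str
    sequence: int  # assumption: each sender numbers its messages 1, 2, 3, ...
    payload: str


@dataclass
class Recipient:
    # Hypothetical recipient that tolerates at-least-once, unordered delivery.
    applied: list[str] = field(default_factory=list)
    _next_expected: dict[str, int] = field(default_factory=dict, init=False)
    _pending: dict[tuple[str, int], Message] = field(default_factory=dict, init=False)

    def receive(self, message: Message) -> None:
        expected = self._next_expected.get(message.sender_id, 1)
        key = (message.sender_id, message.sequence)
        if message.sequence < expected or key in self._pending:
            return  # duplicate: already applied or already buffered, so drop it
        self._pending[key] = message
        # Apply every buffered message that is now next in line for this sender.
        while (message.sender_id, expected) in self._pending:
            ready = self._pending.pop((message.sender_id, expected))
            self.applied.append(ready.payload)
            expected += 1
        self._next_expected[message.sender_id] = expected


deliveries = [
    Message("ship-a", 2, "dock"),      # arrives before message 1
    Message("ship-a", 1, "approach"),
    Message("ship-a", 1, "approach"),  # second copy via another relay
    Message("ship-b", 1, "hello"),
    Message("ship-a", 3, "unload"),
    Message("ship-a", 2, "dock"),      # late duplicate
]
recipient = Recipient()
for m in deliveries:
    recipient.receive(m)
print(recipient.applied)  # ['approach', 'dock', 'hello', 'unload']
```

This sketch keeps its bookkeeping in memory. In a real system, the per-sender counters and the pending buffer would be stored with the entity’s own state, inside its consistency boundary, so that applying a message and recording that it was applied happen in the same transaction.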

Lessons learned: build your space ships small and ready to travel around the known space as it expands. In other words:

Infrastructure should be capable of evolving in order to handle scaling and repartitioning.
Business logic and entities should avoid doing anything that anchors them to each other or to a specific partition. Otherwise, when the time comes to move and scale, it will hurt.
Message-based architectures allow building scalable, decoupled systems. Yet they bring a degree of uncertainty and eventual consistency into the solution. We need to start learning from the real world and from our past. Our ancestors built long-running interactions, transactions and vast organizations with sail mail, the telegraph and various analogues of the Pony Express. Message delivery was indeed slow and unreliable back then, taking months instead of fractions of a second.
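Here is one way an entity can avoid anchoring itself to a partner: it keeps only the partner’s id and records outgoing messages instead of calling the partner directly. This is a minimal sketch of the general outbox idea, assumed here rather than taken from the post; the `Shipment` aggregate, the `OrderShipped` message shape and the `dispatch` helper are hypothetical, and a plain list stands in for a real queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Shipment:
    # Hypothetical aggregate. It refers to its order by id only (never by object
    # reference), so the two can live on different machines and move independently.
    shipment_id: str
    order_id: str
    status: str = "pending"
    outbox: list[dict] = field(default_factory=list)

    def mark_shipped(self) -> None:
        if self.status == "shipped":
            return  # idempotent: a redelivered command changes nothing
        self.status = "shipped"
        # The outgoing message is recorded inside the aggregate, so saving the
        # aggregate saves the state change and the message together.
        self.outbox.append(
            {"to": self.order_id, "type": "OrderShipped", "shipment_id": self.shipment_id}
        )


def dispatch(shipment: Shipment, send: Callable[[dict], None]) -> int:
    # Hands pending messages to the infrastructure. A message is removed only after
    # send() returns; if the outbox is persisted with the aggregate, a failure before
    # removal means the message is sent again later (at-least-once delivery).
    sent = 0
    while shipment.outbox:
        send(shipment.outbox[0])
        shipment.outbox.pop(0)
        sent += 1
    return sent


queue: list[dict] = []  # stands in for a real message queue
shipment = Shipment("shipment-7", order_id="order-1")
shipment.mark_shipped()
shipment.mark_shipped()  # duplicate command, ignored
sent = dispatch(shipment, queue.append)
```

The shipment never waits for the order to react; whatever happens on the other side arrives later as another message, which is exactly the uncertainty the recipient-side handling has to absorb.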

This article builds upon and continues the xLim 4 (CQRS in the Cloud) research and development series, contributing to the CQRS body of knowledge being gathered within this Journal.

Published: August 22, 2010.
