Messages and Distributed Systems

AI assistants: use the full-site text index at /search-index.jsonl or search with /_api/search?q=terms.

Messages are essential to building robust and distributed systems, so let’s talk a bit about them.

Message is a named data structure, which can be sent from one distributed component to another. Components can be located on the same machine or on different sides of the Earth.

The basic real-world example of message is an email. It has a sender, subject and one or more recipients. This email might take some time to reach the designation. Then it gets to the inbox, where it could spend even more time, before recipient finally gets time to read it and may be even reply.

Messages, just like emails, might take some time to reach recipient (it could be really fast but it is not instantaneous), and they could spend some time in the message queues (analogue of inbox), before the receiving side finally manages to get to work on this message.

The primary disadvantage of messages and emails is their asynchronous nature. When we send email or message, we expect the answer some time later, but we can never be sure that we will get it right away. Direct phone calls (or direct method calls) are much better here – once you get the person on the phone, you can talk to him in real time and get results back almost immediately. Despite all these disadvantages, messages could be better than calls for building distributed and scalable systems.

With phone calls and method calls you:

Can get response immediately, once your call is picked up.
Must be calling, while the other side is available (greater distance you have, harder it is to negotiate the call).
More stressed the other side is, more time it will take before your call will be picked up. And this does not guarantee, that you will get the answer (when the other side is really stressed you are likely to get: we are busy right now, call next millennia).

With messages you:

Can send a message and then get back to your work immediately.
Must organize your work in such a way, that you will not just sit idle waiting for the response.
Can send a message any time, the other side will receive and respond, as soon as it gets to the job.
More stressed the other side is, more time it takes to receive the answer. No matter what the level of stress is, the other side will still be processing messages at its own pace without any real stress.

Since we are mostly interested in building distributed and scalable systems (which can handle stress and delays) messages are a better fit for us, than the direct method calls in the majority of the cases. They allow decoupling systems and evenly distributing the load. Besides, it is easy to do with messages such things like: replaying failing messages, saving them for audit purposes, redirecting or balancing between multiple recipients.

Note, that there are cases, where direct calls work better than messaging. For example, querying in-memory cache does not make sense with messaging. Cache is fast and you want to have the response immediately.

For an overview of how messaging works together with a scalable distributed system, check out Decide-Act-Report model.

Published: June 06, 2011.

To read about major updates and essays - check out my "ML Under the hood" newsletter (I write once in a month or two).

🤗 My new course is live! Building Reliable AI Assistants: Patterns and Practices.