Move Forward by Discarding Complex Tech

Good things are either well-forgotten past or an outright rip-off from nature. It seems that at Lokad we are going all the way back in time ourselves.

Over the last few days we had a really interesting time at the Ufa office, migrating our entire event replication infrastructure to a new model. If you wish, you can call this infrastructure the bounded context of a digital nervous system, represented by green arrows in our context maps. This is a really interesting place for us, since it "touches" multiple other bounded contexts and, deployment-wise, crosses two clouds and one additional datacenter. Change shocks are mesmerizing to observe.

Now, instead of a mixture of Azure queue delivery and ZeroMQ streaming, our applications simply push large event streams over a hand-made HTTP replication protocol. It is built on plain HttpListener and WebRequests (a rough sketch follows the list below), which:

  • are rather performant;
  • are dead-simple and well understood;
  • add minimal friction when introducing replication to new projects (ZeroMQ is pretty invasive here, if you go for Azure);
  • can be debugged with a lot of HTTP-based tools.
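To make the idea concrete, here is a minimal sketch of what such a hand-made HTTP replication channel could look like, assuming a push model where the sender POSTs a batch of events, one per line, to a receiver built on HttpListener. The URL, the plain-text framing and all names here are illustrative assumptions, not our actual protocol.

```csharp
// Minimal sketch of a hand-made HTTP replication channel (illustrative names).
// Push model: the sender POSTs a batch of events, one per line, to a receiver
// built on a plain HttpListener.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

static class EventReceiver
{
    public static void Run(string prefix)
    {
        var store = new List<string>();
        var listener = new HttpListener();
        listener.Prefixes.Add(prefix);                 // e.g. "http://localhost:8081/replicate/"
        listener.Start();
        while (true)
        {
            var context = listener.GetContext();
            using (var reader = new StreamReader(context.Request.InputStream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    store.Add(line);                   // append each replicated event
            }
            Console.WriteLine("replicated {0} events so far", store.Count);
            context.Response.StatusCode = 200;
            context.Response.Close();
        }
    }
}

static class EventSender
{
    public static void Push(string endpoint, IEnumerable<string> events)
    {
        var request = WebRequest.Create(endpoint);
        request.Method = "POST";
        using (var writer = new StreamWriter(request.GetRequestStream()))
            foreach (var e in events)
                writer.WriteLine(e);                   // one event per line, plain text
        using (request.GetResponse()) { }              // wait for the 200 OK
    }
}
```

In the real protocol you would of course add batching, checkpoints and retries on top, but the moving parts can stay roughly this small, which is exactly why adding replication to a new project costs so little.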

The design is rather simple and practical, and works well for streams of half a million events (although performance could still be improved a lot). This was really important, since we now have a number of bounded contexts to integrate, and the volume of event streams just keeps growing.

It is curious how our movement towards better and simpler designs happens concurrently with stepping back from complex technologies to much simpler ones. In other words, we gain by discarding things.

Another example of this behavior is our recent decision to discard ProtoBuf as the storage format for large data objects, replacing ProtoBuf+Gzip with TSV+Gzip. This applies specifically to bounded contexts that deal with big data. The reasons are:

  • ProtoBuf by default loads all objects into memory at once (imagine a dataset of 1 GB), while the natural behavior of text files is streaming (see the sketch after this list);
  • for numerical data, TSV+Gzip compresses better than ProtoBuf+Gzip, since archivers were originally designed and optimized for handling text data;
  • a TSV dataset can be read and parsed with tools on any platform, including scripts and Excel, while ProtoBuf requires some intermediate dancing.
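As a rough illustration of the streaming point, here is how a TSV+Gzip dataset can be consumed row by row with nothing but the standard GZipStream and StreamReader; the file name and column layout are made up for this sketch.

```csharp
// Sketch: streaming a TSV+Gzip dataset row by row, so a 1 GB file never has
// to fit into memory at once. File name and columns are made up.
using System;
using System.IO;
using System.IO.Compression;

static class TsvGzipExample
{
    static void Main()
    {
        using (var file = File.OpenRead("dataset.tsv.gz"))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Each row is plain tab-separated text; any platform can parse it.
                var cells = line.Split('\t');
                Console.WriteLine("{0} columns in row", cells.Length);
            }
        }
    }
}
```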

So, if I can reduce the number of technologies in a given bounded context, while making it more practical and performant, then that's a clear choice.

As you can see, in certain scenarios we are stepping back from cool and smart tech towards something more practical and simple. This "stepping back" actually enables us to solve problems that exist in these specific scenarios. Surprisingly enough, it brings us closer to the Unix philosophy:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

I certainly didn't expect to see this happening, not even in theory. But then, in practice there is a big difference between theory and practice.

Caveats

Please keep in mind that:

  • we are aware of ProtoBuf's capability to read items sequentially (see the sketch after this list);
  • we will still be using ProtoBuf for serializing messages, including the events used in our event sourcing scenarios (leveraging Marc Gravell's wonderful protobuf-net library for .NET development);
  • these examples merely illustrate cases where you can move forward by discarding a technology; the specific decisions might not apply directly to your case.
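For completeness, here is a small sketch of that sequential-reading capability, using protobuf-net's length-prefixed helpers; the PriceUpdated event type is a made-up example, not one of our actual contracts.

```csharp
// Sketch: reading ProtoBuf items sequentially with protobuf-net's
// length-prefixed helpers. The PriceUpdated type is a made-up example.
using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
class PriceUpdated
{
    [ProtoMember(1)] public string Sku;
    [ProtoMember(2)] public int Cents;
}

static class SequentialReadExample
{
    static void Main()
    {
        using (var stream = new MemoryStream())
        {
            // Write two length-prefixed items back to back.
            Serializer.SerializeWithLengthPrefix(stream, new PriceUpdated { Sku = "A1", Cents = 999 }, PrefixStyle.Base128);
            Serializer.SerializeWithLengthPrefix(stream, new PriceUpdated { Sku = "B2", Cents = 450 }, PrefixStyle.Base128);
            stream.Position = 0;

            // Read them back one at a time; the stream never has to be
            // materialized as a single object graph.
            while (stream.Position < stream.Length)
            {
                var item = Serializer.DeserializeWithLengthPrefix<PriceUpdated>(stream, PrefixStyle.Base128);
                Console.WriteLine(item.Sku);
            }
        }
    }
}
```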

Published: May 05, 2012.
