How I Stopped Worrying and Learned to Love the WWW and UNIX Way

Lokad provides big data analytics for retail as a service. For a long time, in order to deliver this to our customers, we used to invent new technologies and frameworks to fit our "custom needs". More than 20 public github repositories are still out there, including custom Azure execution framework, ORM and message bus. This was a good journey with a lot of learning about distributed systems, event centric designs and big data processing. There were many challenges, too.

However, eventually we got tired of unnecessary challenges and became lazy ("we" as in "I"). This lead to one simple realisation: why do we even need to invent so much, if there already exists largest distributed system that we can learn from and reuse what others built? It is called World Wide Web. Surely, underlying principles might be not as sexy as some brand new "Enterprise stuff" (like AMQP or Azure Service Bus), yet they seem to work. Besides, WWW:

  • has a huge amount of documented experience (more than any "Enterprise" software);
  • wide variety of tooling;
  • is frustratingly simple.

In my current and limited experience, more we shift our design towards underlying principles of WWW (and away from latest sexy tech), more it feels like a huge relief and falling into the pit of success.

Current project that we are working on in Ufa office (rewrite of business and SaaS backend of Lokad) is nothing like the previous systems. It's composed of relatively small and stand-alone applications which communicate over simple protocols (JSON over HTTP) using constructs aligned with the established domain model. A lot of complex technology is gone.

For example, we ditched use of Azure Queues for communication between various components, replacing that with one-way RPC calls via JSON over HTTP (queueing can be plugged internally). All of a sudden, this:

  • reduces software complexity (e.g.: your backend server is only accepts one-way commands in JSON over HTTP and publishes events as JSON entities in ATOM feeds);
  • provides much better debugging and development experience (all of a sudden you can use tools like Fiddler or curl to interact and play with your backend server);
  • allows scaling writes without complicated message topologies (just queue up all your one-way HTTP PUT/POST/DELETE requests);
  • allows scaling reads by using dead simple force multipliers like reverse proxies;
  • actually allows to use more efficiently services provided by Windows Azure while reducing vendor lock-in.

In fact, in this project we stopped using all of Azure storage and messaging capabilities, since they are no longer needed. The only thing still used is Azure hosting model: instances of Worker and Web Roles which are managed by Azure fabric and run behind load balancers. However, should there be need it would be easy to move to a different cloud provider or managed hardware.

Here are a few other new cool possibilities that opened:

  • now it's possible to create integration tests for backend API by recording HTTP traffic with Fiddler and then replaying that;
  • backend API just became self documented and accessible via XML, JSON, SOAP, thanks to ServiceStack;
  • it's possible to rewrite some component using a completely new technology and nobody would ever notice;
  • design is suddenly more friendly for things like reactive programming and single-page applications;
  • it's much easier to deliver features by first implementing contract using hacky approach and then swapping that implementation for a proper code, whenever necessary.

Things work out in such a way, as if we were trying to steer closer to UNIX Philosophy:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

In Unix world, small programs can be composed together to perform more complex tasks, communicating via the pipe or text files. Here's a beautiful example of installing Ruby Version Manager from a terminal:

curl -L https://get.rvm.io | bash

In case of distributed systems, we can think of small and focused applications which communicate over simple and human-friendly protocol (e.g. one-way JSON over HTTP in a RESTful way). If you align messages of this protocol with the domain model, like they do in Domain-Driven Design methodology, you'll have something that can withstand change pretty well. Keeping these applications really small and focused (like UNIX programs) would allow to reduce cost and friction of changing implementations so much that they could be easily thrown away and rewritten from scratch. Add immutability to the list of underlying design principles of these components and suddenly you get nice and predictable scalability and fault tolerance.

All of a sudden, my head hurts less.

- by .