Software Design Blog

Journey of Rinat Abdullin

How Lokad Uses Blogging to Master Large-Scale Cloud Computing

Blogging is part of the thinking and learning process. It also helps to structure knowledge, share it and collaborate on it. When blogging is part of a company’s lifestyle, it essentially turns the company into a small, focused university.

Here’s a quick sample of how we approach this at Lokad. This is one of the internal work log posts that I quickly put together this morning while analyzing and tuning large-scale data processing in Windows Azure. This information will be available to the entire team, potentially helping them make decisions in different projects and pushing the company’s experience forward.

In DDD terms, we are talking about a single Aggregate here; cloud systems can have millions of such aggregates being handled in parallel.

OK, here’s something that we’ve learned about indexes and large BLOB files:

  • A 2 GB file can easily be compressed to 300 MB when it uses SQLite and ProtoBuf for fast and efficient storage.
  • Random access, including writes, within such a file (7M records) can be extremely fast.
  • Compression with MD5 hashing adds overhead. We essentially invest CPU time to reduce network and storage usage while ensuring consistency. It’s worth it.
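To make the CPU-for-bandwidth trade concrete, here is a minimal sketch of the hash, compress and verify round trip. It assumes gzip as the codec (the post does not say which compression Lokad uses) and follows the order given in the stats explanations: hash the raw index bytes, then compress before upload; decompress, then verify the MD5 after download. `pack_index` and `unpack_index` are hypothetical helper names.

```python
import gzip
import hashlib


def pack_index(raw: bytes) -> tuple[bytes, str]:
    """Hash, then compress an index file before upload (hypothetical helper)."""
    digest = hashlib.md5(raw).hexdigest()             # hash the uncompressed bytes
    compressed = gzip.compress(raw, compresslevel=6)  # spend CPU to shrink the transfer
    return compressed, digest


def unpack_index(compressed: bytes, expected_md5: str) -> bytes:
    """Decompress a downloaded index file and verify its MD5 (hypothetical helper)."""
    raw = gzip.decompress(compressed)
    actual = hashlib.md5(raw).hexdigest()
    if actual != expected_md5:
        # A mismatch means the blob was corrupted or only partially written.
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    return raw


if __name__ == "__main__":
    # Repetitive sample data compresses very well; real index files will differ.
    original = b"series-0001:" * 100_000
    blob, md5 = pack_index(original)
    restored = unpack_index(blob, md5)
    assert restored == original
    print(f"{len(original)} bytes -> {len(blob)} bytes, md5={md5}")
```

MD5 is used here only as a cheap integrity check against corruption, which matches its role in the post; it is not meant as a defense against deliberate tampering.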

Here is a snippet from the stats (low-level execution stats as reported by the SQLite wrapper; they include protobuf serialization overhead as well):

/* RemoteIndex_DownloadMs: a minute */
/* RemoteIndex_ExecActionCount: 5 */
/* RemoteIndex_ExecMs: 30 seconds */
/* RemoteIndex_UploadMs: 3 minutes */

A few explanations:

  • DownloadMs - time taken to download, decompress and verify the MD5 hash
  • ExecActionCount - number of queries executed against the index (I’m batching large requests in a transaction to avoid consuming too much memory)
  • ExecMs - total time taken to execute these queries
  • UploadMs - time taken to hash, compress and upload the entire index file (we upload only if the index was modified).
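The ExecActionCount and ExecMs counters, together with batching large requests into transactions, can be sketched with Python's built-in `sqlite3` module. This is an illustration under several assumptions: the `series` table schema is hypothetical, the payload bytes stand in for protobuf-serialized series, and each committed batch is counted as one "action", which is one plausible reading of the stats above rather than Lokad's actual wrapper.

```python
import sqlite3
import time


def exec_batched(conn: sqlite3.Connection, rows, batch_size: int = 10_000) -> dict:
    """Write (series_id, payload) rows in fixed-size transactions and collect stats.

    Counter names mirror the post's stats; the mapping of one batch to one
    action is an assumption made for this sketch.
    """
    stats = {"ExecActionCount": 0, "ExecMs": 0.0}
    batch = []

    def flush():
        if not batch:
            return
        start = time.perf_counter()
        with conn:  # one transaction per batch: commit on success, roll back on error
            conn.executemany(
                "INSERT OR REPLACE INTO series (id, payload) VALUES (?, ?)", batch
            )
        stats["ExecMs"] += (time.perf_counter() - start) * 1000
        stats["ExecActionCount"] += 1
        batch.clear()  # bounded batch size keeps memory usage flat

    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final partial batch
    return stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, payload BLOB)")
    # Placeholder payloads; a real index would store protobuf-serialized series.
    rows = ((i, f"serialized-series-{i}".encode()) for i in range(25_000))
    stats = exec_batched(conn, rows, batch_size=10_000)
    print(stats)
```

Feeding rows from a generator and flushing at a fixed size means memory stays proportional to `batch_size` rather than to the total number of records, which is the concern the ExecActionCount note raises.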

The actual solution file is:

  • 315 MB in storage/transfer
  • 2.3 GB uncompressed
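As a quick sanity check on these two sizes, taking 1 GB as 1000 MB, the compression ratio works out to roughly 7:1, so about 86% fewer bytes go over the network and into storage:

$$
\frac{2.3\ \text{GB}}{315\ \text{MB}} = \frac{2300\ \text{MB}}{315\ \text{MB}} \approx 7.3,
\qquad
1 - \frac{315}{2300} \approx 0.86
$$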

And the number of series stored in that file:

[Figure: SQLite used with a cloud computing system]

An extremely important lesson learned: some tools and libraries can make your life a lot easier when they are mature and have a rich ecosystem around them. SQLite is one such example.

Again, this blog post talks about just a single aggregate. Lokad.CQRS and Lokad.Cloud (just like any message-based system in the cloud) can easily scale out to handle thousands or even millions of aggregates like this. That’s one of the essential “features” of cloud computing.

The execution stats mentioned in this post are merely performance counters (based on StopWatch) that are included in the domain events and are available to all consumers of the Domain Log (including TimeMachine querying). This massively helps to see performance in real time and to understand what’s going on under the hood of your cloud system.
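The idea of carrying performance counters inside domain events can be sketched in a few lines. Python's `time.perf_counter` stands in for .NET's StopWatch here, and `timed`, `IndexUpdated` and `handle_update` are hypothetical names chosen for illustration; only the counter names are taken from the stats above. The sleep and the constant are placeholders for the real download and query work.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@contextmanager
def timed(stats: dict, name: str):
    """Stopwatch-style counter: add the elapsed milliseconds to stats[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stats[name] = stats.get(name, 0.0) + (time.perf_counter() - start) * 1000


@dataclass(frozen=True)
class IndexUpdated:
    """Hypothetical domain event; its stats travel with it to every consumer."""
    aggregate_id: str
    series_count: int
    stats: dict = field(default_factory=dict)


def handle_update(aggregate_id: str) -> IndexUpdated:
    stats: dict = {}
    with timed(stats, "RemoteIndex_DownloadMs"):
        time.sleep(0.01)  # placeholder for download + decompress + verify
    with timed(stats, "RemoteIndex_ExecMs"):
        series_count = 42  # placeholder for the queries against the index
    return IndexUpdated(aggregate_id, series_count, stats)


if __name__ == "__main__":
    event = handle_update("aggregate-1")
    for name, ms in sorted(event.stats.items()):
        print(f"{name}: {ms:.1f} ms")
```

Because the timings are part of the event itself, any consumer of the event log (a dashboard, a replay tool, an ad-hoc query) sees them without extra instrumentation.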

This post could also give you an idea of the scale of data that can be handled by Lokad Forecasting Services with the help of Windows Azure and cloud computing. Active blogging and R&D just help us do this slightly more efficiently, without wasting a lot of resources.