Verification of Event-Driven Systems

AI assistants: use the full-site text index at /search-index.jsonl or search with /_api/search?q=terms.

I'd like to share an interesting experience with event-driven design from SkuVault (the company is a warehouse management system provider). We developed a new feature called wave picking while building it on a new infrastructure, better suited for high loads and scalability (more about this feature).

Here is the story.

Starting with a prototype

We started working on the feature with a prototype: a stateless web application (HTML/ReactJS/Fluxible) that talks to a backend via an API (JSON over HTTP). The Web UI focused on convenient user experience, while the backend handled persistence, business rules and domain complexity.

This entire system was an add-on to the existing software, communicating with it via an event bus (with the ability to replay all events from the very start).

To speed up the initial development, we chose in-memory persistence for views in the backend. This meant that the system would maintain all views (read models) in process, while rebuilding them upon the restart. This also meant no failover or scale-out.

This approach allowed us to quickly capture important domain concepts in code: starting from flexible sales search and up to route generation inside the warehouse. It was fairly simple, since we only had to deal with in-memory structures and a little bit of concurrency.

Once the first features were implemented in the backend, UI developers could pull the backend to their machines, launch it (there were no dependencies, except for the .NET runtime) and just concentrate on the UI.

Further development took place in parallel. Given user stories, interface (HTML/CSS) and proposed API implementation, backend and frontend work could start at the same time.

Backend developers would translate their requirements into explicit scenarios and in-memory model. In the meantime frontend developers would start building ReactJS/Flux parts. A few days later (or hours, depending on the complexity) first drafts of new in-memory implementation would be passed to them to give a spin. Sometimes frontend developers would point fingers and say “This method doesn’t work as expected, here’s the curl expression” or “We could use HasMore field on the collection to simplify paging”. These requests would be incorporated into the scenarios and implemented in in-memory.

Scenarios

A scenario in this project is a data structure that nails down behavior that we want to last. It captures this behavior as a transition between two states, where states are expressed via domain events.

Given Events - a sequence of events that setup preconditions and describe what happened in the past (e.g. Sale 12345 was created with 1 bike-blue-pro)
When Request - API request that we are testing (e.g. POST “/wavepicking/session” with sale id 1234)
Then Response - API response that is expected from the server (e.g. HTTP 200 OK { “version”: 2 })
Then Events - any events that might be published to event bus (e.g. SessionCreated with sale id 1234).

We don't verify the internal state, since it could change between versions. We are interested only in public contracts of the system: events and API calls.

Here is an example of how such use case could be expressed in C# code:

public sealed class When_create_cart : UseCaseSyntax {
    public static UseCase give_good_lowercase_code() {
        var author = New.Author();

        var newDate = DateTime.UtcNow;
        var ri = New.RefInfo(author, newDate);
        var newId = Guid.NewGuid();

        return new UseCase {
            When = POST("/cart/", new {
                code = "myCart"
            }).As(author),
            ThenResponse = OK(new {
                cart = new {
                    code = "MYCART",
                    id = newId
                },
                version = new {
                    inventory = 1
                }
            }),
            ThenEvents = Events(
                new CartCreated(new CartId(newId), ri, "MYCART", 0)),
            ServerGenerated = new object[] {newId, newDate}
        };
    }
    /* more scenarios for cart creation */
}

Note that in scenarios we don’t test the entire API response, but rather specify only DTO paths that we are interested in. This way we could write a bunch of scenarios testing a single feature in a given API response (e.g. a specific denormalization), while leaving other parts of the API response to other scenarios.

Once you have the scenario, development becomes a simple matter of monkey-coding: bang on the keyboard, until all scenarios pass (including the new one). In essence, these scenarios became our type system checks for the business domain. They nail down behaviors that we want to last.

There is one cool aspect with this approach: scenarios are an essential part of the development process. They are written whenever there is need to capture a new requirement, fix a bug or debug a complex behavior. This is an example of test-driven development that actually reinforces itself.

Tackling Complexity of the Scale

At some point down the road, we had to make the backend production-ready: durable and scalable. We picked Apache Cassandra as the primary storage, since it is widely adopted, scales elastically and never loses data, if used properly.

All the complex intricacies of the domain logic were already captured in the in-memory prototype and reinforced by the scenarios. This turned a complete backend rewrite into a rather simple task approachable by a single developer: just make the Cassandra implementation work like the in-memory prototype, while making sure that all scenarios pass.

A few weeks down the road, this new implementation was ready in a separate git branch. It brought a lot of technical complexity linked to the way data modeling works in the Cassandra universe, but this complexity was be tackled separately from the intricacies of the domain model. It felt good.

We swapped QA backend to a new version (this swap happens without any downtime if you have a load balancer) and asked people to use the software just like they used to, while paying special attention to edge cases, where the behavior changed.

There were a few differences, of course. They pinpointed real cases which weren’t covered by specifications, letting two implementations diverge from each other. Fixing that was quite trivial (since the problematic areas were already identified). More often than not, the root cause was within the implementation of Cassandra’s data model, forcing us to rethink it.

Being able to separate domain complexity (captured in scenarios and in-memory) from the technical details (cassandra) led to a enjoyable experience, fast debugging and development iterations. We loved it.

Let's Keep In-Memory Version

At that point were were able to get rid of the in-memory implementation. However, the in-memory version worked so well for us in explaining how things should work, that we decided to keep it in the codebase while making it a first-class citizen. The branches were merged, and both implementations of the backend were kept.

Here is how one of the modules looks like:

public sealed class ProductListModule : IModule {
    public ModuleInstance Memory(IMemoryDependencies provider) {
        var store = new Memory.ProductStore();
        var reg = new ModuleInstance();
        reg.AddProjection(new ProductsProjection(store));
        reg.AddProjection(new ProductStatusProjection(store));
        reg.AddDataStore(store);
        reg.AddApi(() => new Memory.ProductsService(store));
        return reg;
    }

    public ModuleInstance Cloud(ICloudDependencies provider) {
        var session = provider.GetSession("v2_product");
        var store = new Cloud.ProductStore(session);
        var reg = new ModuleInstance();
        reg.AddDataStore(store);
        reg.AddProjection(new ProductDenormalizer(session, store));
        reg.AddApi(() => new Cloud.ProductsService(store));
        return reg;
    }
}

Keeping in-memory implementation around raised a few eyebrows outside of the dev team. "Wouldn’t we waste time on maintaining two backend technologies at once?"

Fortunately, at this point we already understood benefits well enough to articulate that in-memory implementation is our domain model. It provides us with a quiet happy place in our minds, where we could forget about Cassandra specifics while focusing on the essence of the business. Once is it captured, further implementation is just an application of skill and prowess of a developer to write heavily optimized code.

Besides, there were a few additional and very specific benefits to keeping in-memory implementation around:

In-memory implementation is used to run local development versions backend for UI developers (we don't want them to bother with installing and maintaining Cassandra nodes locally).
In-memory version is also used as a demo (we could create short-lived transient accounts in memory for customers interested in trying out the product).

Thus, we were able to prove the point and preserve that quiet happy place in our minds.

Idempotency Testing

In a distributed system it is very hard (or rather expensive) to guarantee that a given event will be delivered to subscribers only once. It is far “easier” to ensure that subscribers are idempotent instead - that sending a duplicate event wouldn’t change a state.

We can guarantee this to a certain extent by taking all existing scenarios, duplicating events in them on-the-fly and then passing through the same scenario verification code.

Adding an “idempotency tester” mode to verification code is a simple operation, but it ensures that any existing or new behaviors are idempotent (as long as they are captured by scenarios).

Delta Tests

Delta testing is another approach which reuses existing scenarios and makes them more useful.

We take all existing scenarios and run them against memory and cloud implementations, comparing outputs (published events and API responses) between each other. This includes even fields that we never check for in actual scenarios. If outputs differ somehow, then we have a probable issue on our hands.

There is a nice psychological benefit to idempotency and delta tests - they make original scenarios more useful and rewarding to write. It is always awesome when one of these tests discovers an peculiar edge case you never thought about.

Random testing

At some point we realised that the application of this in-memory model for testing resembled property-based testing like Quick-Check.

In QuickCheck you test a software by creating a model for a function, then letting the test suite hammer it with various inputs, while comparing output of the model vs output of the real system. If these differ - then you have a problem (the real implementation is more complex than this explanation, since you have to implement test reduction logic, which flattens the test sequence to minimal number of inputs required to reproduce the issue.

With this insight we were able to extend our test suite with random tests.

Random testing takes delta testing approach one step further. Its purpose is to explore various state transitions and find spots where behavior of the memory model differs from the one of the cloud implementation.

The process looks like this:

We take some events from a real system (normally a QA which usually is filled with all sorts of crazy data).
We feed event sequence to cloud/memory implementations to populate their state via replay.
Then we pass these event sequences through request generators to create random API requests based on that data.
Send API requests to both implementations to get outcomes: API responses and any published events.
We compare these outcomes between each other, searching for any differences. If there is one - it is a bug.

Here is a simplified example of a generator:

public static Request gen_search_sales(IList<IEvent> events)
{
    var last = events.OfType<SaleCreated>().LastOrDefault();
    if (last == null)
    {
        return null;
    }
    var author = New.Author(last.RefInfo);
    return GET("/wavepicking/sales").As(author);
}

Tests are deterministic - passing the same sequence of events to API request generators will always create the same requests and lead to the same test outcomes.

We can improve search space (where we compare memory model to cloud implementation) by:

Adding more specific generators
Taking different event sequences from the real data.

Random tests differ from delta tests:

We use subsets of events from a real system (instead of well defined clean scenarios)
We generate random API requests that could make sense for these events.

Random testing becomes especially interesting to run, if we have different implementations written by different developers. In this case, there is a chance that some implicit behaviors (e.g. behaviors that are required but somehow missed specification phase) will be implemented differently.

Random testing might uncover these behaviors that went “under the radar”, allowing us to design them properly and nail them down with scenarios.

Keeping an Eye On Performance

Decent performance comes naturally when you have to replay hundreds of millions of events upon each schema change. Any inefficiency in how you access data storage, deal with RAM, GC or CPU intensive operations - and your deployment would grind to death.

Obviously, it would be great to catch the most glaring performance problems during the development. Fortunately, it is trivial to enhance scenario verification suite with basic sanity checks and performance profiling. You just inject proper heuristics in the scenario verification environment.

For example, we could measure execution time of any API call or handler, spot the bottlenecks and print them out. Another approach is to bake in sanity checks into the code, like ensuring that all requests to Cassandra run as prepared statements.

Graphs and Analysis

Scenarios are just data constructs which nail down behaviors (state transitions) which we want to last. These could be easily parsed and analyzed.

Models also capture behaviors, but in an implicit form. We could observe these behaviors by running data through (real-world or synthetic) and capturing various data points. Here are a few ideas that we tried:

Detect scenarios or behaviors that are untested (e.g. an event is declared in module dependencies but not covered by any scenario).
Detect unused code paths (e.g. an event is published by a module, but there is not a single subscriber in the system).
Detect specified and tested code paths that aren’t used by the customers in real world.

Besides, you could always render this data as graphs and let human perception spot problems and irregularities.

Next Steps

This journey has been extremely rewarding so far. It feels great to be able to evolve the domain model while knowing that all the captured domain knowledge would be losslessly translated into actual implementation.

There are still many things to explore in this direction:

Using Lisp to express and generate contracts and schemas for events and API in multiple languages (currently custom ANTLR-based DSL is used).
Switching cloud implementation to Kafka/Cassandra/Mesos/Spark/Elastic (currently it is locked to Windows Azure)
Adding higher-level model validation based on valid shapes of data and data flows (e.g. when you have multiple data centers in different regions with replication across via event bus)

Published: June 28, 2016.

Next post in SkuVault story: Planning Event-Driven Simulation

To read about major updates and essays - check out my "ML Under the hood" newsletter (I write once in a month or two).

🤗 My new course is live! Building Reliable AI Assistants: Patterns and Practices.