
Simulate with Async

This project demonstrates how to plug into the async/await capabilities of the .NET Framework to run an application in a simulated mode. The simulation injects random faults and fast-forwards time.

Source code is available on GitHub.


About SimAsync

The purpose of the simulation is to automatically find expensive bugs that are otherwise hard to catch or reproduce.

To fulfill this promise, a simulation needs to exhibit a few specific properties. In previous projects we've already explored Determinism and Time Acceleration.

This sample project introduces two more capabilities: fault injection and simulation of parallel processes.

For the purpose of this exercise, we focus only on a single-node deployment without any complex storage or configuration.

Determinism

This simulation is deterministic. Two runs with the same Seed parameter will yield identical results.

You can use a random Seed to discover new failure scenarios. You can also reuse a Seed from a past run to reproduce that run exactly.
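Determinism comes down to drawing every random decision from a single seeded generator. The sketch below is in Python rather than the project's C#, and `run_simulation`, its latency range and the roughly 20% fault chance are hypothetical illustrations, not the project's actual code.

```python
from __future__ import annotations

import random


def run_simulation(seed: int, steps: int = 5) -> list[str]:
    """Hypothetical toy run: every random decision is drawn from one seeded generator."""
    rng = random.Random(seed)  # single source of randomness for the whole run
    trace = []
    for step in range(steps):
        delay_ms = rng.randint(1, 50)   # simulated I/O latency
        fault = rng.random() < 0.2      # roughly 20% chance of an injected fault
        trace.append(f"step={step} delay={delay_ms}ms fault={fault}")
    return trace


# A fresh random seed explores a new scenario...
seed = random.SystemRandom().randrange(2**32)
# ...and replaying the same seed reproduces the run exactly.
assert run_simulation(seed) == run_simulation(seed)
print(f"seed={seed}", run_simulation(seed)[0])
```

The seed only guarantees reproducibility if nothing else leaks into the run: no wall-clock reads, no real threads, and no iteration over collections whose order varies between processes (in Python, for example, string hashing is randomized per process, so the iteration order of a set of strings can change).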

Time Acceleration

The simulation fast-forwards time. Although the logic uses random delays to represent disk latency, freezes or network outages, the CPU doesn't need to sit idle during these delays. Since we control the scheduler, we can fast-forward to the next interesting moment in time.

Years of simulated time can pass in hours of real time.
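Fast-forwarding is essentially a discrete-event loop: pending work sits in a priority queue ordered by due time, and the clock jumps straight to the earliest entry instead of sleeping. A minimal Python sketch, assuming a hypothetical `VirtualClock` class (the .NET project plugs into its own scheduler instead):

```python
import heapq
from typing import Callable, List, Tuple


class VirtualClock:
    """Hypothetical discrete-event clock: time jumps to the next scheduled event instead of sleeping."""

    def __init__(self) -> None:
        self.now_ms = 0
        # Heap entries: (due_ms, seq, callback)
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so events due at the same time run in scheduling order

    def schedule(self, delay_ms: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now_ms + delay_ms, self._seq, callback))
        self._seq += 1

    def run(self) -> None:
        while self._queue:
            due_ms, _, callback = heapq.heappop(self._queue)
            self.now_ms = due_ms  # fast-forward: no real waiting happens here
            callback()


YEAR_MS = 365 * 24 * 60 * 60 * 1000

clock = VirtualClock()
log = []
clock.schedule(YEAR_MS, lambda: log.append(("network outage over", clock.now_ms)))
clock.schedule(10, lambda: log.append(("disk read done", clock.now_ms)))
clock.run()  # returns almost instantly although a simulated year has passed
print(log)
```

The sequence number matters for determinism: without it, two events due at the same instant could be ordered by comparing callbacks, which is neither meaningful nor reproducible.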

Fault Injections

Random faults are injected along the way to represent some of the bad things that could happen to our code in reality. After all, we want to capture, debug and fix these issues before the code is deployed to production.
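One way to inject faults is to wrap a dependency so that any call may fail, with failures drawn from the same seeded generator so they stay reproducible. A minimal Python sketch: `StorageError`, `FaultyStore`, `put_with_retry` and the 30% failure rate are hypothetical names and values chosen for illustration, not the project's API.

```python
import random
from typing import Dict, Optional


class StorageError(Exception):
    """Hypothetical transient failure raised by the simulated storage."""


class FaultyStore:
    """Hypothetical in-memory key-value store that fails operations at random."""

    def __init__(self, rng: random.Random, failure_rate: float = 0.1) -> None:
        self._data: Dict[str, str] = {}
        self._rng = rng  # seeded, so the fault pattern is reproducible
        self._failure_rate = failure_rate

    def _maybe_fail(self, op: str, key: str) -> None:
        if self._rng.random() < self._failure_rate:
            raise StorageError(f"injected fault during {op}({key!r})")

    def put(self, key: str, value: str) -> None:
        self._maybe_fail("put", key)
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        self._maybe_fail("get", key)
        return self._data.get(key)


def put_with_retry(store: FaultyStore, key: str, value: str, attempts: int = 5) -> int:
    """Application code under test: retries transient faults, returns the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            store.put(key, value)
            return attempt
        except StorageError:
            pass  # transient fault: try again
    raise StorageError(f"gave up on {key!r} after {attempts} attempts")


store = FaultyStore(random.Random(7), failure_rate=0.3)
for i in range(10):
    try:
        put_with_retry(store, f"order-{i}", "paid")
    except StorageError as exc:
        print("simulation found a failure:", exc)
```

Because faults come from a seeded `random.Random`, a run that exhausts the retries can be replayed with the same seed while debugging.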

Parallel Processes

Parallel execution is simulated, even though the simulation itself is single-threaded. Thanks to the state machines generated for async/await, we can suspend one execution path for some time while other paths continue running.

For example, Actor 1 could experience a 10 ms storage freeze while reading from the database. The scheduler switches to other pending tasks (e.g. message handling by Actor 2) before coming back to Actor 1 to continue execution.
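Python coroutines are also resumable state machines, so the interleaving described above can be sketched without threads. This is not the project's .NET implementation: `Scheduler`, `Sleep` and the two actor functions are hypothetical, and the message count and 3 ms handling time are illustrative assumptions. A single-threaded scheduler resumes one coroutine at a time, and awaiting `Sleep` hands control back to it together with a simulated delay.

```python
import heapq
from typing import Coroutine, List, Tuple


class Sleep:
    """Hypothetical awaitable: suspends the current task for delay_ms of simulated time."""

    def __init__(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms

    def __await__(self):
        yield self  # hand control (and the requested delay) back to the scheduler


class Scheduler:
    """Hypothetical single-threaded scheduler interleaving coroutines on a virtual clock."""

    def __init__(self) -> None:
        self.now_ms = 0
        self._ready: List[Tuple[int, int, Coroutine]] = []  # (wake_ms, seq, coroutine)
        self._seq = 0  # deterministic tie-breaker for tasks waking at the same time

    def spawn(self, coro: Coroutine, delay_ms: int = 0) -> None:
        heapq.heappush(self._ready, (self.now_ms + delay_ms, self._seq, coro))
        self._seq += 1

    def run(self) -> None:
        while self._ready:
            wake_ms, _, coro = heapq.heappop(self._ready)
            self.now_ms = wake_ms  # fast-forward to the next task's wake-up time
            try:
                request = coro.send(None)  # run the task until its next `await Sleep(...)`
            except StopIteration:
                continue  # task finished
            self.spawn(coro, request.delay_ms)


sched = Scheduler()
log = []


async def actor1() -> None:
    log.append((sched.now_ms, "actor1: start DB read"))
    await Sleep(10)  # simulated 10 ms storage freeze
    log.append((sched.now_ms, "actor1: DB read done"))


async def actor2() -> None:
    for i in range(3):
        log.append((sched.now_ms, f"actor2: handle message {i}"))
        await Sleep(3)  # simulated message-handling time


sched.spawn(actor1())
sched.spawn(actor2())
sched.run()
for entry in log:
    print(entry)
```

While Actor 1 is frozen, Actor 2 handles its messages at 0, 3 and 6 ms, and Actor 1 resumes at 10 ms. Drawing the freeze duration from a seeded `random.Random` instead of a constant would make the interleaving vary between seeds while staying reproducible for any single seed.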

Assignments

Questions

  • This simulation reveals a problem with the application. What is the root cause and how would you fix it?
  • This simulation uses an in-memory dictionary as a DB. What would it take to implement something real?
  • What kinds of failures could you inject into a simulated storage?

Tricky Questions

How would you simulate:

  • your favorite database?
  • your favorite commit log or message bus?
  • a load balancer?
  • a faulty disk controller?
  • fail-over between two data centers with eventual replication?

Bonus

After fixing the first problem discovered by the simulation, you are likely to hit another one (it might take longer simulation runs to surface). How would you handle it?

Remember that any simulated logic will eventually be replaced by production code. The two have to match.
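One common way to keep a simulated component and its production counterpart in step is to run the same contract checks against both. A minimal Python sketch: `KeyValueStore`, `InMemoryStore`, `ShelveStore` and `check_store_contract` are hypothetical names, and the standard-library `shelve` module merely stands in for a real database.

```python
import os
import shelve
import tempfile
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface shared by the simulated and the production store."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Simulated store: a plain dictionary, like the one used as a DB in the simulation."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ShelveStore:
    """Stand-in for a real store: persists values to disk via the stdlib shelve module."""

    def __init__(self, path: str) -> None:
        self._db = shelve.open(path)

    def put(self, key: str, value: str) -> None:
        self._db[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._db.get(key)

    def close(self) -> None:
        self._db.close()


def check_store_contract(store: KeyValueStore) -> None:
    """Behaviour every implementation must share, simulated or real."""
    assert store.get("missing") is None  # unknown keys read as None
    store.put("k", "v1")
    assert store.get("k") == "v1"        # reads see the latest write
    store.put("k", "v2")
    assert store.get("k") == "v2"        # overwrites replace the value


check_store_contract(InMemoryStore())
with tempfile.TemporaryDirectory() as tmp:
    real = ShelveStore(os.path.join(tmp, "store"))
    try:
        check_store_contract(real)
    finally:
        real.close()
```

Every behaviour the simulation relies on (including the failures it injects) ideally gets a check like this, so a mismatch shows up as a failing contract rather than as a bug that only exists in production.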

Next post in the Simulation story: Simulate CQRS/ES Cluster
