Windows Azure, SQLite and Temp
If you are using Cloud Computing environment, be prepared for certain constraints to show up on your way. They can be quite subtle. Here's one.
I'm using SQLite + ProtoBuf + GZip for persisting large datasets in Azure Blobs. This combination (even without GZip) can handle 10-50GB datasets in a single file without much overhead and with a decent flexibility. Of course, there are certain constraints (mostly related to classical scalability issues), however for the command-side of CQRS non of these apply. Resulting data processing throughput is massively superior to SQL Azure or Azure Table Storage, while costs are negligible.
I really enjoy this combination of technologies. One of the advantages is that everything is open source and I don't need to bother about writing B-tree indexes or de-fragmentation routines for my data structures. However, there was a problem spoiling all the fun. It showed up frequently in the production environment, while I was trying to VACUUM SQLite database (essentially cleaning up and defragmenting everything):
Database or disk is full
This one was driving me a bit crazy, since more than 500GB of disk storage were allocated as Local Resources to the worker roles. As it turns out, this was caused by the interplay of the following things:
- SQLite uses temporary files to perform certain operations (VACUUM is one of them).
- SQLite, being cross-platform in nature, relies on the OS to provide temp files location.
- On Windows Azure both TEMP and TMP point to a single directory that has a maximum size of 100 MB.
Such directory structure makes sense since (just my wild guess based on what I know about Hyper-V and how I would've implemented Windows Azure VMs in the first place), since we want to have fixed VM image (with a footprint shared between multiple instances) followed up by a small differential VHD and a larger local resource VHD, should customer request it.
Solution is to define Local Resource and redirect environment variables there:
var tempPath = RoleEnvironment.GetLocalResource("Temp").RootPath;
Environment.SetEnvironmentVariable("TEMP", tempPath);
Environment.SetEnvironmentVariable("TMP", tempPath);
The other option for SQLite would've been to use pragma:
PRAGMA temp_store_directory = 'directory-name';
Monday, November 29, 2010 at 17:30
Reader Comments (4)
I am curious as to how you go about working with these SQLite files. When the time comes to use them do you just lock the file and then download it, do the work and then upload it again?
Caleb, it's actually simpler. There is no need to lock, since in system (CQRS architecture) each aggregate root is accessed by only one process at a time (no concurrent access).
Plus, if there were no changes, the file is not uploaded back.
So how do you ensure that only one process is working on the data at a given time? If I have multiple workers processing the queue then how would enforce it? Or even if I have a queue per aggregate root how do enforce that only one worker is working on that queue at a given time without locking something?
Caleb, just have no more than 1 worker per queue in your system.
Of course you can have many queues per worker