Tuesday, July 21, 2009

CouchDB in the Cloud - Cheap and Flexible Persistence For .NET

This post about CouchDB is a piece of research that continues the topics of the xLim and Cloud Computing in .NET series.

We’ll walk through one possible approach to using CouchDB from .NET applications in a cloud scenario.

What is CouchDB?

CouchDB is a database engine designed for persisting documents without a predefined schema. It exposes an HTTP REST API and exchanges JSON-serialized messages. CouchDB claims to be designed for scalability (this includes a multi-master replication model).

Simply put, you talk to a CouchDB server with documents like this:

{
  "Subject":"I like Plankton",
  "Author":"Rusty",
  "PostedDate":"2006-08-15T17:30:12-04:00",
  "Tags":["plankton", "baseball", "decisions"],
  "Body":"I decided today that I don't like baseball. I like plankton."
}

Documents do not have a predefined schema, so this is a valid one as well:

{
  "Author":"Rusty",
  "About": "Former baseball fan."
}
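
To make the "talking over HTTP with JSON" part more tangible, here is a minimal Python sketch that builds (but does not send) a request storing the second document. The server address, the database name "blog" and the document id "rusty-profile" are assumptions made up for illustration; only the general shape (an HTTP PUT to /database/document-id with a JSON body) is what CouchDB expects.

```python
import json
import urllib.parse
import urllib.request


def build_put_request(base_url, database, doc_id, document):
    """Build (but do not send) the HTTP request that stores one document."""
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(database, safe=""),
        urllib.parse.quote(doc_id, safe=""),
    )
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_put_request(
    "http://localhost:5984",  # assumed server address
    "blog",                   # hypothetical database name
    "rusty-profile",          # hypothetical document id
    {"Author": "Rusty", "About": "Former baseball fan."},
)
print(request.get_method(), request.full_url)
# PUT http://localhost:5984/blog/rusty-profile
```

Once a server is running and the database exists, passing the request to urllib.request.urlopen would actually send it.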

NB: although documents are unstructured, each one always has a unique identifier and a revision number associated with it (CouchDB features a Multi-Version Concurrency Control model).
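
The identifier-plus-revision idea can be illustrated with a tiny in-memory sketch. This is not CouchDB code: the TinyDocStore class and its integer revision counter are hypothetical simplifications, and only the field names _id and _rev mirror what CouchDB uses. The point is that an update must carry the revision it was based on, otherwise it is rejected as a conflict.

```python
import uuid


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self):
        self._docs = {}

    def save(self, doc):
        # Assign an identifier if the caller did not provide one.
        doc_id = doc.get("_id") or uuid.uuid4().hex
        current = self._docs.get(doc_id)
        # An existing document may only be replaced by an update
        # that names the revision currently stored.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc_id)
        next_rev = 1 if current is None else current["_rev"] + 1
        stored = dict(doc, _id=doc_id, _rev=next_rev)
        self._docs[doc_id] = stored
        return dict(stored)


store = TinyDocStore()
first = store.save({"Author": "Rusty", "About": "Former baseball fan."})
second = store.save(dict(first, About="Plankton fan."))

try:
    # 'first' still carries revision 1, which is now out of date.
    store.save(dict(first, About="Baseball fan again."))
except ConflictError:
    print("stale revision rejected")
```

A real server reports the same situation as an HTTP 409 Conflict, and the client is expected to fetch the latest revision and retry.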

To run queries against all this unstructured data, CouchDB uses View Servers that build and update structured views expressed as MapReduce queries (yes, that's the scalability opportunity here). These queries play the role that SQL plays in the relational world, yet they can be written in various languages (JavaScript, Ruby and Python are supported at the moment).
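
To show what a MapReduce view boils down to, here is a plain Python emulation over the two sample documents above. It does not use any CouchDB API: run_view, map_tags and reduce_count are hypothetical names, and the reduce step is simplified compared to what a real view server receives. The map function emits key/value pairs per document, and the reduce function folds the values collected for each key.

```python
from collections import defaultdict

docs = [
    {
        "Subject": "I like Plankton",
        "Author": "Rusty",
        "Tags": ["plankton", "baseball", "decisions"],
    },
    {"Author": "Rusty", "About": "Former baseball fan."},
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair per tag; untagged docs emit nothing."""
    for tag in doc.get("Tags", []):
        yield tag, 1


def reduce_count(values):
    """Reduce step: sum the values emitted for a single key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_tags, reduce_count))
# {'baseball': 1, 'decisions': 1, 'plankton': 1}
```

Note how the schema-less second document simply contributes nothing to this view instead of causing an error.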

CouchDB is a top-level open-source project of the Apache Software Foundation.

CouchDB is written in Erlang and is capable of running on POSIX systems. Theoretically it could even run on Windows, but the experience is far from smooth. Yet, it is extremely easy to get yourself a cheap DB instance running in the cloud (as we'll see later in the article).

Primary disadvantages of the project are:

  • CouchDB is a young project and it has not gone through heavy stress usage;
  • .NET adapters are not present.

As always, using this technology in a scenario where it does not fit well will turn it into one big disadvantage.

How to run CouchDB in the Cloud?

At the moment there is no public CouchDB hosting available. One future option worth mentioning is Couch.IO, which will offer 10 GB databases for 30 USD per month.

However, let's see how we can get ourselves a virtual CouchDB server at a fraction of this cost.

We'll use a Rackspace Cloud virtual machine slice in this scenario. You'll find the sequence for setting up a temporary development machine below.

Important: this configuration is not designed for production scenarios. You would at least need to adjust the firewall settings, change the ports, assign proper users and configure CouchDB for auto-start.

  • Create a new virtual cloud server:

    • Properties: 256 MB - $0.015 per hour - 10 GB
    • OS: Ubuntu 9.04 (Jaunty)

Creating Cloud Server VM in Rackspace

  • Wait until the confirmation email arrives (a couple of minutes), then launch PuTTY (or any other SSH client) using the IP address from that email. A security warning will show up; just hit "No" to accept the key temporarily. The username and password are provided in the email.

    Tip: a right-click with the mouse acts as "paste" in PuTTY. This works even when the system expects password input.

    After a successful logon you should see something like:

Logon to remote cloud server

  • Refresh the package lists by typing the following command and hitting Enter:

    apt-get update

  • Install everything that is required for building and running the latest CouchDB:

    aptitude -y install build-essential
    apt-get -y install libmozjs-dev libicu-dev libcurl4-openssl-dev erlang

  • Get the latest release and compile it:

    wget http://mirrors.24-7-solutions.net/pub/apache/couchdb/0.9.0/apache-couchdb-0.9.0.tar.gz
    tar zxvf apache-couchdb-0.9.0.tar.gz && cd apache-couchdb-0.9.0
    ./configure && make && sudo make install

    NB: You can install CouchDB 0.8 by simply executing apt-get install couchdb, but it will lack some of the 0.9 features.

  • Allow the instance to listen on the public IP:

    nano /usr/local/etc/couchdb/local.ini

    In this window you will need to change the [httpd] section to look like the snippet below and save:

    [httpd]
    port = 5984
    bind_address = 0.0.0.0
    
  • Start CouchDB by typing:

    couchdb -b

  • Verify in the command prompt that the instance is running locally (it should return a short JSON greeting from CouchDB):

    curl http://localhost:5984

  • Verify from your local machine that CouchDB is accessible and ready by opening this URL in your browser:

    http://[IP of CouchDB Server here]:5984/_utils/
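
The two verification steps above can also be scripted. The sketch below is one way to do it in Python, assuming the server root answers with a JSON document as it does for curl. The couchdb_greeting name and its optional fetch parameter are hypothetical; the parameter only exists so that the function can be tried without a live server.

```python
import json
import urllib.request


def couchdb_greeting(base_url, fetch=None):
    """Return the JSON document served at the server root as a dict."""
    if fetch is None:
        # Default transport: a plain HTTP GET with a short timeout.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.read()

    raw = fetch(base_url.rstrip("/") + "/")
    return json.loads(raw.decode("utf-8"))


# Usage against a running instance (replace the address with your own):
# greeting = couchdb_greeting("http://localhost:5984")
# print(greeting)
```

If the server is not reachable, urlopen raises an exception, which is usually a hint that bind_address or the firewall needs another look.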

How to install CouchDB on Windows?

Working with a real and cheap server in the cloud might be interesting. Yet it could be impractical in certain scenarios (e.g. when working offline or testing complex deployment scenarios). For that we need to learn how to install CouchDB on a local machine.

Although it is possible to run CouchDB directly on Windows, the solution is too bulky and unreliable. We can simply use an Ubuntu virtual machine instead, sticking to the routine above.

Unfortunately, Windows 7 Virtual PC is extremely limited (its primary goal is to support legacy applications by virtualizing Windows XP). So in order to create a new VM we'll use Sun VirtualBox.

The process looks like this:

  • Download the latest Ubuntu server distribution.
  • Create a new VM instance for Ubuntu, named "Ubuntu".
  • Set up NAT networking for this instance and create a sufficiently large HD (2 GB should be enough).

VirtualBox running Ubuntu Virtual Machine for CouchDB

  • Mount the downloaded Ubuntu ISO into the virtual DVD drive and go through the bare server installation.
  • Log on to the instance and install the SSH server:

    sudo apt-get install openssh-server

  • Shut down the instance and configure virtual SSH forwarding so that we can use PuTTY to talk to the OS locally (we are assuming that the VM instance name is 'Ubuntu'). These commands have to be executed from the VirtualBox directory, each one on a single line:

    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestssh/Protocol" TCP
    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestssh/GuestPort" 22
    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestssh/HostPort" 2222
    

    This binding will forward all connections from localhost:2222 to port 22 of the virtual Ubuntu machine.

  • Configure CouchDB forwarding, so that we can talk to the database as if it were installed on the localhost (each command goes on a single line):

    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestcdb/Protocol" TCP
    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestcdb/GuestPort" 5984
    VBoxManage setextradata "Ubuntu" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/guestcdb/HostPort" 5984
    
  • Create a shortcut to the GUI-less interface of VirtualBox. Launching it will start our server-in-a-box:

    VBoxHeadless.exe --startvm Ubuntu --vrdp=off

  • Connect PuTTY to localhost:2222 (the port number we've configured for SSH forwarding) and perform the install routine from the previous section, starting from:

    apt-get update

  • Now you should be able to talk to your CouchDB (don't forget to start it) using the following address:

    http://localhost:5984/

    You can open the Futon interface in the browser with:

    http://localhost:5984/_utils/

Using Futon to view CouchDB status locally
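
Since every forwarded port needs the same three VBoxManage calls, the commands from the forwarding steps can be generated instead of typed. The helper below is only a sketch: forwarding_commands is a hypothetical name, and the key path simply reuses the pcnet/0/LUN#0 prefix from the commands shown earlier, so it assumes the same virtual network adapter.

```python
def forwarding_commands(vm_name, rule_name, guest_port, host_port, protocol="TCP"):
    """Return the three single-line VBoxManage commands for one forwarding rule."""
    prefix = "VBoxInternal/Devices/pcnet/0/LUN#0/Config/" + rule_name
    settings = [
        ("Protocol", protocol),
        ("GuestPort", guest_port),
        ("HostPort", host_port),
    ]
    return [
        'VBoxManage setextradata "{}" "{}/{}" {}'.format(vm_name, prefix, key, value)
        for key, value in settings
    ]


# SSH (guest 22 -> host 2222) and CouchDB (guest 5984 -> host 5984).
commands = forwarding_commands("Ubuntu", "guestssh", 22, 2222)
commands += forwarding_commands("Ubuntu", "guestcdb", 5984, 5984)
for command in commands:
    print(command)
```

Each printed line can be pasted into the command prompt as is, which avoids the line-break problem when copying the commands by hand.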

That should be it. Now we have a local CouchDB server that can be used as a development sandbox. We could also replicate this deployment in any cloud environment that supports an Infrastructure as a Service scenario with VMs.

This database server does not have a predefined schema, is easy and cheap to deploy (once you know the drill) and has been designed with scalability in mind. All this makes CouchDB an interesting technology that might fit well into the xLim set of design principles.

In the next article we'll talk about using .NET to communicate with CouchDB in a strongly-typed manner. You can subscribe to this journal to stay tuned for any updates.


Reader Comments (7)

An excellent article, thanks for taking the time to research and write it!

July 21, 2009 | Unregistered Commenter Jan

great article. I have a .NET adapter that uses JSON.NET if u want one.

July 21, 2009 | Unregistered Commenter pete w

@Pete, thanks for the offer.

After reading through JSON.NET codebase I felt it was too bloated. I ended up using JayRock and WCFJson serializers for the research. I'll talk more about that in the next article in the series.

July 21, 2009 | Registered Commenter Rinat Abdullin

Small misspelling:

OS: Ubuntu 9.04 (jantu)

WBR, Oleg.

July 22, 2009 | Unregistered Commenter brainunit

Thanks, Oleg!

July 22, 2009 | Registered Commenter Rinat Abdullin

Nice article, I've been curious about hosting. I have a couch DB with 2.8 million records (HHS physician data) and it is very fast; apx 10x faster than oracle.

July 23, 2009 | Unregistered Commenter Derek

complex post. upright one unimportant where I quarrel with it. I am emailing you in detail.

August 8, 2009 | Unregistered Commenter Debt Settlement Program