On Choosing Dynamo

Introduction

The last few years have been a tumultuous journey for databases, and it has been interesting to follow the trends while trying to choose a database for the new product we are rolling out at Quark Games. For a long time, MySQL and PostgreSQL were the de facto choices for a new web application or CMS. This changed in the Valley, however, when demands for ever faster iteration cycles pushed developers into schemaless development, and ultimately into seriously evaluating NoSQL, not just for columnar tables intended for MapReduce, but as their primary data store.

To meet the demand, more and more distributed and non-distributed NoSQL storage solutions have cropped up. Nearly all of them make the same promises of redundancy, effortless scaling, and flexible schema. Of course, these promises do not come for free, and many developers found themselves in the uncomfortable position of wishing that they were using MySQL again.

So where are we now? What database should we choose, and under what circumstances?

CP systems

RDBMSs and NoSQL stores like HBase, Redis, RethinkDB, and Couchbase are CP databases that emphasize immediate consistency and will compromise availability in the case of node failure or a netsplit. This includes sharded configurations of MySQL or PostgreSQL. In these sorts of database systems, reads and writes for a given key are typically serialized through a single authoritative node, chosen by some sort of hashing algorithm. Losing a node will generally make a subset of keys unavailable until failover replaces the node or a leader election reassigns the lost keys to redistribute the load.
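
As a rough illustration of why losing one node takes a slice of the keyspace with it, here is a minimal sketch of hash-based key ownership. The node names, the modulo-hash scheme, and the `owner` and `read` helpers are assumptions made for the example; real systems use more elaborate partitioning and failover.

```python
import hashlib

def owner(key, nodes):
    """Map a key to exactly one authoritative node by hashing (illustrative scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def read(key, nodes, down):
    """Route a read to the key's owner; fail if that single owner is down."""
    node = owner(key, nodes)
    if node in down:
        raise RuntimeError(f"{key!r} unavailable: {node} is down")
    return node

# Hypothetical three-node cluster with one failed node.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"player:{i}" for i in range(1000)]
down = {"node-b"}

# Every key owned by the failed node is unavailable until failover completes.
unavailable = [k for k in keys if owner(k, nodes) in down]
print(f"{len(unavailable)} of {len(keys)} keys unavailable")
```

Only the keys that hash to the failed node are affected; the rest of the cluster keeps serving requests normally.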

CP systems generally make more sense if a single key is accessed from many places simultaneously. Under these circumstances, immediate consistency allows for atomic operations, either with check-and-set functionality or with a locking mechanism. Redis, for example, provides SETNX, WATCH, MULTI, and a number of other commands to coordinate data changes. This is important for counters, shared lists, and other data that is expected to be read and written from a variety of sources.
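
To make the check-and-set idea concrete, here is a minimal in-memory sketch. `VersionedStore` and its methods are hypothetical names invented for this example and do not correspond to the API of Redis or any other store; the point is only that a write succeeds when the version the caller read is still current.

```python
import threading

class VersionedStore:
    """Toy single-node store where every key carries a version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def check_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since we read."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # lost the race; caller should re-read and retry
            self._data[key] = (version + 1, value)
            return True

def increment(store, key):
    """Optimistic counter increment: read, compute, retry on conflict."""
    while True:
        version, value = store.get(key)
        if store.check_and_set(key, version, (value or 0) + 1):
            return

def worker(store, key, times):
    for _ in range(times):
        increment(store, key)

store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store, "hits", 100)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("hits"))
```

Because every write is checked against a single authoritative copy, no increment is lost even with several concurrent writers, which is exactly the guarantee an eventually consistent store cannot give cheaply.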

AP systems

AP systems are typically databases based on the Dynamo paper. These databases replicate data from node to node and allow reads and writes against any of the nodes that house the key in question. As such, they provide eventual consistency instead of immediate consistency and will continue to allow reads and writes even in the presence of node failure. Examples of AP databases are Riak, Cassandra, Voldemort, and DynamoDB.
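
A simplified sketch of how a key might map to several replicas on a hash ring follows. The `Ring` class, the node names, and the replica count of 3 are assumptions for illustration; production Dynamo-style systems add virtual nodes, hinted handoff, and tunable quorums on top of this basic idea.

```python
import bisect
import hashlib

def _hash(s):
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring: each key lives on the next n nodes clockwise."""

    def __init__(self, nodes, n_replicas=3):
        self.n = n_replicas
        self.ring = sorted((_hash(node), node) for node in nodes)
        self.points = [point for point, _ in self.ring]

    def replicas(self, key):
        """Return the nodes that house this key, in preference order."""
        start = bisect.bisect_right(self.points, _hash(key)) % len(self.ring)
        count = min(self.n, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def available_replicas(self, key, down):
        """Any surviving replica can still serve reads and writes for the key."""
        return [node for node in self.replicas(key) if node not in down]

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("player:42"))
print(ring.available_replicas("player:42", down={"node-b"}))
```

With three replicas per key, a single node failure still leaves at least two nodes able to answer for every key, which is where the availability of these systems comes from.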

AP systems generally make more sense if each chunk of your data is usually accessed from one place at a time. In this case, eventual consistency is unlikely to matter as much, and the application can afford to do read repair or serialize writes through a single source. AP systems are obviously preferable if availability is important! For some applications, you’d rather display something than nothing at all.
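
Read repair under a single-writer assumption can be sketched roughly as follows. The replicas are modelled as plain dictionaries holding `(version, value)` pairs, and the `read_repair` function is a hypothetical helper; real Dynamo-style stores typically use vector clocks or timestamps rather than one counter, which this sketch deliberately ignores.

```python
def read_repair(key, replicas):
    """Read a key from every replica, return the newest value, and fix stale copies.

    Assumes one writer per key, so a monotonically increasing version
    number is enough to decide which copy is newest.
    """
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda pair: pair[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # overwrite stale or missing copies
    return newest[1]

# Three replicas that have drifted apart after a missed write.
replica_a = {"player:1": (2, {"gold": 10})}
replica_b = {"player:1": (1, {"gold": 5})}
replica_c = {}

value = read_repair("player:1", [replica_a, replica_b, replica_c])
print(value)
```

After the call, all three replicas hold the version 2 copy, so later reads agree no matter which node answers.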

Comparing CP and AP

CP systems will generally require less hardware to do the same number of operations because replication happens asynchronously. However, latency tails for CP systems can be expected to be higher because operations happen against a single node. In AP systems, since multiple nodes have a chance to respond to a request (like a race), latency tails are reduced significantly.
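
The effect of the race on tail latency can be illustrated with a small simulation. The latency model here is invented for the example (a few milliseconds of normal service time plus an occasional 100 ms stall, independent across nodes), and it assumes the first replica to respond wins; real clusters with quorum reads or correlated slowdowns will behave less favorably.

```python
import random

def node_latency(rng):
    """Hypothetical per-node latency in ms: usually fast, occasionally stalled."""
    stall = 100.0 if rng.random() < 0.05 else 0.0
    return rng.uniform(1.0, 5.0) + stall

def p99(samples):
    """99th percentile by simple rank selection."""
    ordered = sorted(samples)
    return ordered[int(0.99 * len(ordered))]

rng = random.Random(1234)  # fixed seed so the run is repeatable
requests = 10_000

# CP-style: each request is served by the single authoritative node.
single = [node_latency(rng) for _ in range(requests)]

# AP-style: three replicas race and the fastest response is used.
raced = [min(node_latency(rng) for _ in range(3)) for _ in range(requests)]

print(f"p99 single node: {p99(single):.1f} ms")
print(f"p99 race of 3:   {p99(raced):.1f} ms")
```

Under this model a request is slow only if all three replicas stall at once, so the chance of hitting the tail falls from 5% to roughly 0.05 cubed, at the cost of doing three times the work per read.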

Handoff procedures in the face of node failure are also cheaper with AP systems, whereas with CP systems latency and throughput may suffer more because of the additional coordination required to promote replica copies. With AP systems, the system behaves more or less as it did before, although latency may increase for a subset of keys. The cluster as a whole, though, will be less affected.

Choosing

Our data is structured as follows. Player X can read or write data for Player X. Player Y can read data from Player X but can’t write to it. As a result, data is almost entirely written from a single source (the application is stateful). As such, there are only a few places in the codebase where we need to account for eventual consistency and perform a simple repair operation. Generally, I believe that if you can go with an AP system, you should do so (but benchmark responsibly first). Availability and uptime don’t feel important until downtime occurs, and consistency can be managed. Lately, I have really enjoyed working with Riak, which has performed well in my benchmarks and seems to make the right design tradeoffs.
