You know how it is that we do what we do

On Choosing Dynamo

Introduction

The last few years have been a tumultuous journey for databases, and it has been interesting to follow the trends and attempt to make a decision for our new product being rolled out at Quark Games. For a long time, MySQL or PostgreSQL were de facto choices for a new web application or CMS. This changed in the valley however, when demands for even faster iterative cycles pushed developers into schemaless development, and ultimately into seriously evaluating NoSQL, not just for columnar tables intended for map reduce, but as their primary data store.

To meet the demand, more and more distributed and non-distributed NoSQL storage solutions have cropped up. Nearly all of them make the same promises of redundancy, effortless scaling, and flexible schema. Of course, these promises do not come for free, and many developers found themselves in the uncomfortable position of wishing that they were using MySQL again.

So where are we now? What database should we choose, and under what circumstances?

CP systems

RDBMSs and NoSQL stores like HBase, Redis, RethinkDB, and Couchbase are CP databases that emphasize immediate consistency and will compromise availability in the case of node failure or a netsplit. This includes sharded configurations of MySQL or PostgreSQL. In these sorts of database systems, writes and reads for a given key are typically serialized through a single authoritative node, usually chosen by some sort of hashing algorithm. Losing a node will generally make a subset of keys unavailable until failover replaces the node, or until leader election reassigns the lost keys to redistribute the load.

CP systems generally make more sense if your data is accessed from many places simultaneously (for a single key). Under these circumstances, immediate consistency allows for atomic operations either with check-and-set functionality, or a locking mechanism. Redis, for example, provides SETNX, WATCH, MULTI, and a number of other commands to coordinate data changes. This is important for counters, shared lists, and other data that is expected to be read and written from a variety of sources.

AP systems

AP systems typically fall under the category of databases based on the Dynamo paper. These databases replicate data from node to node and allow reads and writes to any of the nodes that house the particular key in question. As such, they provide eventual consistency instead of immediate consistency and will continue to allow reads and writes even in the presence of node failure. Examples of AP databases are Riak, Cassandra, Voldemort, and DynamoDB.

AP systems generally make more sense if each chunk of your data is usually accessed from one place at a time. In this case, eventual consistency is unlikely to matter as much, and the application can afford to do read repair or serialize writes from a single source. AP systems are obviously preferable if availability is important! For some applications, you’d rather display something than nothing at all.

Comparing CP and AP

CP systems will generally require less hardware to do the same number of operations because replication happens asynchronously. However, latency tails for CP systems can be expected to be higher because operations happen against a single node. In AP systems, since multiple nodes have a chance to respond to a request (like a race), latency tails are reduced significantly.

Handoff procedures in the face of node failure are also cheaper with AP systems whereas with CP systems, latency and throughput may suffer more. This is due to the additional coordination required to promote replica copies. With AP systems, the system behaves more or less as it did before, although latency may increase for a subset of keys. The cluster as a whole though, will be less affected.

Choosing

Our data is structured as follows. Player X can read or write data for Player X. Player Y can read data from Player X but can’t write to it. As a result, data is almost entirely written from a single source (the application is stateful). As such, there are only a few places in the codebase where we need to account for eventual consistency and perform a simple repair operation. Generally, I believe that if you can go with an AP system, you should do so (but benchmark responsibly first). Availability and uptime don’t feel important until downtime occurs, and consistency can be managed. Lately, I have really enjoyed working with Riak, which has performed well in my benchmarks and seems to make the right design tradeoffs.
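As a rough illustration of what a "simple repair operation" can look like, here is a minimal last-write-wins sketch. It assumes each stored value carries a timestamp attached by the application at write time. The module name, the function names, and the {Timestamp, Value} shape are all hypothetical and are not taken from our codebase or from Riak's API.

```erlang
-module(lww_repair).
-export([resolve/1]).

%% Hypothetical sibling shape: {Timestamp, Value}, where Timestamp is an
%% integer the application attached at write time.
-type sibling() :: {Timestamp :: integer(), Value :: term()}.

%% Pick the most recently written sibling. Ties are broken by Erlang term
%% order so that every reader resolves to the same winner.
-spec resolve([sibling(), ...]) -> sibling().
resolve([First | Rest]) ->
    lists:foldl(fun newest/2, First, Rest).

newest({T1, _} = A, {T2, _}) when T1 > T2 -> A;
newest({T1, _}, {T2, _} = B) when T2 > T1 -> B;
newest(A, B) -> erlang:max(A, B).
```

After resolving, the application would write the winning value back so that later reads no longer see the conflict. Last-write-wins is only one possible policy; it fits here because writes come almost entirely from a single source.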

Get on the Listserv

I often get asked how I stay up to date with the “latest and greatest” best practices, libraries, technologies, and developments in the programming world. The simplest advice I can give is that if you are serious about learning to code, or learning technology, or even just staying relevant, “get on the listserv.”

Using websockets? Get on the hybi listserv.

Python? Ruby? Erlang? C? Get on the listserv.

Use vim? Emacs? A particular mode in emacs? A particular vim plugin? Get on the listserv.

LaTeX? D3? R user? Again, listserv.

Heck, if you find yourself doing hardcore kernel tuning (I was in this position last week), add yourself to the linux kernel listserv too.

If you don’t want a clogged inbox, it being the 21st century, you can address this by simply making a filter. Personally, I have everything go straight to the inbox and I read at least the subject lines of every email that comes in (this needs to be timeboxed of course, gmail shortcuts are your friend). I spend relatively little time on email compared to some of my peers because I batch my reads (like I/O buffering), but over the last few years, I’ve found that when attempting to learn something, there is no substitute for being on the listserv for that something. Here’s why:

  1. You have an immediate conduit for your questions (assuming you ask good questions the correct way, meaning you’ve exhausted other options).
    • Note that while many questions are answered by google or stackoverflow, if you stay with a technology for long enough, it is extremely probable that you will come across an original question (or at least a question with no online presence).
  2. You can see what’s upcoming
  3. Because you are aware of #2, you actually have a hope of contributing, for example, if you have a need that is clearly not being addressed anywhere
  4. You can answer other people’s questions! This is your proof so to speak that you actually understand what it is that you are learning. Nothing drives home understanding like teaching.
  5. You can connect with some of the best minds in the community. Iron sharpens iron, and usually, they won’t bite (iron also tends to pierce wool, but egos are useless in any hard science anyways).

Your Startup Is Uninteresting

There’s a reason I’m doing what I’m doing now (making games). The projects that I work on require truly interdisciplinary thinking on large scales. Gamers are found in pretty much every microcosm of our population and making something “fun” is very nuanced and difficult. Of equal importance is the technical challenge associated with shipping a game (especially if it’s real-time, 3d, etc).

So when people pitch their “startups” to me, I can’t help but think, “I would never work for that startup.” Here is what makes a startup uninteresting to me:

  1. There are no hard problems
    • So many startups I see on hackernews or in my linkedin mail are just another REST application.
    • Boring boring boring. At least try to make a new kind of database, or programming language, or mode of transportation, or anything that is not something some kids could have done in high school (if they were high schoolers, I’d be more impressed of course).
  2. The people leading the project are not technically sound
    • This can be smelled miles away. “Expertise in Ruby or NoSQL preferred.”
    • If I had a dollar for every time a person who wanted to use a NoSQL store didn’t actually need a NoSQL store, I’d be a millionaire.
  3. The business aspect is not sound either
    • This relates somewhat to #1. There is nothing in the product or service proposal that strikes me as something groundbreaking.
    • Making money is secondary to solving problems to me. The first follows the second.

Every now and then, I see something more interesting, but for the most part, “entrepreneurs” have their mind in the “WHAT REST BASED WEB APP CAN I SPIN UP IN RAILS THAT HASN’T BEEN DONE BEFORE” gutter. They find their idea, and immediately incorporate themselves as a C-Corp because that’s what everybody else does (even though their little web app is unlikely to ever need more than 100 shareholders).

So then, what makes a startup more interesting?

  1. Challenging problems
    • This is easy to fake. Even an easy problem can be “challenging” if you’re restricted to doing it in 40 seconds or less.
    • If I can’t quite come up with the solution to the problem in the first 48 hours, and even then only have an idea afterwards, it is somewhat challenging.
  2. Interesting problems
    • If I can’t stop thinking about the problem, then it is an interesting problem as well.
    • This is more subjective than #1 but some things are obviously uninteresting as well.
  3. The problems solve a demand
    • I don’t need to see weeks and weeks of market research
    • The “measure the demand first” principle that so many startups employ is all well and good, but I think it derives from how most of their “products” are simple webapps that wouldn’t have much demand except by a fluke.
    • Give me a highly performant distributed database that supports multi-object transactions. Boom. Demand guaranteed. Or an extremely addictive and fun video game. Again, demand guaranteed.
  4. The problems solve a demand I have
    • Even if the first 3 points are there, if I personally am not interested in the result, then I will never be able to summon the passion required to see it through to the end

In short, startups are startups because they do things that haven’t been done before. As to the reason why it hasn’t been done before, it could be that the problem was not worth solving or that the problem was too hard to solve or that the problem has never occurred to anybody before. Startup founders need to realize that problems that fall into the last category are the rarest and subject to the most risk (they are easy to replicate). Problems that are in the last category could be interesting. Problems that meet a demand in the second category are interesting.

Writing an Erlang WebSocket Client

I’ve recently authored a simple websocket client for Erlang here. The motivation was that all existing clients either didn’t support SSL or didn’t support RFC 6455, the latest WebSocket specification. The following were my goals for the project:

  1. The usage should resemble a standard OTP behaviour
  2. It should be efficient
  3. It should be minimal

Make your libraries resemble OTP behaviours

Following the principle of least astonishment, providing an interface to your code that resembles something familiar is an easy way to make code less error prone and more usable. This also forces the application author to read more into how OTP behaviours are actually implemented with nothing more than receive blocks, tail recursion, and ! sending patterns.
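To make the "receive blocks, tail recursion, and ! sending" point concrete, here is a minimal sketch of a server process written without any OTP behaviour. The module name mini_server and its counter state are hypothetical and exist only for illustration; this is not how gen_server is actually implemented, just the same basic shape.

```erlang
-module(mini_server).
-export([start/0, increment/1, value/1, stop/1]).

%% Spawn the loop with an initial state of 0.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous "cast": fire and forget.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous "call": send a request tagged with a unique reference and
%% wait for the matching reply.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 5000 ->
        {error, timeout}
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The loop itself: receive a message, act on it, and recurse in tail
%% position with the new state.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count);
        stop ->
            ok
    end.
```

Calling mini_server:increment/1 twice and then mini_server:value/1 on the same pid returns 2, since messages from one sender to one receiver arrive in order.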

The core of the websocket_client loop looks like

%% @doc Main loop
-spec websocket_loop(State :: tuple(), HandlerState :: any(),
                     Buffer :: binary()) ->
    ok.
websocket_loop(State = #state{handler = Handler, remaining = Remaining,
                              socket = Socket, transport = Transport},
               HandlerState, Buffer) ->
    %% Nothing follows the receive expression, so every recursive call
    %% below stays in tail position.
    receive
        {cast, Frame} ->
            ok = Transport:send(Socket, encode_frame(Frame)),
            websocket_loop(State, HandlerState, Buffer);
        {_Closed, Socket} ->
            Handler:websocket_terminate({close, 0, <<>>}, HandlerState);
        {_TransportType, Socket, Data} ->
            case Remaining of
                undefined ->
                    retrieve_frame(State, HandlerState,
                                   << Buffer/binary, Data/binary >>);
                _ ->
                    retrieve_frame(State, HandlerState,
                                   State#state.opcode,
                                   Remaining, Data, Buffer)
            end;
        Msg ->
            HandlerResponse = Handler:websocket_info(Msg, HandlerState),
            handle_response(State, HandlerResponse, Buffer)
    end.

This might seem confusing at first glance but let’s break it down.

-spec websocket_loop(State :: tuple(), HandlerState :: any(),
                     Buffer :: binary()) ->
    ok.

This is the type specification, indicating that the function takes a state tuple, the handler state (which could be anything), and the existing data in the buffer. Of course, this loop must be tail recursive, so all data needed to run the loop must be contained somewhere in these arguments.

Next, the code jumps immediately into the receive clause. Let’s go through each case:

    {cast, Frame} ->
        ok = Transport:send(Socket, encode_frame(Frame)),
        websocket_loop(State, HandlerState, Buffer);

Here, if the loop receives a message that looks like {cast, Frame}, it will encode the frame, send it via the transport (which is either ssl or gen_tcp), and then reenter the loop with no state changes.

    {_Closed, Socket} ->
        Handler:websocket_terminate({close, 0, <<>>}, HandlerState);

Any other 2-tuple the loop receives with the socket as the second field must be a close message emitted by the socket itself. In this case, the handler’s websocket_terminate callback is called to allow the user to do any cleanup necessary. The loop is not reentered afterwards and the process is allowed to finish.

    {_TransportType, Socket, Data} ->
        case Remaining of
            undefined ->
                retrieve_frame(State, HandlerState,
                               << Buffer/binary, Data/binary >>);
            _ ->
                retrieve_frame(State, HandlerState, State#state.opcode,
                               Remaining, Data, Buffer)
        end;

This message is received whenever the socket receives data since the sockets are in {active, true} mode (having the sockets in {active, false} mode means that the recv function needs to be called explicitly to receive any data on the socket). The function then checks if we were previously waiting for data to finish an existing frame, or if this is the start of an entirely new frame. The appropriate version of retrieve_frame (retrieve_frame/3 or retrieve_frame/6) is called accordingly.
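The difference between the two socket modes can be seen with a small loopback sketch. The module name active_modes is hypothetical and the code is independent of the websocket client. It opens a local TCP connection, reads once from a passive socket with gen_tcp:recv/3, and reads once from an active socket through the process mailbox.

```erlang
-module(active_modes).
-export([demo/0]).

%% Open a loopback connection and read one message in each mode.
demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false},
                                      {ip, {127, 0, 0, 1}}]),
    {ok, Port} = inet:port(Listen),
    %% The client side is in active mode: data arrives as messages.
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port,
                                   [binary, {active, true}]),
    %% The accepted socket inherits {active, false} from the listener.
    {ok, Server} = gen_tcp:accept(Listen),

    ok = gen_tcp:send(Client, <<"ping">>),
    %% Passive mode: nothing is delivered until we ask for it.
    {ok, FromClient} = gen_tcp:recv(Server, 4, 5000),

    ok = gen_tcp:send(Server, <<"pong">>),
    %% Active mode: the data shows up in the process mailbox.
    FromServer = receive
                     {tcp, Client, Data} -> Data
                 after 5000 ->
                     timeout
                 end,

    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    {FromClient, FromServer}.
```

The {tcp, Client, Data} tuple received in active mode has the same three-element shape that the {_TransportType, Socket, Data} clause of the loop matches on.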

Note here that I do not explicitly continue the loop but allow retrieve_frame to decide if it wants to continue the loop or not. This was done to make the loop clean as many errors can occur upon websocket data retrieval that may force the client to shutdown according to the websocket standard.

    Msg ->
        HandlerResponse = Handler:websocket_info(Msg, HandlerState),
        handle_response(State, HandlerResponse, Buffer)

Finally, all other messages are sent to the handler and the response is used to determine if the loop should send any data to the server before restarting again.

The library is now in a usable state, meaning that it can:

  1. Issue a handshake on either tcp or ssl
  2. Receive a handshake response
  3. Verify the response’s correctness
  4. Accept data and send data
  5. Interact as intended with the provided handler on start_link

A sample handler looks like this:

-module(sample_ws_handler).

-behaviour(websocket_client_handler).

-export([
         start_link/0,
         init/1,
         websocket_handle/2,
         websocket_info/2,
         websocket_terminate/2
        ]).

start_link() ->
    crypto:start(),
    ssl:start(),
    websocket_client:start_link(?MODULE, wss, "echo.websocket.org", 443, "/", []).

init([]) ->
    websocket_client:cast(self(), {text, <<"message 1">>}),
    {ok, 2}.

websocket_handle({text, Msg}, 5) ->
    io:format("Received msg ~p~n", [Msg]),
    {close, <<>>, 10};
websocket_handle({text, Msg}, State) ->
    io:format("Received msg ~p~n", [Msg]),
    timer:sleep(1000),
    BinInt = list_to_binary(integer_to_list(State)),
    {reply, {text, <<"hello, this is message #", BinInt/binary >>}, State + 1}.

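%% No out-of-band messages are expected in this sample, so keep the state.
%% The {ok, State} return shape is an assumption.
websocket_info(_Msg, State) ->
    {ok, State}.

websocket_terminate({close, _Code, _Payload}, _State) ->
    ok.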

The above handler connects to the echo server over wss and prints the following output with a 1 second interval between each message:

message 1
hello, this is message #2
hello, this is message #3
hello, this is message #4

Note how it reads and feels like a typical gen_server. Neat huh? There is still a lot of work to be done on the client in the form of testing, error handling, and RFC compliance. Contributions accepted!

Lessons From 2012

2012 was a great year for learning for me. I had the opportunity to make many mistakes and learn from them and for that I am grateful. So without further ado:

Lesson 1: Automate everything

I could have written this as “test everything” but “automate everything” encompasses that and so much more. As an example, if you find yourself performing the same editor actions over and over, STOP. Learn how to write a macro and store it for easy reuse.

If you find yourself manually executing a series of steps to check a result, STOP. Write an integration test now and learn how to run your integration tests in isolation for faster development. I would say that the moment something manually needs to be done in this regard once or twice, a test should be written.

If your deploy process is “nearly automated” by whatever tools you are using, STOP. Author your own tool that fills in whatever gaps your existing tools have.

If your compilation requires some sort of dependency checking or other manual steps that do something other than make all, edit your Makefile and make it better.

On that note, if you are forgetting to build the documentation and commit it, add that to the Makefile too as make docs.

On that note again, you might as well add make docs as a pre-commit hook so you never have to think about it again.

Why do all this? Because it is worth it to invest in your own time. By taking some time upfront to learn something new and automate it, you are investing in yourself, and surely that is worth it right?

Things you should probably learn if you haven’t already (and you program):

  • Bash/Zsh scripting
  • Makefiles
  • emacs/vim (Actually learn them)
  • The profiler of whatever language you use
  • Debugging tools of whatever language you use

Lesson 2: Take actual breaks

I need to get better at this myself. Personally, I hate leaving any problems outstanding, but inspiration rarely strikes when you are in the thick of things. My most productive moments come after I’ve been itching to get back to work.

But, but, but, I’m a workaholic.

Yes, you are (speaking to myself here). But even in moments when you aren’t working on your direct tasks, you can at least take time to read or work on something entirely different. To me, this still constitutes a “break” and I have enjoyed exposure to many topics this year as a result including:

  1. A new programming language
  2. A reread of K&R
  3. Electronics
  4. Hadoop (which, for reasons I’ll discuss in a later post maybe, I don’t plan on using for a while)
  5. Contributing to various open source libraries

Also, breaks should encompass exercising (mandatory!) and eating. I missed more meals than I care to admit in 2012, and I hope this will improve moving forward.

Lesson 3: Don’t do what everybody else is doing just because

This comes with a caveat. Usually, lots of people are doing something because that thing fits a lot of use cases. Or, the general use case of the time is essentially the same from person to person. How many people, when told to write a webapp, turn straight to Ruby on Rails without even thinking?

Use the right tool for the right job. Make your life harder for a short while, but much easier later. It can be scary to venture off the beaten track (beaten either by others or by yourself), but at the very least, keep an open mind to all options available and analyze them objectively.

Some people have the opposite problem where they go out of their way to do something novel just for the sake of novelty. This is dangerous as well. Again, strive for objectivity, determine the best course of action, and do it.

Lesson 4: Your morale matters

If you don’t like what you’re doing, if you find yourself bored, do something about it. Tell your manager (and be grateful that you have a manager that will listen). Before doing so, however, make sure that there is, in fact, another option that you can demonstrate capability for.

Be appreciative of the people that improve your morale. Humor and positive attitudes are probably better indicators of long term productivity at a professional workplace than raw “smarts.” Note that I said “long term” productivity. Having smarts but zero morale or an extremely negative attitude may work for a while, but burning out is a real danger.

When you choose to work late, be thankful that you have a job that actually provides you with tasks that you would be interested in working late on. Don’t impose a hard schedule on yourself and hate it too! If you don’t like what you’re doing that much, then don’t work evenings. This goes for students too. If you are miserable studying condensed matter physics or Riemannian manifolds for an entire weekend, maybe you shouldn’t be a Physics or Math major (or politics or English or …).

Lesson 5: Your editor matters

Don’t “settle.” Examine other editors and ruthlessly assimilate everything that could be useful to you. I won’t speak too much about this, but be brave! If you don’t use emacs or vim (or one in the other) yet, I encourage you to give it a shot in 2013. If you already use them, take time to continually invest in your knowledge of them. Learn a new package or write your own.

Conclusion

It is very possible that you the reader disagree with some of my assessments, and that’s ok. I should stress that (this being a personal blog), all of the above is directed primarily at myself. Hopefully, however, some of it is useful to you as well; I am sharing this because that possibility exists.

Setting Up an Erlang Cluster on EC2

Generating a working release to deploy to EC2 required a fair bit of trial and error. This post assumes that you have a set of working OTP applications that you would like to deploy on multiple machines. Let’s call these applications example_app1 and example_app2.

Project Structure

I like to organize my files and folders as follows:

|- apps
|  |- example_app1
|  |  |- ...
|  |- example_app2
|  |  |- ...
|- rel
|  |- ...
|- rebar.config

All applications go in the apps folder. Release specific files, configuration, and the bundled release itself go in the rel folder. The rebar.config at the top level looks like the following:

{deps_dir, ["deps"]}.
{deps, []}.
{sub_dirs, [
            "apps/example_app1",
            "apps/example_app2",
            "rel"
           ]
          }.

With this config file, we can run rebar compile at the root level of our project and it will automatically compile every application.

Generating the release

The release will bundle together your applications along with any dependencies necessary to run them on a similar system. It will also bundle in the Erlang RunTime System (ERTS) so that the release executables can be run without Erlang installed on the target machine. Make sure, however, that the build machine and the target machine are of similar (if not identical) architectures.

First, you’ll want to use rebar to generate a set of files, among which is the all-important reltool.config file. You can do this by running rebar create-node nodeid=example in the rel directory, substituting your project name (or whatever you want) for example. Then modify it to look like the following:

{sys, [
       {lib_dirs, ["../deps/", "../apps"]},
       {erts, [{mod_cond, derived}, {app_file, strip}]},
       {app_file, strip},
       {rel, "example", "1",
        [
         kernel,
         stdlib,
         example_app1,
         example_app2
        ]},
       {rel, "start_clean", "",
        [
         kernel,
         stdlib
        ]},
       {boot_rel, "example"},
       {profile, embedded},
       {incl_cond, derived},
       {mod_cond, derived},
       {excl_archive_filters, [".*"]}, %% Do not archive built libs
       {excl_sys_filters, ["^bin/.*", "^erts.*/bin/(dialyzer|typer)",
                           "^erts.*/(doc|info|include|lib|man|src)"]},
       {excl_app_filters, ["\.gitignore"]},
       {app, hipe, [{incl_cond, exclude}]},
       {app, example_app1, [{mod_cond, app}, {incl_cond, include}]},
       {app, example_app2, [{mod_cond, app}, {incl_cond, include}]}
      ]}.

{target_dir, "example"}.

{overlay, [
           {mkdir, "log"},
           {copy, "files/erl", "\{\{erts_vsn\}\}/bin/erl"},
           {copy, "files/nodetool", "\{\{erts_vsn\}\}/bin/nodetool"},
           {copy, "files/example", "bin/example"},
           {copy, "files/example.cmd", "bin/example.cmd"},
           {copy, "files/start_erl.cmd", "bin/start_erl.cmd"},
           {copy, "files/install_upgrade.escript", "bin/install_upgrade.escript"},
           {copy, "files/sys.config", "releases/\{\{rel_vsn\}\}/sys.config"},
           {copy, "files/vm.args", "releases/\{\{rel_vsn\}\}/vm.args"}
          ]}.

As you can see, we’ve added a line that looks like

{app, example_app1, [{mod_cond, app}, {incl_cond, include}]}

for every application we want included and we’ve also included the app name in the sys –> rel option. These applications will all be executed on start.

Configure App Config

By default the node name is set to example@127.0.0.1 but we probably want to strip the attached ip address. To change this edit the -name flag passed in rel/files/vm.args.

-name example

Any application specific configuration should happen in rel/files/sys.config. This file also gets copied into the release bundle (you can see all the file and directory manipulation in the overlay option of the reltool.config file). For now, make the sys.config file look like

[
 {kernel, [
           {inet_dist_listen_min, 9100},
           {inet_dist_listen_max, 9200}
          ]}
].

These two kernel options will restrict the port range for inter-node communication. This is important if you plan on deploying on instances that will be part of a security group or behind a firewall. You can put other application configuration variables in this file as well.
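As a sketch of how an application might read one of its own variables from this file, assume a hypothetical entry {example_app1, [{listen_port, 8080}]} has been added alongside the kernel section. The key name listen_port and the module name example_config are made up for illustration.

```erlang
-module(example_config).
-export([listen_port/0]).

%% Assumes a hypothetical entry in rel/files/sys.config such as:
%%   {example_app1, [{listen_port, 8080}]}
%% Falls back to 8080 when the key is not set.
listen_port() ->
    application:get_env(example_app1, listen_port, 8080).
```

Providing a default in the call keeps the application usable when it is started outside the release, for example from a plain shell during development.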

Generate the release and deploy.

Now, generate the release by running

rebar compile generate

Note that you may need to run rebar get-deps first if your applications have any dependencies. This will bundle your applications, ERTS, dependencies, configuration files, and arguments all into rel/example. Nifty!

You can test it out by running ./rel/example/bin/example start and a subsequent ps aux | grep example should show all the processes now running attached to your project.

To deploy to EC2, you need to make sure that all instances in the cluster have the following ports open:

9100-9200
4369

Port 4369 is used by the Erlang Port Mapper Daemon (epmd). Finally, bundle up your release and run it on all your instances. To connect your instances:

net_adm:ping('example@*insert ip here*').

You should get a pong as a response if you did everything correctly.

Further down the road

To get this example configuration working, many things were done manually. Steps that can (and should) be automated are the generation, distribution, and deployment of each release. In addition, we can automate adding nodes to a cluster by including a ping as part of our bundled application or by making a custom boot script (or by running a separate script after the deploy).
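One way the ping step could be automated is sketched below. The module name cluster_join is hypothetical, and where the list of node names comes from (sys.config, an environment variable, a discovery service) is left open. It uses only net_adm:ping/1, which returns pong on success and pang on failure.

```erlang
-module(cluster_join).
-export([join/1]).

%% Ping every node in the list and report which ones answered.
%% net_adm:ping/1 returns pong on success and pang on failure.
-spec join([node()]) -> {Connected :: [node()], Unreachable :: [node()]}.
join(Nodes) ->
    lists:partition(fun(Node) -> net_adm:ping(Node) =:= pong end, Nodes).
```

A function like this could be called from an application's start callback, with the unreachable list logged or retried later.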

I struggled a lot to get a working configuration as the use of rebar and reltool is not obvious (at least not to me). Hope this helps somebody.

Sharded_eredis

A while ago, antirez wrote a great post about Redis presharding. I hadn’t read it until recently, but it seems like the ideal way to go about distributed systems. You should definitely read it before continuing with this post. First, let’s talk about why people bother with distributed systems at all.

Why do I (or you) want a distributed database?

Many people think it’s a no brainer, but if you haven’t thought it out, I encourage you to think hard about the how and why. The most common reasons are:

  1. My throughput is too high
  2. I want to scale effortlessly (just hit a button)

My throughput is too high!

When you say this, do you mean that your throughput per key is high, or that your throughput per machine is high? In the former case, yes, you should probably get a distributed database and replicate your keys to increase read/write availability (at the cost of strong consistency).

Most of us are in the latter camp though. Our throughput exceeds our infrastructure. This can be solved very very well with sharded non-distributed databases!

I want to scale effortlessly

This is a legitimate concern. As developers, nobody wants to deal with the headache of moving data around, ensuring that no data is lost, and that migrations run smoothly without corruption. So they do what comes to mind first: throw a distributed database at the problem. Distributed databases do in fact solve this problem, but perhaps at a higher cost than people realize.

The “issues” with distributed databases

These aren’t issues so much as concerns that the developer must address when migrating to a distributed data model. The developer must realize that transactions are gone, and so, consistency is harder to enforce. Distributed databases that assign a master node to each key (RethinkDB, Couchbase) have slightly easier consistency problems but sacrifice some availability, especially in the presence of node failure.

To top it off, one of the worst problems that a maintainer of a distributed database needs to deal with is the actual scaling part. Each node addition or removal requires a rebalancing operation of some sort and sacrifices availability during the change.

Given that not all applications require strong consistency and transactions, many developers might be ok with this. However …

There is another way

Presharding. By allocating all the shards in advance, one never needs to worry about transferring data between nodes or rebalancing. By using built-in replication (handled well by most non-distributed databases), shards can be moved to larger machines or split off (if they started on the same machine) without downtime or degraded performance.

The project sharded_eredis does this to some effect using eredis process pools. Given a Redis command, it will automatically find the pool name associated with the correct shard and pass the command along to the correct node.
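The lookup step can be sketched as follows. This is not the actual sharded_eredis code. The module name shard_lookup, the function pool_for_key/2, and the choice of erlang:phash2/2 as the hash are assumptions made for illustration.

```erlang
-module(shard_lookup).
-export([pool_for_key/2]).

%% Map a key to one of the pool names. Pools is the fixed list of pool
%% names created up front (one per shard); because the list never changes
%% length, a key always hashes to the same shard.
-spec pool_for_key(Key :: term(), Pools :: [atom(), ...]) -> atom().
pool_for_key(Key, Pools) ->
    %% erlang:phash2/2 returns an integer in 0..Range-1.
    Index = erlang:phash2(Key, length(Pools)),
    lists:nth(Index + 1, Pools).
```

The returned pool name is where the command would be sent. Moving a shard to a bigger machine then only means pointing that one pool at a new host, and the key-to-shard mapping itself never changes.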

By tacking on administrative tools to spin up many redis instances, setup replication, and failover, we can leverage Erlang’s code swapping to point existing db pools to the correct locations seamlessly.

Why Ubuntu

I’m generally wary of posting controversial opinions online. Backlash can be orders of magnitude worse than expected, and dissension is not just possible, but probable. Nevertheless, I will be describing what went through my head when I installed Ubuntu desktop recently for the first time since high school (some eight odd years ago or so).

Opaque installation

I decided to install Ubuntu to gain some additional familiarity with the package manager. After all, I planned on running a cluster of tens of Ubuntu servers in the cloud. In addition, Ubuntu introduced me to the world of Linux in a fairly accessible manner. I remain grateful to Ubuntu for this to this day.

The first thing I noticed was that the installation seemed professional and “neat.” However, beyond letting me set up the partitions as I would have liked (which is something even Windows lets the user do), I had no way to know what was being installed, let alone change it. “Fine,” I thought. This is just Canonical’s way of making the operating system more usable for the average user, and while I might have preferred a more transparent installation process, I was ok with it as it was.

The Travesty

Amazon Ads, Amazon Ads Everywhere

Admittedly, I’ve had my head under a rock when it comes to Ubuntu news. Needless to say, my jaw dropped when I saw the Amazon icon on the left toolbar. I removed it immediately and attempted to launch a terminal from the Ubuntu launcher.

Are those ADS under the launcher search result?

I clicked one of the links. Sure enough the browser opened to an Amazon product page.

Why this is sad

This is sad for a number of reasons.

  1. Ubuntu is no longer free. I had generally thought of Ubuntu before as the flagship Linux distro for the average consumer. It was to be the “canonical” example of a distro that would appeal to the masses, proving that FOSS principles work. However, with the inclusion of ads, Ubuntu is no longer FOSS.
  2. This means that Canonical’s business model could not sustain itself. I can only imagine that this move was their last card. But was it necessary even then? In traditional economics, a lack of demand for something doesn’t mean you “raise the price” of that thing (which is effectively what has been done here). It means that perhaps a downsize was necessary, or a refocusing to services that actually make sense. As a server engineer, I am interested in cluster management and monitoring for example. I would gladly pay for those services. Couldn’t they have attempted to provide that? If they are, advertise it to me!
  3. Ubuntu desktop misrepresents the Linux world and the FOSS world as a whole. Being a prominent choice among a plethora of distros, it is sad that Ubuntu cannot uphold FOSS ideals (this is probably a whole series of blog posts).

I have since gone back to my trusty old Arch install. I’m not sure that I will use that platform to run my server code just yet (due to its rolling release system), but I don’t want to support what I consider a defilement of what was once a great operating system.

Rest in peace, Ubuntu.

Secondary Indices in Riak With Erlang

For the uninitiated, Riak is a production-ready distributed key-value store. It is based on Amazon’s Dynamo paper and is faithful in its implementation.

Recently, I figured out how to use secondary indices in Riak using the Erlang client and am documenting its usage here for the benefit of future end-users. Secondary indices provide a way to query for sets of bucket/key pairs stored in Riak. The sets are defined by either a value or range of values on a specified index. The index can be attached to any riakc_obj prior to a put operation as metadata. Secondary indices are fast, and the preferable way to group associated data if complex operations don’t need to be performed on the contents of the value (for this purpose, there is map-reduce).

First, you’ll want to change the backend storage of Riak to LevelDB. The default Bitcask storage option will complain that secondary indices are not supported. Incidentally, swapping storage backends is only a configuration change, since Riak itself primarily handles the distribution of data, node failover, and the Dynamo principles.

To do this, find the app.config file in the etc folder of your Riak build. If you compiled Riak from source using make rel, this file will be in rel/riak/etc. Find the lines:

{riak_kv, [
    {storage_backend, riak_kv_bitcask_backend},

and change it to

{riak_kv, [
    {storage_backend, riak_kv_eleveldb_backend},

Restarting Riak at this point will switch the storage backend. Note that any data you had in Bitcask is not gone, but remains persisted in a different place on the disk (the paths to these locations are elsewhere in the same config file).

At this point, let’s create a few sample indexed objects:

%% First employee, tagged with an integer index and a binary index.
Obj = riakc_obj:new(<<"employee">>, <<"jeremy">>, <<"engineer">>).
MetaData = dict:store(<<"index">>, [{"age_int", 23}, {"state_bin", "CA"}], dict:new()).
Obj1 = riakc_obj:update_metadata(Obj, MetaData).

%% Second employee, indexed the same way.
Obj2 = riakc_obj:new(<<"employee">>, <<"helena">>, <<"scientist">>).
MetaData1 = dict:store(<<"index">>, [{"age_int", 32}, {"state_bin", "CA"}], dict:new()).
Obj3 = riakc_obj:update_metadata(Obj2, MetaData1).

%% Connect over protocol buffers and store both objects.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
riakc_pb_socket:put(Pid, Obj1).
riakc_pb_socket:put(Pid, Obj3).

Here, I have inserted two objects into Riak, each representing a different employee. I have also inserted them with metadata about their age and state of residence. Note that object metadata is represented as a dictionary in Erlang. To query the data, I can do something like:

riakc_pb_socket:get_index(Pid, <<"employee">>, "age_int", 23).
riakc_pb_socket:get_index(Pid, <<"employee">>, "state_bin", "CA").

The first query will return a list of a single item: [<<"employee">>, <<"jeremy">>]. The second will return a list of two items: [<<"employee">>, <<"jeremy">>] and [<<"employee">>, <<"helena">>]. Note that the field names of the index must have either _bin or _int as a suffix to denote the type of the index. At this time, only binary and integer data can be used to index objects.

And that’s a wrap! In your own applications, you might want to consider grabbing an object’s metadata first before updating it so you don’t wipe out existing metadata. If you are still reading at this point, I hope you found this helpful. I would encourage experienced Riak users (or anybody really) to point out any errors I might have made so I can fix them for the good of the community. Thanks for reading.
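The advice about not wiping out existing metadata can be sketched as a small helper. This is only an illustration under stated assumptions: the module and function names are hypothetical, and the helper works on a plain dict so it can be tried without a Riak node. It merges new index entries into whatever is already stored under the <<"index">> key instead of replacing the whole dictionary.

```erlang
-module(index_md_sketch).
-export([add_indexes/2]).

%% Merge NewIdx (a list of {IndexName, Value} pairs) into the index
%% entries already present in the metadata dict MD. Every other
%% metadata key is left untouched.
add_indexes(NewIdx, MD) ->
    Existing = case dict:find(<<"index">>, MD) of
                   {ok, Idx} -> Idx;
                   error     -> []
               end,
    %% usort both sorts the entries and drops exact duplicates.
    dict:store(<<"index">>, lists:usort(Existing ++ NewIdx), MD).
```

With the Erlang client, one would presumably start from the object’s current metadata, for example riakc_obj:get_update_metadata(Obj), pass it through a helper like this, and hand the result to riakc_obj:update_metadata/2 before the put.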

Unix Utilities You Should Know and Love

This post is a brief survey of a number of plugins and utilities I use, in descending order of usage.

  1. TMUX – If you haven’t given this one a try, or you’re still using screen, I highly recommend you try it out. It plays very nicely with VIM too (think Slime for GNU Screen). Plus, it’s scriptable and configurable.

  2. EMACS – Yes, yes, you’re a hardcore VIM user, and I don’t blame you! I love VIM too. That’s why I use the Evil EMACS plugin and turn on VIM’s normal mode by default. Why EMACS and not just VIM you ask? Indentation, that’s why! After hacking in Haskell for a long time, and now Erlang, the last thing I want to have to manage is indentation levels. With EMACS/Evil, I get the best of both worlds. I map to the VIM escape mode so it doesn’t conflict with EMACS usage of the escape key as the Meta key. I also map to ‘insert j’ in case I do, in fact, need to type a j quickly at some point in time.

  3. xmonad – Also in line with saving brain cycles to do real work, I use xmonad so I never really have to think about where my windows are or how to size them. It just happens. You can configure xmonad to open specific locations on specific desktops, and even configure the sizes of individual panes and such. Best of all, navigating through panes uses default VIM cursor movement motions.

  4. Zsh – Zsh is a polarizing topic and strikes a chord with many Bash enthusiasts. Personally, I find Zsh’s file globbing, smart autocompletion, and more sensible scripting syntax too good to pass up. Plug in Robby Russell’s oh-my-zsh for snazzy additions like git branch autocompletion, sensible aliases, and more.

  5. SOLARIZED – OK, not a plugin but great nonetheless. If you’re going to spend the majority of your time staring at a terminal, the least you could do is make it pretty. Feel free to tack on zsh-syntax-highlighting while you’re at it.

  6. htop – If you do frequent benchmarking, learn htop over top. It has convenient features like killing processes by name instead of PID, visual core and memory usage bars, and more.

  7. irssi – A hacker’s favorite IRC client. If you aren’t already constantly lurking or participating in a relevant IRC channel, you’re probably missing out on a ton of knowledge and interaction with very, very knowledgeable people. Try #math, #physics, #haskell, #erlang, #emacs, #vim, and more on freenode.