Goodbye Quark, Hello World
I gave my notice to Quark Games Inc almost two Fridays ago with mixed feelings. While there, I had many opportunities to achieve, learn, contribute, and teach. I am unquestionably a better engineer now than I was a year ago, and I have Quark to thank for that. However, I am also optimistic about what the future holds. This post seeks to document what I’ve done in the recent past and discuss where I want to go in the future.
Cassandra Hadoop Programming
My first task as an intern was to work on a BI system built on Cassandra (using DataStax as the provider). I wrote a number of Pig/Hive map reduce jobs and became familiar with the map reduce framework (thanks Alex Tang for spearheading this and showing me the MR ropes). I also leveraged preexisting knowledge to write a number of R scripts to generate cool visuals. Unfortunately, the system proved a bit expensive to use due to the overwhelming volume of data and our inability to sample the data effectively.
Valor guild request/invite system
This was my first server feature on the Rails web server. Many thanks to Albert Wang (a product manager at the time) for helping me with my questions and familiarizing me with the codebase. Also thanks to Eric Liaw, Tyler Margison, and Juan Delgado for answering my questions when I was pestering Albert too much.
Implement a Git Rebase Workflow
Code reviews were happening at a snail’s pace due to over-the-shoulder reviewing. I introduced the Pull Request system (actually leveraging the distributed portion of Git as a DVCS) and the git rebase workflow. The team was very receptive and quick on the uptake.
Write Deploy Scripts for Valor
I was a fledgling at automation at the time but I knew that things should be more automated than they were. Because Valor existed across many deployed environments, we needed a way to manage and know, a priori, what code was deployed where. I began to learn more about the deployment philosophies of Engine Yard and Chef. Again, the team was quick to adopt the new methodology and there was a bit more sanity in the QA process as a result.
I implemented Valor competitions shortly afterward. This was more or less my last task as an intern before I was converted to full time as a Senior Engineer. As a feature, competitions were not very difficult, but they taught me a lot about the product feature cycle. Many things were pushed to the last minute and compromised the timeline as a result. I also learned (again) how hard automation was. Scoring a competition, in particular, was a batch job requiring tens of thousands of database queries. Writing this in a way that it could finish without f*cking everything up was quite challenging.
Map Data Encoding
Everything was JSON, and considering that the bulk of our data was small single-digit numbers, you can imagine the size of our request and response bodies. Ginormous! Our company hosted an awesome hackathon (thanks Carol Liaw!), at which time I had an idea to encode our map data as binary to allow players to potentially zoom out and in to maps. The most elegant way I could think of to encode the data was in a lossless PNG, mainly because the (x, y) coordinates of the world mapped well to each pixel, and I calculated that I could store all the data I needed in four 8-bit channels. Plus, you could easily drop in an existing PNG library to query pixel colors (a fast operation). The prototype was completed thanks to Calvin Hsu, my partner in crime, and it eventually found its way into production several months later (I had moved on to a different team by that time).
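For the curious, here is a rough Python sketch of the idea, not the code we shipped. The four per-tile fields, the 64x64 world, and all of the names (`encode_map`, `decode_map`, `MapImage`, `tile_at`) are made up for illustration, and the hand-rolled PNG writer is there only so the example runs without a third-party library. It packs four small values per tile into the R, G, B, and A channels of one pixel, reads a tile back by its (x, y) coordinate, and compares the size against the same data as compact JSON.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_map(tiles):
    """Encode tiles[y][x] = (a, b, c, d), each 0..255, as an 8-bit RGBA PNG."""
    height, width = len(tiles), len(tiles[0])
    raw = bytearray()
    for row in tiles:
        raw.append(0)  # filter type 0 (None) for this scanline
        for pixel in row:
            raw.extend(pixel)  # one byte per channel: R, G, B, A
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), then
    # compression, filter, and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _chunk(b"IEND", b""))


class MapImage:
    """Decoded pixel data with constant-time lookup by coordinate."""

    def __init__(self, width, height, raw):
        self.width, self.height, self.raw = width, height, raw
        self.stride = 1 + width * 4  # filter byte + 4 bytes per pixel

    def tile_at(self, x, y):
        offset = y * self.stride + 1 + x * 4
        return tuple(self.raw[offset:offset + 4])


def decode_map(png):
    """Decode a PNG written by encode_map (assumes filter type 0 on every row)."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, width, height, idat = 8, 0, 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif kind == b"IDAT":
            idat += data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return MapImage(width, height, zlib.decompress(idat))


# Hypothetical world: four single-digit fields per tile.
WIDTH, HEIGHT = 64, 64
tiles = [[((x + y) % 10, (x * 3) % 10, (y * 7) % 10, (x * y) % 10)
          for x in range(WIDTH)] for y in range(HEIGHT)]

png_bytes = encode_map(tiles)
json_bytes = json.dumps(tiles, separators=(",", ":")).encode()
world = decode_map(png_bytes)

print("tile at (5, 9):", world.tile_at(5, 9))
print("JSON bytes:", len(json_bytes), "PNG bytes:", len(png_bytes))
```

A real client would more likely hand the file to an existing PNG library and ask for the pixel color at (x, y), which is the drop-in convenience mentioned above.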
Prototype Valor in Erlang
Valor was plagued by scalability issues and one of my thoughts was to rewrite some of the core game logic. At the time (and still), the core logic used lazy evaluation in Ruby and required a number of lookups to succeed. This was awful for user response time, and also server costs. My main thought was to write the heavy logic in a more performant language that made it easy to evaluate things eagerly. This was my first exposure to Erlang, although I had dealt with functional languages before (Haskell, R, Lisp, ML, etc.). I actually evaluated a number of other solutions for the task. The other top contenders were Clojure, Scala, Java, and C++. Of all of them, Erlang seemed like the one that could produce results, even for a fast prototype. I managed to build a system that would easily handle the load, with no exceptions or errors for a prolonged period of time (8 hours for my first test) on a single machine!
Alas, I never got to implement this system in the wild, but the experience was valuable nonetheless. Instead of optimizing the game logic, a decision was made to optimize the database layer (replacing memcached instances with Couchbase). Couchbase would prove to be a horrible choice; more on this later.
Port Valor from Rails 2.3.5 (Ruby 1.8.7) to Rails 3.2.12 (Ruby 1.9.3)
This was done from a Friday morning to Monday morning. I saw the sunrise on Sunday. The change was over 10,000 lines of pure hell, filled with gem incompatibilities, library changes, API changes, the works. I couldn’t even manage to get the application to boot until Saturday evening (this is after 22 hours of work). I had also worked fast on Thursday to finish things early so I could work on this “side project” on Friday without anybody missing a beat. Fortunately, things worked and the code itself was deployed a week and a half later. I was fairly pleased with this accomplishment because it is one of the few things I started with no idea whether it would work out. I was fully prepared, however, to fail utterly by Monday in exchange for what would hopefully be an interesting learning experience.
What did I learn? Well, first and foremost, keep up to date with the latest and greatest if you can. Having your stuff break early and often is definitely the way to go. Also, tests are great. Why? Because the codebase at the time had ZERO tests, which made sweeping changes a nightmare. Incidentally, tests were something I began pushing for around this time. Finally, I learned to read the Rails source code. I’ve more or less sworn off using Rails again, but understanding how a large framework was built would prove useful when it came time for me to build one myself.
Many thanks to Eric Liaw for helping me see this upgrade through. Also, thanks to Tyler Margison for helping me catch basically the only bug that happened on production (old Memcached keys were not evicted, and since the namespacing rules on the client had changed, the Memcached instance was perpetually OOM).
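To make that bug a little more concrete, here is a toy Python sketch of how a changed key namespace can strand old cache entries. It is not the real client code: a plain dict stands in for the Memcached server, and the class name and both namespace strings are invented for illustration.

```python
class NamespacedCache:
    """Toy stand-in for a cache client that prefixes every key."""

    def __init__(self, store, namespace):
        self.store = store          # shared dict playing the cache server
        self.namespace = namespace  # hypothetical client-side prefix

    def _key(self, key):
        return f"{self.namespace}:{key}"

    def set(self, key, value):
        self.store[self._key(key)] = value

    def get(self, key):
        return self.store.get(self._key(key))


server = {}

# Before the upgrade: entries are written under the old namespace.
old_client = NamespacedCache(server, "valor-rails2")
old_client.set("player:42", "cached profile")
old_client.set("player:7", "cached profile")

# After the upgrade: same server, different prefix.
new_client = NamespacedCache(server, "valor-rails3")
new_client.set("player:42", "cached profile")

# The new client can no longer reach the old entries, yet they still
# occupy memory on the server until something removes them.
orphaned = sorted(k for k in server if k.startswith("valor-rails2:"))
print("unreachable from new client:", orphaned)
print("lookup of old entry via new client:", new_client.get("player:7"))
```

In the sketch nothing ever removes the old-prefix entries, which is the part that mirrors what bit us: memory fills with data no client will ask for again.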
Prototype New Games in Unity
The Rails 2 -> 3 conversion was the last thing I did for Valor. Afterwards, I was part of a 3-person team (with Calvin Hsu and Timmie Wong) to prototype our next game. Coming from server land, it was fun to program on the client for once. I learned C#, and also why people love it so much. More importantly, I learned Unity and experienced using a game engine in a professional setting. Sure, I had dabbled with Unreal, Flixel, Flashpunk, and more before, but it was different when I had other people to work with and an actual concept I was going for. Most importantly, it really solidified in my head things that I liked in a game engine, things I disliked, and things I really wanted (that didn’t exist). When I build a game engine in the future (not a question of if, but when), this will definitely come in handy.
Finally, I architected Champs. Taking everything that I had learned from Valor, I changed almost everything. Champs was built to have:
- Stateful requests/responses
- In-application caching (easy invalidation)
- Socket based communication (made presence/chat and more easy drop ins)
- Compact protocols (initially JSON for prototyping purpose but changed to Protocol Buffers)
- Full test coverage (integration and unit tests, albeit not comprehensive)
- Low response times (application response time means are measured in tens of microseconds)
- Security (SSL-based encryption, bcrypt-hashed passwords and tokens, session invalidation, obscured API, single session logins, email validation, etc.)
- Smart modularity and well organized code
- A CAP theorem-first approach to mitigating race conditions and ensuring consistency where consistency was important, and availability when availability was important.
- Live configuration updates
- Code hot swapping
- Efficient and inexpensive to run (part of this is just avoiding EC2 like the plague)
On the deployment side, I learned Chef and created a working deployment in about a week’s time with:
- Aggressive Linux tuning for security and performance
- Comprehensive firewall rules
- A sound failover and load balancing strategy
In the course of writing and designing Champs, I got to author a few open source libraries as well (a websocket client, and an Erlang protocol buffer library). All of this was done over a period of 9.5 months. Many thanks to John O' Connor who joined me for the last 4.5 of those months and picked things up quickly. Also, shoutouts to Kevin Burg who made pivotal contributions in the summer alone. Many thanks also to the rest of the Champs Engineering team (Harrison Chow, Kevin Xiao, Calvin Hsu, and Timmie Wong) for being awesome to work with and putting up with my mild tantrums when some open source library I found was broken.
While at Quark, I was allowed to attend Ricon West 2012, where I was exposed to many of the fundamentals of the CAP theorem and logical monotonicity. This conference has shaped my approach to distributed systems a great deal and I am grateful for the opportunity. I am fortunate to be speaking at this year’s Ricon West, although I will not be doing it on behalf of Quark. I expect I will learn a lot as before, and hope to contribute some back myself.
I also gave my first conference talk at the Erlang Factory in SF in March. This was a rather rattling experience but I found it valuable to bounce ideas off the other speakers. I learned a lot at this conference as well and look forward to seeing the inevitable spike in Erlang usage in the future (or Elixir).
This is a sort of glop of stuff I think I learned over the last year, although it’s impossible to really distill it down to one (or even a hundred) blog posts.
- Identify CAP theorem tradeoffs early
- Cache invalidation really is one of the hardest problems you’ll face
- If a database company tells you they solve all of your problems for every use case, they are lying
- If the database company still tries to convince you, move on
- HTTP request bodies are humongous and expensive
- Text protocols are useful ONLY if you intend on viewing them a lot and typing them by hand (e.g. a configuration file).
- You aren’t done when the code is checked in. You’re done when live users are using your code and it works
- Logging and monitoring are always mandatory
- If you automate everything you think you need to automate, you haven’t automated enough
- Putting off automation is almost always the wrong decision and inefficient in the long term (and usually even the short term)
- Having a layer of separation between Engineering and Product is good when time is critical
- Crunching long hours makes boring work unbearable. If the work is fun, you’re never crunching.
- Interviewing is a very very hard problem. Getting the wrong gauge on somebody is the norm.
To all the individuals I’ve worked with, it’s been a great ride and I’m sorry to go but there are things I want to do. The system is hopefully built to last and I hope you keep in touch.
Starting next week (although I’ve gotten a small head start), I will be working on a rendering engine with Calvin Hsu, an awesome friend + engineer (and ex-Quark employee). There is still a lot I don’t know, and I expect the experience will be extremely humbling. This is what I ultimately want, however: perpetual challenges that make me question my abilities and strive to learn and grow more as a mathematician, scientist, and programmer. More importantly, the forefront of technology is still just beyond me, and I want to do everything I can to get there, pierce the veil, and venture into the void beyond.