Implementing The Raft Consensus Algorithm

I recently attended a week-long class on the Raft Consensus Algorithm. The class was taught by the amazing David Beazley, and was held in his office in Chicago. A goal of the class is for the students to implement Raft.

From the Raft home page: "Consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final". This sounds easy enough, but the cluster of servers needs to survive computer failures, complete network failures, network isolation ("split-brain"), and a host (dare I say "raft"!) of other problems. And these problems can happen at the least opportune time, and in very odd sequences.

The class had 7 students, ranging in experience from a college student to Python veterans. It was a good mix of students, and everyone brought an interesting perspective to the class. I think the finance was the most well represented industry among the students.

The general format of the class was that David would present a topic and a sample exercise. These exercises started off very simple, and gradually got more involved and built on each other. After David's discussion, we'd all work on programming our solutions for a few hours. There were plenty of discussions among the students and with David as we were working. By about the third day we were implementing Raft itself.

Raft is an excellent project for a class like this. It's relatively easy to explain: the entire Raft paper is only 18 pages long. But it involves several complex subjects: concurrency, networking, data serialization, state management, and plenty of per-client and per-server bookkeeping. Parts of the paper are of the "exercise left for the reader" variety, but other parts must be read with language lawyer specificity.

I did pair programming with Larry Hastings, a fellow Python core developer and friend of many years who I met through the Python community. This was my first time pair programming, and I thought it went well, although we probably annoyed the other students when they were trying to concentrate. Sorry, guys! Sometimes we were doing classic pair programming, where one person would drive the keyboard and the other would watch and comment. At other times we'd reach a point where we both had work to do on different parts of the program, and I'd work on the asyncio parts and Larry would work on the Raft state machine parts at the same time. The other students worked independently.

The students chose their own implementation language. I think Python and C++ were the ones used in our class, with most choosing Python. For the approach to concurrency, the group was about evenly split on threads versus asyncio. Larry and I chose asyncio, mostly because I was pushing to learn it, and I thought that being single-threaded would simplify our solution. Being single-threaded would eliminate an entire class of concurrency bugs, or so I thought. While I did learn plenty about asyncio, I'm not convinced our solution was any simpler than "just" using threads. Any thread-based solution that used only queues to communicate would also avoid the classic problems of threads stepping on each other's data. And a queue would be the only thread synchronization that would be required: no locks, semaphores, etc. So in retrospect I might use threads if I were to do it again.

One thing I'm glad I implemented early on was an interactive console for our Raft server, using the aioconsole library. This allowed us to inspect and alter the server while it was running. We could inspect the Raft state, the network communication, etc. This ended up being a powerful debugging tool.

One thing David stressed early on in the course is: how are we going to test our code? I admit I'd been thinking about this as I prepared for the class, and I did not have a good solution. One approach David suggested was to strip out all knowledge of the network and concurrency from the core of the server state machine. Well, he probably didn't go that far, but that's the approach Larry and I took. I think we succeeded at this: we had a small (too small!) test script that could throw network and timer events at the server, and then inspect how it reacted. Like the real world, at the end of the project we were skipping tests and just working on the "production" code. I wish we'd had more time to flesh out our testing framework. It's one of the things I plan on working on in the weeks after the class.

One advantage to removing the concurrency pieces from the Raft server itself is that it should technically be possible to implement a driver program that uses threads instead of asyncio. I originally thought we'd have time to do this, but we ran out of time to do this. This is another topic I'd like to work on after the class. We'd need some sort of lock to prevent multiple threads from being in the server state machine simultaneously. Or at the very least the server would need to protect some of its internal state with locks.

Another takeaway from the class is that I'm warming up to Python type checking. I made at least one error that a type checker would have caught, and that bug cost me at least 4 hours of debugging, and probably more. I think Larry also had a similar bug. And since the vast majority of our data structures were dataclasses, most of the important type checking information was already present. At the very least, all the type declarations that were needed to catch my time-consuming bug were already there.

So, why did I take this class, and what did I get out of it? True Blade's work in the physical security space has led us to working with software that's based on Kubernetes, and Kubernetes uses Raft under the hood (via etcd) to agree on the cluster state. I think it's important to understand the software the True Blade is recommending and installing. So in that regard the class was a success. One key takeaway: Raft clients need to talk to the cluster leader to perform any mutating operations.

Additionally, taking a class taught by David Beazley was definitely a draw, and a huge factor in taking this particular class. I've seen David speak many times at PyCon conferences. He has a unique ability to teach complex subject just by writing simple code on the command line. His talk on Python Concurrency from the Ground Up is a classic, and should be watched by everyone using asyncio. And finally, a chance to work closely with Larry also entered into my thinking: I always enjoy our conversations.

In the end, Larry and I succeeded in implementing a Raft server. It no doubt has bugs, and is probably missing some key features. But the basics are all present, and it was amazing to watch the state propagate across the cluster, and to watch the cluster survive killing off a server or two. One of the most interesting things to watch is to start all of the servers, load it with some data, and then one at a time kill each server and bring it back up. At the end, you will have killed every server, and yet the data in the cluster survived. Somehow I found this the most interesting part of the implementation.

I'd highly recommend this class to anyone who wants to learn more about concurrency, network protocols, and distributed consensus. David doesn't have any immediate plans to offer another in-person Raft class, but he does have an upcoming virtual class.