Anatomy of a Haskell-based Application, Revisited

By Zhouyu Qian | Email | Twitter | LinkedIn

Published November 13, 2017

Updated November 20, 2017

It has been nearly two years since our former CTO Arnaud Bailly posted his widely read article on Capital Match’s architecture on November 16, 2015. Since then, we have written about our architecture several times, but those pieces were all intended for an audience of investors or business people, which necessarily means a high concentration of buzzwords and fancy graphics with impressive-looking diagrams and flow charts, but a dearth of actual technical details. Today, however, I will write about Capital Match’s architecture for a technical audience. I will largely follow the structure of the original post, so it helps to read that post first.

I have read Arnaud’s article several times recently and a large part of it has remained valid. This is despite a tremendous growth of Capital Match from a business perspective: the total amount of funded loans was around S$3 million when Arnaud wrote that article, but now it is almost S$60 million—a twentyfold growth in two years. On the backend, the amount of data Capital Match is handling is also growing exponentially:

Figure: Capital Match’s data size is growing exponentially. Note the logarithmic axis.

Perhaps surprisingly, what has not grown by twentyfold is the size of source code. To determine how many lines are added and deleted, I ran the following command on our main repo:

git log --no-merges --since=2015-11-16 --numstat --pretty=tformat: | \
awk '{ add += $1; del += $2 } END { print add, del }'

That command produces numerical statistics (lines added/deleted) for every non-merge commit since November 16, 2015 and adds up the first two columns. The result: we have added 515,287 lines of code and deleted 543,408 lines of code. So Capital Match’s code base is smaller than it was two years ago, despite tons of new functionality and improvements to security and performance in many areas. I was slightly surprised by this result and ran the following to determine where the additions and deletions happened:

git log --no-renames --no-merges --since=2015-11-16 --numstat --pretty=tformat: | \
awk '
    { add[$3] += $1; del[$3] += $2 } 
END { for (f in add) printf "%6d insertions   %s\n", add[f], f; 
      for (f in del) printf "%6d deletions    %s\n", del[f], f 
    }' | sort -rn

Some major deletions involve removing certain third-party files like react.js from the repo and making them actual dependencies. Other major deletions involve removing some badly written UI code produced by a contractor (yes, outsourcing work like this was a mistake). But most of the deletions are typical of our workflow: at Capital Match, we vigorously delete unused code and simplify existing code.

Eric Lee of Microsoft argued in 2009 that “Source Code Is A Liability, Not An Asset” and similar sentiments can be traced back as early as 2001, when Alan Cooper, the father of Visual Basic, also said “code is not an asset” in an InfoWorld interview (archived from original). Everyone in our team agrees that less code is better code.

Besides deleting code, we also do a lot of refactoring, which explains why we have added half a million lines of code and also deleted half a million lines of code, resulting in relatively high “churn.” We believe that in the face of constant change, ongoing and continuous refactoring is the only way to have a codebase that remains sane.

Fundamental Design Choices

Haskell

Speaking of refactoring, we are happy that we chose Haskell, a wonderful language that enables large refactoring with ease, perhaps even turning what would be unthinkable architectural changes in other languages into pure tedium. When the business side of the company demands a feature with a tight deadline, we sometimes write some pretty bad code just to get things working, and then later on, after the feature has been released, take the time to rethink and rewrite the code with good software engineering practices. Naturally this follows the “Make it work, make it right, make it fast” paradigm:

  1. Meet the minimum requirements for the business to call the project a success. (Make it work.)

  2. Add bells and whistles to make the program less prone to error and more feature rich. (Make it right.)

  3. Find and eliminate waste in the process. Some assumptions made at the start will have been incorrect. Remove unnecessary business logic. This step also includes improving the code for better performance. (Make it fast.)

Over time, however, it is easy to succumb to the temptation to make things right the first time. As a result, there have been cases when we could very well have delivered a feature on time, but instead the perfectionist within us took over and the feature was delivered late.


In his famous book JavaScript: The Good Parts, Douglas Crockford started the first chapter with this remark:

When I was a young journeyman programmer, I would learn about every feature of the languages I was using, and I would attempt to use all of those features when I wrote. I suppose it was a way of showing off, and I suppose it worked because I was the guy you went to if you wanted to know how to use a particular feature.

It is perhaps a rite of passage for programmers to realize that not every feature in their language of choice is useful in day-to-day programming. Some Haskell features are just mistakes; others are designed for PL researchers to experiment with, and their usefulness in industrial programming is yet to be proven. For Capital Match this is a very slow process, but we are gradually getting rid of code that uses language features or advanced patterns that turn out not to be worthwhile. Just as Fritz Ruehr pointed out in The Evolution of a Haskell Programmer the anticlimactic moment when powerful patterns are used to compute the factorial and Fibonacci numbers, there is a difference between interesting intellectual excursions and simple, maintainable, practical (and mostly value-level) code. As for advanced type-level features, they sometimes eerily remind me of the stereotypical enterprise-y OOP code that cares more about building a superstructure of classes and hierarchies than about getting things done. Sometimes it is easy to forget how joyful it is to write value-level code that is concise, clear, and lucid.

That said, our embrace of Haskell has not been unconditional. We like Haskell, but we are not so delusional as to give up a better tool or library just because it’s written in another language. In the past two years we have given up Bake in favor of an off-the-shelf CI server written in Java. Reliability was an issue, but in the end it just did not seem worthwhile for us to spend effort maintaining code for a CI system when we could use a CI server with a discoverable GUI (and someone else to maintain it).

Event Sourcing

The use of event sourcing is perhaps more controversial than the use of Haskell; even in the Haskell world there are plenty of developers who continue to use conventional RDBMSs. But our CTO had a bias against RDBMSs and to this day we do not use RDBMSs. The way Capital Match uses event sourcing is mostly unchanged: a flat event file stores events, or individual business-level actions that can affect the state of the system. We have, however, greatly improved the implementation of this subsystem, from better ways to load and persist events, to better queries. We’ll talk about these improvements in detail later in this blog post.

General Architecture

The general architecture also largely remained intact. Quoting Arnaud,

The main interface to the system is a REST-like API providing various resources and actions over those resources. Most exchanges with the outside world are done using JSON representation of resources, with some CSV. The User Interface is merely a client of the API and is (morally if not to the letter) a single page application. There is also a command-line client which offers access to the complete API and is used for administrative purpose.

There are three main layers involved in Capital Match’s application: Model, Service, and Web.

Model

The Model layer is full of pure code that carries out the requisite business logic in the app. It contains several BusinessModels, each with its own events and commands. Compared with Arnaud’s original exposition, this part is almost entirely unchanged, save for minor details like changing from data families to injective type families and changing a few argument orders:

class BusinessModel a where
  -- | The type of events for this model.
  type Event a = event | event -> a

  -- | The type of commands for this model.
  type Command a = command | command -> a

  -- | An initial value for this model.
  init :: a
  default init :: Default a => a
  init = def

  -- | Execute a command against this model and returns an event
  -- representing the outcome of the command.
  act :: a -> Command a -> Event a

  -- | Apply an event to the model resulting in a potentially new
  -- model.
  apply :: a -> Event a -> a

Conceptually, a command is an action that an outside request wants to perform, whereas an event is a permanent and immutable record that something has happened. For example, there can be a RegisterUser command signaling the desire to register a particular user; in response it may result in a UserRegistered event recording the fact that said user has been registered successfully, or in another event signaling an error. As a result, the act method is a great place to do validation; continuing the example, in act we check whether another user with the same email has already been registered, and enforce other business rules like whether the password meets minimum length/complexity requirements.
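
To make this concrete, here is a minimal sketch of what an instance might look like, using a hypothetical and much simplified UserModel; the type and constructor names beyond RegisterUser and UserRegistered are illustrative, not our actual code:

{-# LANGUAGE TypeFamilies, TypeFamilyDependencies #-}

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type Email    = String
type Password = String

newtype UserModel = UserModel { users :: Map Email Password }

data UserCommand = RegisterUser Email Password

data UserEvent
  = UserRegistered Email Password
  | UserRegistrationFailed String

instance BusinessModel UserModel where
  type Event UserModel   = UserEvent
  type Command UserModel = UserCommand

  init = UserModel Map.empty

  -- Validation lives in 'act': check the business rules and record the
  -- outcome, good or bad, as an event.
  act (UserModel us) (RegisterUser email pw)
    | email `Map.member` us = UserRegistrationFailed "email already registered"
    | length pw < 8         = UserRegistrationFailed "password too short"
    | otherwise             = UserRegistered email pw

  -- 'apply' merely folds an event into the state; it never fails.
  apply m (UserRegistered email pw)  = m { users = Map.insert email pw (users m) }
  apply m (UserRegistrationFailed _) = m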

One thing we’ve been considering adding is the ability for a single command to emit a list of events. Being a list, there can be zero, one, or more of them. The possibility of a command producing zero events is very useful: it applies whenever a command is not really applicable given the current state; we often find ourselves writing NoEvent as one of the constructors of our Event type just to capture this possibility. Occasionally it is also useful to have a single command result in multiple events that are handled atomically. We usually work around this by creating a new constructor in the Event type for the complicated operation, even when we know the end result we want is simply the sequential application of two existing events.
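
A sketch of what such a variant could look like (this is not our current class, just an illustration of the idea):

{-# LANGUAGE TypeFamilies, TypeFamilyDependencies #-}

import Data.List (foldl')

-- A hypothetical variant in which a command emits a list of events that
-- are applied in order and persisted atomically.
class BusinessModelMulti a where
  type EventM a   = e | e -> a
  type CommandM a = c | c -> a

  initM  :: a
  actM   :: a -> CommandM a -> [EventM a]   -- zero events: command not applicable here
  applyM :: a -> EventM a -> a

-- Executing a command is then just a left fold over the emitted events.
executeM :: BusinessModelMulti a => a -> CommandM a -> a
executeM s cmd = foldl' applyM s (actM s cmd)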

Arnaud lamented the fact that the original rule of strictly separating the business models was broken quite a few times; in the ensuing years we stopped caring about strictly separating those business models. A lesson learned is that if the goal is to strictly separate the business models, with an eye towards being able to deploy them as independent services exchanging messages, then perhaps we should in fact build multiple executables actually exchanging messages: a rule written in natural language is the easiest kind of rule to break, because there is little enforcement beyond code reviews. Once we gave up the goal of using event passing as the means of coordination across business models, it became obvious to use shared-memory concurrency, a.k.a. STM, in the Service layer.

Service

The Service layer is conceptually a layer that can carry out actions on these models. There are in fact two sublayers. The first, InSTM, is a layer where no IO is allowed and where mutation of data is allowed but restricted to be atomic. It can safely access multiple business models, thanks to STM:

newtype InSTM s a = InSTM
  { unInSTM :: ReaderT (TVar s) STM a
  } deriving (Functor, Applicative, Monad)

Here the InSTM monad allows several modifications to a single TVar to be made atomically, even across several business models, as well as to ephemeral data that is not part of any business model (e.g. certain kinds of feature flags). It is also useful for complicated data accesses across business models that must be atomic, even though STM is not strictly required in that case.
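
For illustration, a plausible set of helpers around InSTM might look like the following (a sketch, not necessarily our exact API):

import Control.Concurrent.STM (TVar, atomically, modifyTVar', readTVar)
import Control.Monad.Reader (ReaderT (..), ask)
import Control.Monad.Trans.Class (lift)

-- Run an InSTM action against the shared state; the whole block commits
-- (or retries) as one STM transaction.
runInSTM :: TVar s -> InSTM s a -> IO a
runInSTM var (InSTM m) = atomically (runReaderT m var)

-- Read the whole application state.
getState :: InSTM s s
getState = InSTM (ask >>= lift . readTVar)

-- Modify the state; several modifications inside one runInSTM call are
-- still atomic as a group.
modifyState :: (s -> s) -> InSTM s ()
modifyState f = InSTM (ask >>= \var -> lift (modifyTVar' var f))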

More useful is when we gain the full functionality of IO. The ServiceT monad transformer looks like this:

newtype ServiceT g l m a = ServiceT
  { unServiceT :: ReaderT (TVar g, l) m a
  } deriving (Functor, Applicative, Monad)

This is essentially the same as the WebStateM type in Arnaud’s blog post. Over time, there have been some efforts to improve it further, such as sneaking an ExceptT into it, but those turned out not to be worthwhile. The g type variable refers to the global state, a record called CapitalMatchState that stores all the data and provides some pluggable functions, configuration values, concurrent channels, and “buses” (which are just a TQueue endowed with a worker thread to perform actions on things written to the queue). The l type variable captures local state, which in practice means state associated with a particular request, including certain pieces of data from request headers and other information computed from them, such as authentication info. It is in principle quite versatile, a bit like Python Flask’s magical g object, but we make only limited use of it, because additional state is just additional complexity.
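
As a sketch (again, much simplified relative to the real CapitalMatchState and request-local types), the basic accessors and runner look roughly like this:

import Control.Concurrent.STM (TVar, readTVarIO)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (ReaderT (..), asks)

-- Snapshot of the global state (a plain read, no transaction needed).
getGlobal :: MonadIO m => ServiceT g l m g
getGlobal = ServiceT (asks fst >>= liftIO . readTVarIO)

-- The request-local environment (headers, authentication info, ...).
getLocal :: Monad m => ServiceT g l m l
getLocal = ServiceT (asks snd)

-- Run a handler given the shared TVar and the per-request local state.
runServiceT :: TVar g -> l -> ServiceT g l m a -> m a
runServiceT gvar local (ServiceT m) = runReaderT m (gvar, local)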

Both types have additional instances, such as MonadState, so code can be written to work with either. A few other instances, like MonadBaseControl for ServiceT, are occasionally useful.

One should also note that both of these monads are, at their heart, ReaderT monad transformers. This is why a blog post this June by Michael Snoyman on the ReaderT pattern felt familiar to us. We had been using this pattern for years before that blog post, and we fully agree with Michael Snoyman that it is an excellent way to structure a Haskell program.

That said, in practice we find that developers have a tendency to overuse ServiceT, or more specifically a type synonym for ServiceT that we call Service, which includes a MonadIO constraint. Because Service is the default monad to reach for, developers frequently use it even when they just want to perform some pure computation, thereby granting themselves the full power of IO.

Web

The Web layer is conceptually a simple layer that translates HTTP requests into Service functions, and then translates the results back into HTTP responses. It handles things like middleware, requests, routing, parsing data from JSON representations, and serializing results back into JSON.

We are using a mix of Scotty and Servant. Neither is, in my opinion, perfect: Servant is considerably better conceptually, but comes with terrible error messages and quite a bit of boilerplate, whereas Scotty’s general design (especially its error handling and its liberal but inappropriate use of lazy Text) leaves something to be desired.
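
For the Servant side, a toy endpoint looks like the following (not our actual API; the real handlers call into the Service layer rather than returning a constant):

{-# LANGUAGE DataKinds, TypeOperators #-}

import Data.Proxy (Proxy (..))
import Network.Wai.Handler.Warp (run)
import Servant

-- A trivial API type: GET /ping returning a JSON-encoded string.
type PingAPI = "ping" :> Get '[JSON] String

server :: Server PingAPI
server = pure "pong"

main :: IO ()
main = run 8080 (serve (Proxy :: Proxy PingAPI) server)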

Physical Organization

In terms of concrete files, we mainly have modules like Capital.X.Model, Capital.X.Service, and Capital.X.Web where X is some name of a business model. Usually we also have Capital.X.Types and/or Capital.X.Types.Y for data types belonging to this model. There are also client-specific types that reside in Capital.X.Web.Types.Z where Z is some client-specific type.

Arnaud in his blog post mentioned splitting code, but he meant splitting it into multiple packages. That advice turned out to be misguided; we have since merged all of our packages that aren’t really independent into a single one containing hundreds of modules. The reason is that, inexplicably, GHC still does not default to compiling in parallel, nor does stack invoke GHC in a way that parallelizes the build effectively. Yes, the stack tool has a -j option, but that only parallelizes stack’s internal invocation of the cabal build process; in other words, it gives package-level parallelism. To get module-level parallelism, one needs to pass stack --ghc-options=-j. It is really surprising how rarely this is mentioned when people complain about GHC build times: an obvious remedy is to turn on module-level parallelization! In practice we also pass several more options to GHC, mostly RTS options that control garbage collection for GHC itself, which has been shown to improve compile times by up to 50%. Merging all packages back into a single one has yielded up to a 20% improvement in build times.

Other Tidbits

Persistence & Storage

Although the architecture of the event storage subsystem has not changed, what has changed greatly is the actual implementation: almost every line of code that deals with events and their storage has been rewritten for higher performance. Thanks to GHC’s cheap and easy-to-use sparks, the event loading code runs in parallel. Low-level system calls are also used directly for higher performance (mmap-ed files) and better reliability. On the Haskell side, the event reading/writing code has also been considerably simplified: the original labyrinth of several TQueues, TVars, and TMVars has been replaced by a single MVar (which, unlike TMVar, has a fairness guarantee).
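
The idea behind the single MVar is easy to sketch; the actual implementation does considerably more than this, but the writer side is morally the following:

import Control.Concurrent.MVar (MVar, withMVar)
import qualified Data.ByteString.Char8 as BS
import System.IO (Handle, hFlush)

-- All writers funnel through one MVar guarding the event-log handle, so
-- appends are serialized, and MVar fairness means no writer is starved.
newtype EventLog = EventLog (MVar Handle)

appendEvent :: EventLog -> BS.ByteString -> IO ()
appendEvent (EventLog lock) serialized =
  withMVar lock $ \h -> do
    BS.hPutStrLn h serialized   -- one serialized event per line in the flat event file
    hFlush h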

On a higher level, however, we are still using TVars, TMVars, and buses, which are (as noted above) just TQueues endowed with a worker thread that performs actions on things written to the queue. Although designed to be generic and useful in many contexts, this form of message-passing concurrency is used heavily in higher-level event handling and has seen great improvements, especially in the face of exceptions.
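
A bus, stripped of its exception handling and shutdown logic, is roughly the following sketch:

import Control.Concurrent.Async (Async, async)
import Control.Concurrent.STM (TQueue, atomically, newTQueueIO, readTQueue, writeTQueue)
import Control.Monad (forever)

-- A queue plus a worker thread that runs a handler on everything written
-- to the queue.
data Bus a = Bus
  { busQueue  :: TQueue a
  , busWorker :: Async ()
  }

newBus :: (a -> IO ()) -> IO (Bus a)
newBus handler = do
  q <- newTQueueIO
  w <- async . forever $ atomically (readTQueue q) >>= handler
  pure (Bus q w)

publish :: Bus a -> a -> IO ()
publish bus = atomically . writeTQueue (busQueue bus)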

Migrations

On the versioning and migration front, we have not logically changed the main architecture, but again the implementation is extremely different. We no longer naïvely use the event version to select a function to deserialize the old data. Our old approach of having one function per version to deserialize the data is terrible:

  1. it has built-in assumptions about the shape of the old data (which is arguably a fair assumption) and uses partial functions everywhere (which is bad practice);

  2. without modern techniques like lens, the migration code contains a lot of highly specific data-structure traversal code that is verbose and convoluted;

  3. because those functions deal with historical data and convert it directly to the data the application currently requires, historical migration code sometimes has to be modified when writing a new migration, and therefore to write a migration it is necessary to understand every single previous migration since the beginning of time.

The last point is especially insidious: in the beginning the code was quick and easy to write, but gradually it became a nightmare. Asking a human to remember the entire history of what our data types looked like is just asking for trouble. After onboarding a few new developers who wrote incorrect migration code, along with incorrect tests for said migration code that would silently reinterpret old data in the wrong way, we finally embarked on a major refactoring. Somewhat scarily, we discovered along the way that some older migration code, written in early 2016, had never been correct in the first place; the tests that accompanied it had never caught the issue because they did not check the semantic interpretation of events, only that no exceptions were thrown during migration (because we were using partial functions extensively).

Gone are the days of extremely brittle migration code that is difficult to read, difficult to write, and requires frequent changes. We now use a snazzy lens-based migration system: the lens library makes traversals of deeply nested data structures extremely easy, and here at Capital Match we wholeheartedly embrace it. We’ll be sharing technical details of this system in a future blog post.
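
While the details will have to wait for that post, the flavour can be conveyed with a toy example. Assuming a JSON-encoded event payload and the lens-aeson package (an assumption for illustration only), a migration is just a function over the stored value, and lens makes reaching into deeply nested payloads a one-liner:

{-# LANGUAGE OverloadedStrings #-}

import           Control.Lens (over)
import           Data.Aeson (Value)
import           Data.Aeson.Lens (key, _String)
import qualified Data.Text as T

type Migration = Value -> Value

-- Hypothetical example: an old event stored emails with arbitrary case;
-- normalise them to lowercase when migrating to the new version.
lowercaseEmail :: Migration
lowercaseEmail = over (key "payload" . key "email" . _String) T.toLower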

Queries

Even in the original blog post Arnaud mentioned that SQL is great for writing queries, and this is indeed true. We have mostly completed our transition to IxSets, which are simply sets of things together with indices (which are maps of sets). For queries that fall within the narrow “sweet spot” of IxSets (and plenty do), they are extremely fast. There is a blog post from 2013 that unscientifically compares the performance of Acid (which is really more about ixset than acid-state) and finds it to be extremely fast. This is generally our experience as well. For the code that uses IxSets, we are never really disappointed with its performance.
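
For readers unfamiliar with ixset, here is a small self-contained example of the kind of thing we do (the real types are richer, and the names below are invented):

{-# LANGUAGE DeriveDataTypeable #-}

import           Data.IxSet (Indexable (..), IxSet, ixFun, ixSet, (@=))
import qualified Data.IxSet as IxSet
import           Data.Typeable (Typeable)

newtype LoanId   = LoanId Int      deriving (Eq, Ord, Show, Typeable)
newtype Borrower = Borrower String deriving (Eq, Ord, Show, Typeable)

data Loan = Loan
  { loanId   :: LoanId
  , borrower :: Borrower
  , amount   :: Int
  } deriving (Eq, Ord, Show, Typeable)

instance Indexable Loan where
  empty = ixSet
    [ ixFun (\l -> [loanId l])     -- index by loan id
    , ixFun (\l -> [borrower l])   -- index by borrower
    ]

-- An equality query against an index is a map lookup, not a linear scan.
loansOf :: Borrower -> IxSet Loan -> [Loan]
loansOf b loans = IxSet.toList (loans @= b)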

For queries with complicated predicates (think where clause in SQL), we also have a BooleanQuery data type:

data BooleanQuery a
  = And [BooleanQuery a]
  | Or [BooleanQuery a]
  | Not (BooleanQuery a)
  | Prop a
  | BooleanTrue
  | BooleanFalse
  deriving (Show, Functor, Foldable, Traversable)

There are also several functions that simplify boolean queries with various options: some perform simple simplifications like transforming And [BooleanTrue, p] into p, while others are proposition-aware and look at the propositions themselves to simplify further. Following this simplification and rewriting, there are functions that turn those queries into actual ixset queries. Some of these functions do not depend on the specific ixset in question (i.e. a polymorphic function that turns a BooleanQuery a into Endo (IxSet ixs b)), but there are also ixset-specific ones (for example, for an Or query whose propositions all operate on the same index type, we can use the @+ operator instead of union). Of course, the type also supports non-ixset queries: we have code to turn a BooleanQuery a into m Bool, where m can be any monad (but usually the reader monad), as well as more complicated values like f (g Bool).
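
To give a flavour of the proposition-agnostic pieces, here are simplified sketches of a generic simplifier and of the evaluation into m Bool (the production versions are more elaborate):

-- Generic, proposition-agnostic simplification: drop constants, collapse
-- trivial And/Or nodes, and push Not through constants.
simplify :: BooleanQuery a -> BooleanQuery a
simplify (And qs) =
  let qs' = [ q | q <- map simplify qs, not (isTrue q) ]
  in if any isFalse qs' then BooleanFalse
     else case qs' of
            []  -> BooleanTrue
            [q] -> q
            _   -> And qs'
simplify (Or qs) =
  let qs' = [ q | q <- map simplify qs, not (isFalse q) ]
  in if any isTrue qs' then BooleanTrue
     else case qs' of
            []  -> BooleanFalse
            [q] -> q
            _   -> Or qs'
simplify (Not q) = case simplify q of
  BooleanTrue  -> BooleanFalse
  BooleanFalse -> BooleanTrue
  q'           -> Not q'
simplify q = q

isTrue, isFalse :: BooleanQuery a -> Bool
isTrue BooleanTrue   = True
isTrue _             = False
isFalse BooleanFalse = True
isFalse _            = False

-- Evaluate a query in any monad, given a way to evaluate a proposition.
evalQuery :: Monad m => (a -> m Bool) -> BooleanQuery a -> m Bool
evalQuery evalProp = go
  where
    go (And qs)     = and <$> traverse go qs
    go (Or qs)      = or  <$> traverse go qs
    go (Not q)      = not <$> go q
    go (Prop p)     = evalProp p
    go BooleanTrue  = pure True
    go BooleanFalse = pure False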

For joins, the story is admittedly not as satisfactory. The containers package introduced map merging in 2016 and we use it in a few places, but we would like to see something similar for IxSets: it really isn’t hard to implement. If the naïve way of joining ixsets, with linearithmic complexity, eventually turns out to be a performance bottleneck worth resolving, we will likely fork the library and add support on our own.
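
The naïve join we have in mind is simple enough to sketch (a hypothetical helper, not something ixset provides out of the box):

import           Data.IxSet (Indexable, IxSet, (@=))
import qualified Data.IxSet as IxSet
import           Data.Typeable (Typeable)

-- For every element of the left set, look up matching elements of the
-- right set by a shared key; each lookup is logarithmic, hence the
-- linearithmic cost overall.
joinOn
  :: (Ord a, Ord b, Indexable b, Typeable b, Typeable k)
  => (a -> k) -> IxSet a -> IxSet b -> [(a, b)]
joinOn key left right =
  [ (l, r)
  | l <- IxSet.toList left
  , r <- IxSet.toList (right @= key l)
  ]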

User Interface

Very little has changed in this regard. We are still using Om in ClojureScript. That’s not to say, however, that we are satisfied; rather, it is perhaps too late to change now. Echoing the sentiment from our inaugural post, the lack of a type system makes frontend development rather unsatisfactory, but that was a choice made by Arnaud back in 2014, and it is well known that rewriting large chunks of working code is rarely worthwhile.

Besides the lack of a type system, we gradually discovered that Om itself has some design issues. The first, also explained in this blog post from CircleCI, is that Om assumes the shape of the UI (the DOM, essentially a tree) fits the shape of the data. In the real world this is frequently not the case, leading to either a component accessing the entire state, or components avoiding the global state (and cursors) and using local state exclusively (peppered with antipatterns like one component accessing the local state of another component, and manually calling om/refresh!).

The second issue is that there is a distinction between global state and local state at all, and that there are different functions to handle them: to apply a function to the global state we use om/transact!, and to set it to a value the function is om/update!; yet to apply a function to local state the function is om/update-state!, and to set it to a value it is om/set-state!. Besides the naming inconsistency, the different methods for accessing global and local state mean that some important basic components have to be available in two forms: one version that works on global state and another that works on local state. Contrast this with reagent, where global state is just an atom defined at the top level with defonce, and local state is just another atom defined within a let binding and made available in the closure of the render function.

Om Next is the next iteration of Om that promises a radically improved way of writing UIs. Unfortunately it has almost reached vaporware status, having been in development since 2015 with still no actual release. It only reached beta status this year, still with lots of incomplete documentation and, critically, no documented way to migrate gradually from Om to Om Next. Furthermore, it also seems to be difficult to understand, which is exacerbated by its repurposing of standard terminology like “parsing” to mean entirely different things.

We’ve also been watching this space for a more Haskell-y solution, from PureScript to GHCJS. Both have made great strides since two years ago, and it is not inconceivable that, starting from scratch today, we would choose PureScript or GHCJS.

Concurrency

As noted by Arnaud, concurrency is mostly handled at the level of requests. For other scenarios, we have fully moved away from forkIO in favor of higher-level abstractions like async and withAsync. Of these two, the former is used sparingly in favor of the latter, for the automatic killing of child threads when the enclosing function returns or throws an exception. In practice, we usually use the ContT r IO monad to tame the continuation-based nature of withAsync, especially in app startup code where we need to launch many threads.
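
A sketch of that startup pattern, with invented thread names, looks like this:

import Control.Concurrent.Async (Async, withAsync)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)

-- Each ContT action wraps a withAsync, so every launched thread is killed
-- automatically when the whole block exits, normally or via an exception.
launch :: IO a -> ContT r IO (Async a)
launch action = ContT (withAsync action)

startApp :: IO () -> IO () -> IO () -> IO ()
startApp runEventWriter runBuses runWebServer = evalContT $ do
  _eventWriter <- launch runEventWriter
  _buses       <- launch runBuses
  -- The web server runs in the main thread; when it returns, the
  -- auxiliary threads above are torn down by withAsync.
  liftIO runWebServer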

As for coordination between concurrent threads, a mix of message passing and shared memory is used.

Conclusion

This is, once again, a long post, yet only a brief overview of the system. There are a few points mentioned above that I’d like to elaborate on in future blog posts, as well as covering what has changed in other parts of the system: infrastructure, CI/CD, testing, containerization, development environment and workflow, and project management and coordination with a team in four countries across three timezones. Some of this will be similar to how other teams do things, but I hope some of it will be unique and insightful.