We have used Shake since the very beginning. The first commit that introduced Shake dates back to November 2014. Over the years, our build system went through one major refactoring in 2016, when we used Shake’s ability to track non-file dependencies to keep track of our Docker images and containers.
Usually we upgrade to the latest version of Shake as it appears in Stackage snapshots, but the release of Shake 0.16 included revised rule definitions. Since we were already using Shake’s custom rules, this meant that we needed to refactor a large swath of build system code. We postponed this task for several months before finally deciding to upgrade. We also took advantage of this large refactoring to adopt more modern Docker idioms such as named volumes, which are much nicer to work with than named containers containing volumes.
This is what our rule definitions looked like before the upgrade:
newtype ImageName =
  ImageName String
  deriving (Show, Typeable, Eq, Hashable, Binary, NFData)

newtype ImageId =
  ImageId String
  deriving (Show, Typeable, Eq, Hashable, Binary, NFData)

instance Rule ImageName ImageId where
  storedValue _ (ImageName name) = do
    (Exit c, Stdout (head . lines -> iid)) <-
      cmd "docker" ["inspect", "--format", "{{.Id}}", "--type", "image", name]
    return $
      case c of
        ExitFailure _ -> Nothing
        ExitSuccess -> Just $ ImageId iid
This means we are declaring two new types, ImageName and ImageId. The ImageName type is a wrapper for a human-readable image name such as ubuntu:xenial, and ImageId is Docker’s image ID, which is a SHA256 hash. This is a great way to keep track of precisely what is depended on. For example, if I pull ubuntu:xenial today and get back a different image than the ubuntu:xenial I already had, then all images based on ubuntu:xenial should be rebuilt, even if the actual files needed to build these images did not change in any way.
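To make that rebuild condition concrete, here is a minimal sketch that does not use Shake at all. The Snapshot type and the changedImages function are hypothetical names invented for this illustration, and the newtypes are simplified copies of the ones above; the sketch only models comparing previously recorded image IDs against freshly inspected ones.

```haskell
import Data.Maybe (mapMaybe)

-- Simplified copies of the newtypes above, so the sketch compiles on its own.
newtype ImageName = ImageName String deriving (Show, Eq)
newtype ImageId = ImageId String deriving (Show, Eq)

-- | Hypothetical: the image ID recorded for each image name at one point in time.
type Snapshot = [(ImageName, ImageId)]

-- | Names in the new snapshot whose ID differs from the old snapshot,
-- or which were not recorded in the old snapshot at all.
-- Every image based on one of these names has to be rebuilt.
changedImages :: Snapshot -> Snapshot -> [ImageName]
changedImages old new = mapMaybe changed new
  where
    changed (name, newId)
      | lookup name old == Just newId = Nothing
      | otherwise = Just name
```

If ubuntu:xenial resolves to a new ID after a pull while every other image keeps its ID, changedImages returns only ubuntu:xenial, regardless of whether any file on disk changed.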
After the upgrade to Shake 0.16, the Rule class is gone, replaced by a type instance declaration:
type instance RuleResult ImageName = ImageId
In the new Shake, there are two different kinds of rules: builtin rules and user rules. I’m not certain the terminology is the best, but one way we’ve exploited the distinction is to have a single generic builtin rule that applies to all images, and then a separate user rule to build each individual image. That sounds like a mouthful, but here is the final API:
-- | Add an image rule. Specify the name of the image and how to build it.
addImageRule :: ImageName -> Action () -> Rules ()
-- | Depend on an image.
needImages :: [ImageName] -> Action ()
-- | Add a built-in rule so that Shake understands our custom rules.
addBuiltinImageRule :: Rules ()
The addImageRule function takes a specific image name and the Action to build it. The result is Rules (). This adds a user rule. On the other hand, addBuiltinImageRule has type Rules (), and it is meant to be used just once so that the Shake build system knows how to invoke the user rules added above. Finally, we have a needImages function which, just like need, expresses dependencies in the current action. Only these three functions are exposed to the writer of rules.
Here’s how they can be used. For example, to say that the ubuntu:xenial image can be built by pulling, we just need to write
addImageRule (ImageName "ubuntu:xenial") $ do
  alwaysRerun
  cmd_ "docker" [ "pull", "ubuntu:xenial" ]
Of course, the attentive reader will notice that ubuntu:xenial appears twice in the above code snippet. Here’s where the strength of Shake comes in: everything is just regular Haskell code, so we can use all of Haskell’s abstraction facilities. For example, we can write
buildImageByPulling :: ImageName -> Rules ()
buildImageByPulling im@(ImageName imageName) =
  addImageRule im $ do
    alwaysRerun
    cmd_ "docker" [ "pull", imageName ]
Using needImages is just as simple as using need. For example, we might have a Haskell executable that should be built within a custom Docker image, say, the capitalmatch/ghc-clojure:latest image (this is public, although we don’t really expect it to be widely used outside). This can be accomplished by saying
needImages [ImageName "capitalmatch/ghc-clojure:latest"]
Now let’s look at how these three functions can be implemented:
import qualified Data.ByteString.Char8 as B

data ImageRule = ImageRule ImageName (Action ())

addImageRule :: ImageName -> Action () -> Rules ()
addImageRule im act = addUserRule (ImageRule im act)

needImages :: [ImageName] -> Action ()
needImages = void . apply

addBuiltinImageRule :: Rules ()
addBuiltinImageRule = addBuiltinRule noLint run
  where
    run :: BuiltinRun ImageName ImageId
    run (ImageName im) mold depsChanged = do
      mnow <- findCurrent
      case (depsChanged, mnow, mold) of
        (False, Just now, Just old)
          | B.unpack old == now ->
            pure $ RunResult ChangedNothing (B.pack now) (ImageId now)
        _ -> do
          rules <- getUserRules
          case userRuleMatch
                 rules
                 (\(ImageRule (ImageName im') act) ->
                    if im == im'
                      then Just act
                      else Nothing) of
            [] -> fail $ "No rule to build image " ++ im ++ "."
            (_:_:_) -> fail $ "Multiple rules defined to build image " ++ im ++ "."
            [r] -> do
              r
              mnew <- findCurrent
              case (mnew, mold) of
                (Just new, Just old)
                  | B.pack new == old ->
                    pure
                      (RunResult ChangedRecomputeSame (B.pack new) (ImageId new))
                (Just new, _) ->
                  pure
                    (RunResult ChangedRecomputeDiff (B.pack new) (ImageId new))
                (Nothing, _) ->
                  fail
                    "The user-provided rule for the image completed but no image was produced."
      where
        findCurrent = do
          (Exit c, Stdout (head . lines -> iid)) <-
            quietly $ cmd
              "docker"
              ["inspect", "--format", "{{.Id}}", "--type", "image", im]
          case c of
            ExitFailure _ -> pure Nothing
            ExitSuccess -> pure (Just iid)
The addImageRule function simply packages up the two arguments in a data type called ImageRule and calls addUserRule so we can retrieve it later. The needImages function is also simple; it essentially calls apply, which is a more general way of saying the current action depends on something else. By comparison, the Shake-provided need function essentially parses the pattern specified as a string and then calls apply.
The addBuiltinImageRule function is by far the most complicated. It calls addBuiltinRule, telling it how to lint the rule and how to run it. Since we don’t do linting, we specify noLint. The run function is given the name of the image we want to build, the (possibly non-existent) old value, and whether the dependencies have changed. It short-circuits the rest of the processing if neither the image ID nor the dependencies have changed. Otherwise, it calls getUserRules to get all the user rules, which are added by addImageRule. It tries to match the rules by using == on the image name. It then fails the action if no rule or more than one rule matches. Otherwise it performs the action, then extracts the image ID again. It then returns either ChangedRecomputeDiff or ChangedRecomputeSame depending on whether or not the image changed in the build.
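The control flow of run is easier to see with Docker and Shake stripped away. What follows is a simplified, pure model of it, not the code we actually run. The names Outcome, selectRule, canSkip, and classifyRebuild are hypothetical, image names and IDs are plain Strings rather than newtypes and ByteStrings, and the Outcome constructors merely mirror Shake’s ChangedRecomputeSame and ChangedRecomputeDiff.

```haskell
-- | Hypothetical stand-in for the two "recomputed" results returned above.
data Outcome
  = RecomputedSame -- mirrors ChangedRecomputeSame
  | RecomputedDiff -- mirrors ChangedRecomputeDiff
  deriving (Show, Eq)

-- | Pick the single user rule registered for an image name.
-- Zero matches and multiple matches are both errors, as in the builtin rule.
selectRule :: String -> [(String, a)] -> Either String a
selectRule im rules =
  case [act | (im', act) <- rules, im' == im] of
    [] -> Left ("No rule to build image " ++ im ++ ".")
    [act] -> Right act
    _ -> Left ("Multiple rules defined to build image " ++ im ++ ".")

-- | The short circuit (the ChangedNothing case): skip the build only when
-- the dependencies are unchanged, the image currently exists, and its ID
-- equals the stored one.
-- Arguments: dependencies changed?, current ID, stored ID.
canSkip :: Bool -> Maybe String -> Maybe String -> Bool
canSkip False (Just now) (Just old) = now == old
canSkip _ _ _ = False

-- | Classify the result once the user rule has run.
-- Arguments: ID found after the build, stored ID from the previous run.
classifyRebuild :: Maybe String -> Maybe String -> Either String Outcome
classifyRebuild (Just new) (Just old)
  | new == old = Right RecomputedSame
classifyRebuild (Just _) _ = Right RecomputedDiff
classifyRebuild Nothing _ =
  Left "The user-provided rule for the image completed but no image was produced."
```

In this model, a first build (no stored ID) is never skipped and always counts as RecomputedDiff, which matches the (Just new, _) branch of the real rule.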
Looking back, the experience of upgrading Shake didn’t turn out to be as difficult as I had once thought. This exercise pretty much reinforced what I had already known about using Haskell industrially:
After the large refactoring and fixing the compile errors, I ran the build on a test machine. It ran correctly[1] the first time. This is yet another testament to the power of using types correctly.
But figuring out how to write the code is not easy. Reading the API documentation is not sufficient; I frequently had to refer to the source code to see how things are done. This is a common problem across many different Haskell packages. In this specific case, Shake does have some improvements to the documentation already merged to master, but as of this writing, they are not yet available in any released version of Shake. Example code helps not just newbies, but also relatively experienced Haskellers by saving the time of reading through the source code.
In the end, however, Shake is an excellent library, and we hope it will be used in more build systems where appropriate.
[1] “Correctly” in the sense that it works on our CI infrastructure. It is still possible that it contains latent bugs whose effects are not seen.