Cameron Gera and Taylor Fausak produced a podcast on an article of mine about good design and typesafety using code from an implementation of the game Yahtzee. The article is about refactoring code to improve design and how that goes hand-in-hand with type safety. Intriguingly, listening to others talk about my article gave me fresh ideas.
At one point in the article we observed that a variable to an argument was unused. Subsequently we removed it. The only justification given that the removal of the argument was valid was that we could convince ourselves that it was unused by looking at the implementation of the function, and that we could insert a run time check.
Hearing Cameron and Taylor talk about the article made me think again. There were only two changes to the code that really relied upon understanding what it does; everything else was mechanical transformation. Both of those changes were to do with the unused argument.
Einstein said “chalk is cheaper than grey matter”. Can we avoid using “grey matter” (our brains) to remove the unused argument, instead just relying on “chalk” (mechanical transformations)? The answer is yes! Let’s see how to do it.
We start from the “Add
pop function” stage of the previous article. We’ve got a suspicion that, although the argument
allRolls is used, the argument
n-1 to the recursive call is not. How can we transform the code to make that clear?
type DiceChoice = [ Bool ] type DiceVals = [ Integer ] type DiceState = (DiceVals, Integer) pop :: DiceChoice -> DiceVals -> Maybe ((Bool, Integer), (DiceChoice, DiceVals)) pop   = Nothing pop (chosen:choices) (v:vs) = Just ((chosen, v), (choices, vs)) pop (_:_)  = error "Invariant violated: missing val" pop  (_:_) = error "Invariant violated: missing choice" allRolls :: DiceChoice -> DiceState -> [ DiceState ] allRolls choices (vs, n) = case pop choices vs of Nothing -> [ (, n-1) ] Just ((chosen, v), (choices, vs)) -> allRolls choices (vs, n-1) >>= \(roll,_) -> [ (d:roll, n-1) | d <- rollList ] where rollList = if chosen then [v] else [ 1..6 ] example = let diceChoices = [ False, True, True, False, False ] diceVals = [ 6, 4, 4, 3, 1 ] in mapM_ print $ allRolls diceChoices (diceVals, 2)
Let’s immediately simplify by using
do-notation. In the previous article we left this stage until later but given that the recursive call is currently part of a
>>= expression let’s apply the simplification now.
If the argument
n were not used at all then our job would be much easier. However, it is used, so let’s try to separate the place it is used from the place where (we believe) it is not used.
Where is it used? We can see that each branch of the
case statement returns a list of tuples where the second element of each tuple is
n-1. Put another way, each branch produces a list and then maps the “pair with
n-1” function over it.
I’ll write the “pair with
n-1” function as
(, n-1) (using the
TupleSections extension). The usual alternative would be to write
\x -> (x, n-1) but in this article I want to keep things compact.
allRolls :: DiceChoice -> DiceState -> [ DiceState ] allRolls choices (vs, n) = case pop choices vs of - Nothing -> [ (, n-1) ] + Nothing -> fmap (, n-1) [  ] Just ((chosen, v), (choices, vs)) -> do (roll, _) <- allRolls choices (vs, n-1) - [ (d:roll, n-1) | d <- rollList ] + fmap (, n-1) [ d:roll | d <- rollList ]
n is used, we think, just twice, and in each case to map the “pair with
n-1” function over a list. We’ve made this duplication obvious but we can’t yet remove it. First we have to lift the
fmap outside the
do. We use the rule that
can be rewritten to
Why is this rewriting valid? Informally, a
do block is like a procedure, and this rule says that “applying
f and then returning from the procedure” is the same as “returning from the procedure and then applying
f”. Formally, it can be proved using the monad laws.
This is how the rewriting applies in our case:
- Just ((chosen, v), (choices, vs)) -> do + Just ((chosen, v), (choices, vs)) -> fmap (, n-1) $ do (roll, _) <- allRolls choices (vs, n-1) - fmap (, n-1) [ d:roll | d <- rollList ] + [ d:roll | d <- rollList ]
Now that both branches of the
case statement are
fmap (, n-1) of something, we can apply
fmap (, n-1) to the overall
case statement instead. Specifically, the rule is that we can rewrite
which in our code leads to
-allRolls choices (vs, n) = case pop choices vs of - Nothing -> fmap (, n-1) [  ] - Just ((chosen, v), (choices, vs)) -> fmap (, n-1) $ do +allRolls choices (vs, n) = fmap (, n-1) $ case pop choices vs of + Nothing -> [  ] + Just ((chosen, v), (choices, vs)) -> do
We want to carefully separate the parts of the code for which the value of
n matters from the parts of the code for which the value of
n does not matter. To this end we split the body of
allRolls into a separate function called
allRolls :: DiceChoice -> DiceState -> [ DiceState ] -allRolls choices (vs, n) = fmap (, n-1) $ case pop choices vs of +allRolls choices (vs, n) = fmap (, n-1) $ allRollsBody choices (vs, n) + +allRollsBody :: DiceChoice -> DiceState -> [ DiceVals ] +allRollsBody choices (vs, n) = case pop choices vs of
Now we are in the nice situation that, although we are yet to prove it to our satisfaction, the value of
allRollsBody does not depend on its argument
However, we’ve ended up with a pair of mutually recursive functions! That’s somewhat unusual. In order to make
allRollsBody recurse only on itself we substitute the definition of
allRolls back into
allRollsBody. Additionally, that makes
allRolls not recursive at all.
- (roll, _) <- allRolls choices (vs, n-1) + (roll, _) <- fmap (, n-1) $ allRollsBody choices (vs, n-1)
We’re pairing every element of a list with
n-1 and then immediately removing it. Let’s just avoid the pairing in the first place.
- (roll, _) <- fmap (, n-1) $ allRollsBody choices (vs, n-1) + roll <- allRollsBody choices (vs, n-1)
Now the magic happens! Our code currently looks like this.
allRolls :: DiceChoice -> DiceState -> [ DiceState ] allRolls choices (vs, n) = fmap (, n-1) $ allRollsBody choices (vs, n) allRollsBody :: DiceChoice -> DiceState -> [ DiceVals ] allRollsBody choices (vs, n) = case pop choices vs of Nothing -> [  ] Just ((chosen, v), (choices, vs)) -> do roll <- allRollsBody choices (vs, n-1) [ d:roll | d <- rollList ] where rollList = if chosen then [v] else [ 1..6 ]
Previously we had to use our brains to spot that the
Integer argument to the recursive call was unused. We inserted a run time check to convince ourselves that we were right. Now we have split the original function into two, only one of which contains a recursive call. We can see clearly that the only use for the argument
allRollsBody is to be modified and passed to the recursive call. The value of that argument is never used in any other way. From that observation alone we are probably satisfied that we can remove it.
In fact we can go one step better. We can make a small change to our code so that we do not even have to inspect the implementation to know that
n is unused. The compiler will check the property for us! However, the check is demonstrated in a strange way, and if you’re not familiar with it then it will look utterly bizarre.
We generalise the type signature so that the function doesn’t just work for an
Integer but rather for any type of numeric argument
a, that is, any type
a with an instance of the
Num type class.
Believe it or not, from this type signature alone, without knowing anything about the implementation, we can conclude that the
a argument is not used! How on earth can we conclude that? It’s because of the parametricity property enjoyed by Haskell’s type system. Basically, the type signature says that the only operations involving type
allRollsBody can use are the ones from the
Num type class. Looking at them, we see that they give us a way to make new
as from an
fromInteger) and ways to combine
as to give other
*, etc.). On the other hand, there is no way that a value of another type can be created from an
a. Therefore, the only way that an argument of type
a could be used to affect the result is if the type variable
a appears in the type of the result. The result has type
[DiceVals] so it cannot be affected by the argument of type
If that seems baffling to you, do not be despondent. Although parametricity is an extremely sophisticated property using it in practice becomes second nature. Haskell programmers use it to great advantage in creating APIs which remain flexible whilst providing strong guarantees via their type signatures.
One way or another, our program transformations have taken us to a place where we feel comfortable removing the
Integer argument without having to think too hard about the justification. We can inspect the body and see that the argument is not used in a way that can affect the result or we can use parametricity to deduce the same thing. Either way we can remove it.
allRolls :: DiceChoice -> DiceState -> [ DiceState ] allRolls choices (vs, n) = fmap (, n-1) $ allRollsBody choices vs allRollsBody :: DiceChoice -> DiceVals -> [ DiceVals ] allRollsBody choices vs = case pop choices vs of Nothing -> [  ] Just ((chosen, v), (choices, vs)) -> do roll <- allRollsBody choices vs [ d:roll | d <- rollList ] where rollList = if chosen then [v] else [ 1..6 ]
In the earlier article I said that “the only way I can suggest that one discovers [that the argument is unused] is to think through how the the code actually works … this is not just a simple refactoring”. I was wrong! There is a small sequence of simple transformations that improve the code whilst at the same time taking us to a place where we easily see that the argument is unused. For the latter either we use a small amount of brainpower to inspect the implementation or we take advantage of parametricity. This is great! We want to save as much brainpower as possible for the really hard problems.
A small amount of Haskell knowledge was required but that is because the code is written in Haskell. Other languages will have their own particular constructs and equivalent transformations, although if they lack
case statements and expression-valued blocks the transformations might appear a bit more clunky. The only typed-functional-language-specific concept in this article is parametricity. In this small example we were happy to just inspect the body of the function. Parametricity really shines in more complicated codebases where unrelated, opaque, pieces of functionality are being combined.