I accidentally introduced exponential run time into a compiler I was writing and did not realise for over three years.

Once upon a time I was writing an AST to represent expressions in a programming language. In reality the language was SQL, but for the sake accessibility let’s pretend it was a simple arithmetic language that allows us to write zero, one, and (non-empty) sums and products.

```
data Expr a = Zero a
| One
| Sum (NonEmpty (Expr a))
| Prod (NonEmpty (Expr a))
```

The reason why `Zero`

has an argument will be explained in due course.
First, let’s implement wrapper functions to provide a nice API and
write an example expression.

```
zero :: Expr ()
= Zero ()
zero
(.+) :: Expr a -> Expr a -> Expr a
.+ e2 = Sum (pure e1 <> pure e2)
e1
(.*) :: Expr a -> Expr a -> Expr a
.* e2 = Prod (pure e1 <> pure e2)
e1
example :: Expr ()
= (One .+ zero .+ zero .+ One) .* (zero .+ One .+ One) example
```

We expect the example expression to evaluate to `2 * 2`

, i.e. `4`

.
Let’s write an evaluator and check. The evaluator will be recursive
on the structure of terms of type `Expr`

. We are good functional
programmers so we will write a data type and general fold function to
encapsulate the recursion pattern. They basically write themselves,
given the definition of `Expr`

.

```
data ExprFold a r = ExprFold
zeroF :: a -> r
{ oneF :: r
, sumF :: NonEmpty r -> r
, prodF :: NonEmpty r -> r
,
}
exprFold :: ExprFold a r -> Expr a -> r
= \case
exprFold f Zero a -> zeroF f a
One -> oneF f
Prod es -> prodF f (fmap (exprFold f) es)
Sum es -> sumF f (fmap (exprFold f) es)
```

Now it’s simple, almost tautological, to write an evaluator to interpret terms in the integers.

```
int :: Expr a -> Int
= exprFold ExprFold
int = const 0
{ zeroF = 1
, oneF = sum
, sumF = product
, prodF }
```

```
> int example
4
```

Now I’ll explain the reason for the argument to the `Zero`

constructor. The backend that we are targeting doesn’t have a very
good concept of `Zero`

! We can happily send `One`

to it but sending
`Zero`

is quite tricky. In reality the backend is SQL and it’s not
terribly straightforward to generically represent an empty query in
SQL.

Instead of dealing with that problem directly, my approach was to give
`Expr`

a type parameter. Then expressions of type `Expr ()`

can
contain `Zero`

and expressions of type `Expr Void`

cannot.

Expressions that might contain `Zero`

need to have it removed before
the query is sent to the backend. The following function removes all
occurrences of `Zero`

from an expression. If the expression itself
was actually zero it returns `Nothing`

, otherwise it returns a `Just`

.
Because the type parameter in the return type is polymorphic we have a
static guarantee that the result doesn’t contain any `Zero`

s!

```
-- Just an expression, or Nothing. Nothing means that the value
-- of the expression is zero.
removeZero :: Expr unit -> Maybe (Expr void)
= exprFold ExprFold
removeZero = const Nothing
{ zeroF -- ^ Zero is converted to Nothing
= pure One
, oneF -- ^ One does not need to change
= \es ->
, sumF Sum <$> nonEmpty (catMaybes (toList es))
-- ^ We filter all zeros out of the Sum. Terms that are zero
-- (Nothing) are removed by catMaybes. If they were *all* zero
-- then the argument to nonEmpty is [] so it returns Nothing.
-- If some were not zero then we end up with a NonEmpty list of
-- non-zeroes.
= \es ->
, prodF Prod <$> (traverse removeZero =<< sequence es)
-- ^ If any of the terms in a Prod is Nothing then the entire
-- product is Nothing (because the product of a list
-- containing zero is itself zero).
}
```

```
> fmap int (removeZero example)
Just 4
> fmap int (removeZero (zero .* One))
Nothing
```

We’re only ever going to use `removeZero`

at type

`removeZero :: Expr () -> Maybe (Expr Void)`

but the more general type signature

`removeZero :: Expr unit -> Maybe (Expr void)`

is valid, so let’s use that instead. After all, more polymorphism gives fewer places for bugs to hide, right?

`removeZero`

calculates the correct result but regrettably it has
exponential run time! How can this be? The point of expressing
`removeZero`

in terms of a fold function was to make it brainlessly
simple to write it efficiently and correctly.

The problem is that I called `removeZero`

in the body of its own
definition. I wasn’t supposed to do that! `exprFold`

was supposed to
encapsulate all the recursion so that I need not use explicit
recursion myself. For the `prodF`

field I wrote

`= \es -> Prod <$> (traverse removeZero =<< sequence es) prodF `

when I should have written

`= \es -> Prod <$> sequence es prodF `

The argument to `prodF`

, `es`

, has *already* had the recursive call of
`removeZero`

applied to each element by `exprFold`

. I didn’t need to
call `removeZero`

again. Calling it again doesn’t change the result
(because there are no longer any zeros) but it doubles the amount of
work at each `Prod`

node, making the run time exponential.

Ironically, if `removeZero`

were less polymorphic, taking an argument
of type `Expr ()`

instead of `Expr unit`

, then the type checker would
have caught this bug. The recursive call is at type `Expr void`

which
Haskell’s polymorphic recursion very happily allows to unify with
`Expr unit`

. Furthermore, if `removeZero`

were not idempotent then
this bug would have been quickly discovered by the test suite. The bug could only occur because the following three conditions coincided:

`removeZero`

is idempotent- I chose to generalise the type signature
- Haskell supports polymorphic recursion

This bug occurred in real life in
Opaleye, my
PostgreSQL
EDSL. The
bug was added in Feb
2016
and only removed nearly three and a half years
later,
after being
reported
by Christopher Sasarak. (Despite the
commit message the slowdown is exponential, not quadratic. Suppose
that each level of the tree has only one branch. We’re running
`removeEmpty`

twice for each branch, so `cost (Prod es) = 2 * sum (cost es)`

.)

Opaleye makes good use of Haskell’s type system to check correctness. It has a comprehensive set of unit tests and property tests. However, because the buggy function returns the correct result there is no way that Haskell’s type system can spot the error. Nor can unit tests, property tests, Liquid Haskell, or any other system which only checks the calculated value. Some form of performance testing is required. If you have a library without a performance testing suite perhaps you will be lucky enough, like me, that your users act as performance testers for you!