I accidentally introduced exponential run time into a compiler I was writing and did not realise for over three years.
Once upon a time I was writing an AST to represent expressions in a programming language. In reality the language was SQL, but for the sake accessibility let’s pretend it was a simple arithmetic language that allows us to write zero, one, and (non-empty) sums and products.
data Expr a = Zero a
| One
| Sum (NonEmpty (Expr a))
| Prod (NonEmpty (Expr a))
The reason why Zero
has an argument will be explained in due course.
First, let’s implement wrapper functions to provide a nice API and
write an example expression.
zero :: Expr ()
= Zero ()
zero
(.+) :: Expr a -> Expr a -> Expr a
.+ e2 = Sum (pure e1 <> pure e2)
e1
(.*) :: Expr a -> Expr a -> Expr a
.* e2 = Prod (pure e1 <> pure e2)
e1
example :: Expr ()
= (One .+ zero .+ zero .+ One) .* (zero .+ One .+ One) example
We expect the example expression to evaluate to 2 * 2
, i.e. 4
.
Let’s write an evaluator and check. The evaluator will be recursive
on the structure of terms of type Expr
. We are good functional
programmers so we will write a data type and general fold function to
encapsulate the recursion pattern. They basically write themselves,
given the definition of Expr
.
data ExprFold a r = ExprFold
zeroF :: a -> r
{ oneF :: r
, sumF :: NonEmpty r -> r
, prodF :: NonEmpty r -> r
,
}
exprFold :: ExprFold a r -> Expr a -> r
= \case
exprFold f Zero a -> zeroF f a
One -> oneF f
Prod es -> prodF f (fmap (exprFold f) es)
Sum es -> sumF f (fmap (exprFold f) es)
Now it’s simple, almost tautological, to write an evaluator to interpret terms in the integers.
int :: Expr a -> Int
= exprFold ExprFold
int = const 0
{ zeroF = 1
, oneF = sum
, sumF = product
, prodF }
> int example
4
Now I’ll explain the reason for the argument to the Zero
constructor. The backend that we are targeting doesn’t have a very
good concept of Zero
! We can happily send One
to it but sending
Zero
is quite tricky. In reality the backend is SQL and it’s not
terribly straightforward to generically represent an empty query in
SQL.
Instead of dealing with that problem directly, my approach was to give
Expr
a type parameter. Then expressions of type Expr ()
can
contain Zero
and expressions of type Expr Void
cannot.
Expressions that might contain Zero
need to have it removed before
the query is sent to the backend. The following function removes all
occurrences of Zero
from an expression. If the expression itself
was actually zero it returns Nothing
, otherwise it returns a Just
.
Because the type parameter in the return type is polymorphic we have a
static guarantee that the result doesn’t contain any Zero
s!
-- Just an expression, or Nothing. Nothing means that the value
-- of the expression is zero.
removeZero :: Expr unit -> Maybe (Expr void)
= exprFold ExprFold
removeZero = const Nothing
{ zeroF -- ^ Zero is converted to Nothing
= pure One
, oneF -- ^ One does not need to change
= \es ->
, sumF Sum <$> nonEmpty (catMaybes (toList es))
-- ^ We filter all zeros out of the Sum. Terms that are zero
-- (Nothing) are removed by catMaybes. If they were *all* zero
-- then the argument to nonEmpty is [] so it returns Nothing.
-- If some were not zero then we end up with a NonEmpty list of
-- non-zeroes.
= \es ->
, prodF Prod <$> (traverse removeZero =<< sequence es)
-- ^ If any of the terms in a Prod is Nothing then the entire
-- product is Nothing (because the product of a list
-- containing zero is itself zero).
}
> fmap int (removeZero example)
Just 4
> fmap int (removeZero (zero .* One))
Nothing
We’re only ever going to use removeZero
at type
removeZero :: Expr () -> Maybe (Expr Void)
but the more general type signature
removeZero :: Expr unit -> Maybe (Expr void)
is valid, so let’s use that instead. After all, more polymorphism gives fewer places for bugs to hide, right?
removeZero
calculates the correct result but regrettably it has
exponential run time! How can this be? The point of expressing
removeZero
in terms of a fold function was to make it brainlessly
simple to write it efficiently and correctly.
The problem is that I called removeZero
in the body of its own
definition. I wasn’t supposed to do that! exprFold
was supposed to
encapsulate all the recursion so that I need not use explicit
recursion myself. For the prodF
field I wrote
= \es -> Prod <$> (traverse removeZero =<< sequence es) prodF
when I should have written
= \es -> Prod <$> sequence es prodF
The argument to prodF
, es
, has already had the recursive call of
removeZero
applied to each element by exprFold
. I didn’t need to
call removeZero
again. Calling it again doesn’t change the result
(because there are no longer any zeros) but it doubles the amount of
work at each Prod
node, making the run time exponential.
Ironically, if removeZero
were less polymorphic, taking an argument
of type Expr ()
instead of Expr unit
, then the type checker would
have caught this bug. The recursive call is at type Expr void
which
Haskell’s polymorphic recursion very happily allows to unify with
Expr unit
. Furthermore, if removeZero
were not idempotent then
this bug would have been quickly discovered by the test suite. The bug could only occur because the following three conditions coincided:
removeZero
is idempotentThis bug occurred in real life in
Opaleye, my
PostgreSQL
EDSL. The
bug was added in Feb
2016
and only removed nearly three and a half years
later,
after being
reported
by Christopher Sasarak. (Despite the
commit message the slowdown is exponential, not quadratic. Suppose
that each level of the tree has only one branch. We’re running
removeEmpty
twice for each branch, so cost (Prod es) = 2 * sum (cost es)
.)
Opaleye makes good use of Haskell’s type system to check correctness. It has a comprehensive set of unit tests and property tests. However, because the buggy function returns the correct result there is no way that Haskell’s type system can spot the error. Nor can unit tests, property tests, Liquid Haskell, or any other system which only checks the calculated value. Some form of performance testing is required. If you have a library without a performance testing suite perhaps you will be lucky enough, like me, that your users act as performance testers for you!