Making invalid laziness unrepresentable, even in nested data structures.
One circumstance in which thunk leaks occur in Haskell programs is when a lazy field of a data type remains unevaluated for longer than expected. By way of example consider the following data type:
data Foo = Foo Int Bool
During execution of a program what possible states can a value of
type Foo
be in? There are broadly five:
<thunk>
Foo <evaluated Int> <evaluated Bool>
Foo <evaluated Int> <thunk>
Foo <thunk> <evaluated Bool>
Foo <thunk> <thunk>
The <thunk>
s can be arbitrarily large run time data
structures! Their unexpected occurrence can cause thunk leaks.
What can we do about that? When programming in a strongly typed
language we aim to “make invalid states unrepresentable”1 so when
we define a data type like Foo
we should consider which are its
valid states. Haskell is a lazy language so we cannot forbid
state 12, but are states like 3, 4 and 5 valid? Perhaps. We should
carefully consider our use case; but if not (and it’s more likely
not than so) then we should forbid those states statically. How
can we do so? We can add strictness annotations thus:
data FooStrict = FooStrict !Int !Bool
(or by enabling StrictData
which brutally applies the same
transformation to every data type definition, or Strict
which
is even more brutal). FooStrict
has only two possible states:
<thunk>
Foo <evaluated Int> <evaluated Bool>
We have “made invalid laziness unrepresentable”. Much better!
But the above technique is not particularly general. Consider
data Bar = Bar !(Int, Bool) !(Maybe Double)
Despite the strictness annotations this type has many possible states:
<thunk>
Bar (<thunk>, <thunk>) Nothing
Bar (<evaulated Int>, <thunk>) Nothing
Bar (<thunk>, <evaluated Bool>) Nothing
Bar (<evaluated Int>, <evaluated Bool>) Nothing
Bar (<thunk>, <thunk>) (Just <thunk>)
Bar (<evaluated Int>, <thunk>) (Just <thunk>)
Bar (<thunk>, <evaluated Bool>) (Just <thunk>)
Bar (<evaluated Int>, <evaluated Bool>) (Just <thunk>)
Bar (<thunk>, <thunk>) (Just <evaluated Double>)
Bar (<evaluated Int>, <thunk>) (Just <evaluated Double>)
Bar (<thunk>, <evaluated Bool>) (Just <evaluated Double>)
Bar (<evaluated Int>, <evaluated Bool>) (Just <evaluated Double>)
Plenty of thunks for leaks to hide in! Perhaps for some use cases all the above states are valid but in most cases it is overwhelmingly likely that only the following states are valid:
<thunk>
(because we can’t do anything about it anyway)Bar (<evaluated Int>, <evaluated Bool>) Nothing
Bar (<evaluated Int>, <evaluated Bool>) (Just <evaluated Double>)
No clever application of strictness annotations can restrict us to this set of states! The problem is that there’s no way of “applying strictness inside” the nested data types. How can we make invalid laziness unrepresentable in nested data types?
Since we cannot “apply strictness inside” the nested data type we need
to use separate strict types for those fields of Bar
. The usual
approach is to write specific strict data definitions, for example
data StrictPair a b = StrictPair !a !b
data StrictMaybe a = StrictNothing | StrictJust !a
but I’m going to show you a more lightweight approach. The library
strict-wrapper
allows us to rewrite Bar
as
data BarStrict = BarStrict !(Strict (Int, Bool))
!(Strict (Maybe Double))
Having done so only the valid states are representable:
<thunk>
BarStrict (Strict (<evaluated Int>, <evaluated Bool>))
(Strict Nothing)
BarStrict (Strict (<evaluated Int>, <evaluated Bool>))
(Strict (Just <evaluated Double>))
Deeper nesting works too, for example:
data Baz = Baz !(Strict (Int, Strict (Either Bool Int)))
!(Strict (Maybe Double))
Strict
also works well as a wrapper for types whose values will
be passed as strict function arguments. For example, the
following function has a space leak:
= foldl' f (0, 0) [1..1000]
example_leak where f :: (Int, Int) -> Int -> (Int, Int)
= (n + 1, s + i) f (n, s) i
Manually adding strictness avoids the space leak:
= foldl' f (0, 0) [1..1000]
example_manual where f :: (Int, Int) -> Int -> (Int, Int)
!n, !s) i = (n + 1, s + i) f (
but using Strict
avoids the space leak in a way that is both
syntactically convenient and reflected in the type of the
accumulator function f
, that is to say, invalid laziness in the
accumulator argument has been made unrepresentable!
= Data.List.foldl' f (Strict (0, 0)) [1..1000]
example_Strict where f :: Strict (Int, Int) -> Int -> Strict (Int, Int)
Strict (n, s)) i = Strict (n + 1, s + i) f (
(Unfortunately we still have to remember to use foldl'
rather than
foldl
. Nothing in the type system can help us there.)
strict-wrapper
is implemented using a data family (Strict
) which maps basic
types to their strict versions.
data instance Strict (a, b) = StrictPair !a !b
data instance Strict (Maybe a) = StrictNothing | StrictJust !a
To use
strict-wrapper
all that you need is the data family Strict
and the bidirectional
pattern synonym, also called Strict
. For example, instead of using
StrictPair a b
as defined above, use Strict (a, b)
. To create a
Strict (a, b)
wrap an (a, b)
in the Strict
constructor; to
extract an (a, b)
, pattern match with Strict
.
There are various alternatives to
strict-wrapper
worth considering although in most cases they are too heavyweight or
don’t fulfil the same role.
seq
/bang patternsIt is always possible to use seq
(or equivalently bang patterns) to
ensure that invalid thunk states don’t arise. After all, strictness
annotations and strict data types are implemented merely by automatic
insertion of seq
! However, in pratice it is extremely difficult to
maintain the level of discipline required to make sure that the seq
calls or bang patterns are inserted in the correct places (and not in
the incorrect places). The benefit of programming in a strongly typed
functional language is that we can make invalid states
unrepresentable. That principle applies as much to invalid data type
laziness as to other invalid states.
strict
is a library
that provides a grab-bag of features related to strictness: strict
versions of basic types, strict I/O, and a class to map between strict
and lazy types (including ByteString
and Text
types and monad
transformer types).
By contrast, strict-wrapper
is a much smaller and more coherent
subset of the features of strict
: it only provides strict versions
of basic types and a class to map between them. In return for being
more restrictive the mapping can be made almost zero-cost (see
“Efficiency
considerations”).
Furthermore the Strict
data family avoids the need for a whole
universe of new strict type names and the Strict
pattern/constructor
is more ergonomic than toStrict
/toLazy
mapping functions.
deepseq
is an extremely expensive and blunt hammer. It has to walk
your entire data structure evaluating any thunks it encounters. Were
those thunks actually part of a valid state of your program? In many
(perhaps most) cases they were not! In those cases would it not be
better to design those thunks out of your data structures and avoid
deepseq entirely?
nothunks is a debugging tool that allows inspecting a value at run time to see if it contains any thunks. That is, it can check at run time whether a value is invalid. But if you can make the invalid laziness unrepresentable then why not do so?
Thunk leaks are a major cause of unpredictable memory usage in Haskell
programs; a common cause of thunk leaks is lazy fields in data
structures. There are some cases where lazy fields are required, but
in the cases that they are not required consider “making invalid
laziness unrepresentable” using strictness annotations and a strict
data types library like
strict
or
strict-wrapper
.
This practice might eliminate a substantial proportion of your thunk
leaks!
Although strict-wrapper
provides a convenient strict version of
“small” types like tuples, eithers and maybes, it is not clear what
can be done for “large” data structures like lists and maps.
Converting the lazy form of a large data structure into the strict
form requires walking the whole structure, so one cannot do better
than a deepseq
. If one wants a spine strict alternative to a list
perhaps it is best just to directly use a
Seq
,
Vector
or
Array
,
and if one wants a spine and value strict alternative then perhaps
Vector.Unboxed
or
UArray
are the right choices.
I believe that Yaron Minsky coined the phrase “make illegal states unrepresentable”, in The Monad.Reader, Issue 7 (April 30, 2007). I mildly prefer talking about “valid” states than “legal” ones.↩︎
barring unlifted data types, which are out of scope for this discussion and this library.↩︎