I will explain how, if you start with an impure lazy programming
language, you very quickly discover that you may as well make it pure
and handle side effects through an IO
type.
I’ve got a strongly-typed, impure, lazy, programming language called “Unclean”. Impure means that let binding and function application can have side effects. Lazy means that, firstly, let bindings and function arguments are not evaluated until they are demanded and, secondly, the results of computations bound to an identifier are memoized.
The first part of laziness is that function arguments are not evaluated
until and unless they are used. In the following example the value of
x
is never demanded, so it is never calculated and the expression is
quickly evaluated to "Hello"
.
> let x = sum [1..10000000] in "Hello"
"Hello"
The second part of laziness is that the result of an expression bound
to an identifier is memoized (cached) for reuse. In the following
example the first time we evaluate y
the computation bound to it is
run and there is some delay before we see the result. The second time
we evaluate y
the result is returned instantly because it has
already been memoized.
> let y = sum [1..10000000]
> y
50000005000000
-- That was slow
> y
50000005000000
-- That was quick
Here are a couple of entries in the Unclean standard library:
putStrLn :: String -> ()
getLine :: String
And here are examples of using them:
> putStrLn "Hello world"
Hello world
()
putStrLn
prints its argument to the console and returns ()
. I can
also print the result of reading a line of input:
> putStrLn getLine
I entered this text
I entered this text
()
There’s also a function seq :: a -> b -> b
which evaluates its first
argument and then returns its second. I can use seq
to write
sequences of code that interact with the user.
main2 :: ()
= putStrLn "What is your name?"
main2 `seq`
putStrLn ("Hello " ++ getLine)
> main2
What is your name?
Hello Tom
Tom
()
Hmm, it doesn’t quite do what I want though. Due to laziness it started printing “Hello” before it asked for my name. How about the following instead?
main3 :: ()
= let g = getLine
main3 in putStrLn "What is your name?"
`seq`
g`seq`
putStrLn ("Hello " ++ g)
> main3
What is your name?
Tom
Hello Tom
()
That’s much better. But what if I want to read two strings with
getLine
? I can try
main4 :: ()
= let name = getLine
main4 = getLine
language in putStrLn "What is your name?"
`seq`
name`seq`
putStrLn "What is your favourite programming language?"
`seq`
language`seq`
putStrLn (name ++ " likes " ++ language)
> main4
What is your name?
Tom
What is your favourite programming language?
Tom likes Tom
()
That’s strange. It only asked for my name and the told me that I like
myself. Whilst technically true this is not what I wanted. What’s
going on? Unclean memoizes getLine
and and so we see the same
result in all its uses. That’s no good at all! It will mean that we
can only ever read one line in an Unclean program. To avoid
memoization we can make getLine
a function.
getLineF :: () -> String
We apply a ()
argument to getLineF
each time it is used and thus
we don’t memoize.
main5 :: ()
main5 = putStrLn "What is your name?"
`seq`
let name = getLineF ()
in name
`seq`
putStrLn "What is your favourite programming language?"
`seq`
let language = getLineF ()
in language
`seq`
putStrLn (name ++ " likes " ++ language)
> main5
What is your name?
Tom
What is your favourite programming language?
Unclean
Tom likes Unclean
()
That’s the result we wanted. It’s a bit annoying that we have to
mention our variables twice when we bind them. let name = getLineF () in name ...
is awkward and sure to lead to programming mistakes.
How far can we abstract that away? The first thing to try is this
andThen
function.
andThen :: b -> (b -> c) -> c
= x `seq` f x andThen x f
It allows us to write
main6 :: ()
= putStrLn "What is your name?"
main6 `andThen` \_ ->
`andThen` \name ->
getLineF () putStrLn "What is your favourite programming language?"
`andThen` \_ ->
`andThen` \language ->
getLineF () putStrLn (name ++ " likes " ++ language)
which runs just like main5
. andThen
works to essentially bind the
result of a function to a name. It’s still syntactically awkward
though. Let’s invent a “do
notation” to make it syntactically more
convenient.
main7 :: ()
= do
main7 putStrLn "What is your name?"
<- getLineF ()
name putStrLn "What is your favourite programming language?"
<- getLineF ()
language putStrLn (name ++ " likes " ++ language)
where (>>=) = andThen
>>) x y = x >>= (\_ -> y)
(return = id
fail = error
main7
also runs just like main5
. This do
notation has nothing
to do with monads. It’s just a syntactic transformation that makes
working with andThen
more convenient.
Unfortunately it still doesn’t stop us doing weird things like this:
main8 :: ()
= do
main8 let g = getLineF ()
putStrLn "What is your name?"
<- g
name putStrLn "What is your favourite programming language?"
<- g
language putStrLn (name ++ " likes " ++ language)
where (>>=) = andThen
>>) x y = x >>= (\_ -> y)
(return = id
fail = error
> main8
What is your name?
Tom
What is your favourite programming language?
Tom likes Tom
()
There’s still too much memoization. Ideally we don’t want to memoize the results of side effects at all. Let’s wrap up our side-effecting actions in a datatype which always prevents memoization.
data IO a = IO (() -> a)
unIO :: IO a -> a
IO a) = a ()
unIO (
andThenIO :: IO a -> (a -> IO b) -> IO b
= IO (\_ -> unIO x `andThen` (unIO . f))
andThenIO x f
putStrLnIO :: String -> IO ()
= IO (\_ -> putStrLn s)
putStrLnIO s
getLineIO :: IO String
= IO getLineF getLineIO
Then nothing goes wrong if we try to get a line in two places
main9 :: ()
= unIO $ do
main9 let g = getLineIO
"What is your name?"
putStrLnIO <- g
name "What is your favourite programming language?"
putStrLnIO <- g
language ++ " likes " ++ language)
putStrLnIO (name
where (>>=) = andThenIO
>>) x y = x >>= (\_ -> y)
(return x = IO (\_ -> x)
fail = error
main9
also runs just like main5
. This setup is still not quite
right though. It doesn’t prevent us from writing
main10 :: ()
= unIO $ do
main10 let g = getLine
"What is your name?"
putStrLnIO let name = g
"What is your favourite programming language?"
putStrLnIO let language = g
++ " likes " ++ language)
putStrLnIO (name
where (>>=) = andThenIO
>>) x y = x >>= (\_ -> y)
(return x = IO (\_ -> x)
fail = error
which has the same problem as main8
. We are now, however, in a
position to solve all these problems once and for all.
The solution is to wrap all side-effecting primitives in IO
, and
ensure that IO
is an abstract datatype whose contents we can’t
unwrap. This way we can only combine side-effecting actions by using
andThenIO
, never directly. This ensures that side effects always
occur in the order we expect and function application itself never has
side effects. At this point Unclean has become a pure functional
programming language and it is, in fact, Haskell.
If you have an impure, lazy, functional programming language the only
natural way to handle side effects is to ensure that they are all
wrapped in a datatype (called IO
in Haskell, but it could be called
anything you like) and provide combinators for ordering them relative
to each other.
At this point the notion of “monad” is floating around in the
background, but that notion neither is needed to discover how to
handle side effects sanely in a lazy language nor is needed to
understand how to use the IO
type in practice. If the inventors
of Haskell had started with an impure lazy language (with seq
) and
asked themselves “how do we sanely handle the ordering of effects in
this language?” they would probably have quickly discovered this very
natural approach.
(Thanks to Sam Davis for a review)
You too can have your very own Unclean compiler. Just add some lines like this at the top of your source files:
{-# LANGUAGE RebindableSyntax #-}
import System.IO.Unsafe
import qualified Prelude
= unsafePerformIO
u
putStrLn :: String -> ()
putStrLn = u . Prelude.putStrLn
getLine :: String
getLine = u Prelude.getLine
getLineF :: () -> String
= u Prelude.getLine getLineF _