# Monads Demystified

With this entry, I am going where many bloggers have gone before. Writing a monad tutorial that makes sense to you (as the writer) but nobody else seems to be a rite of passage. Silly monad tutorial are so overdone that there’s even a running joke about how Monads Are Like Burritos.

So despite the fact that Monads are not a good choice as topic for your first Haskell blog entry, I’m going to be a little audacious here and make Monads the topic of my first Haskell blog entry. My previous “X Demystified” entries (React Demystified, LL and LR Parsing Demystified) have been well-received, which is allowing me to believe that I am good at explaining things. I have to apologize to Haskell people, as I’ll describe some things by way of analogy with C++ and use some C++-like notation, since it’s how I think. Did I mention I’ve written about 10 lines of Haskell in my life total? Which is fewer than the number of monad tutorials and articles I’ve studied, which is a lot. Anyway, here goes.

# Monads are a Design Pattern

Our first task is to understand what a monad even is. Is a monad a value? Is it a type? Is it a language feature? Is it an I/O library? Most explanations either throw a mathematical construction at you or start showing you examples of monads. But neither of these is helps you classify what a monad even is.

The answer is actually pretty simple. Monads are a *design pattern*.
In fact, it would probably be more clear to talk about monads in terms
of “the Monad pattern” in the same way you’d talk about design
patterns like Factory or Visitor. Any type you create that follows
the monadic pattern and obeys a few key rules around that pattern is a
monad.

It’s really important to understand that the *pattern* is what makes a
monad a monad. This pattern can be used for a very wide range of
things. It can be used to model side-effects and I/O in Haskell, but
many monads have nothing to do with side-effects or I/O. This is one
of the things that makes monads confusing to learn about. Whenever
you are tempted to take something you learned about a monad and
generalize it, you can often find a counterexample of a monad for
which that thing is not true. There is an entire web
page dedicated to listing a bunch of incorrect generalizations
about monads. That page explains at length what monads are *not*. I
am going to try and explain what monads *are*.

The best analogy I can think of is operator overloading in C++. People overload C++ operators to do everything from matrix multiplication to string concatenation to constructing a parser. Saying “monads are about side effects” would be like saying “operator overloading is about matrix multiplication,” which isn’t true in general: lots of overloaded operators have nothing to do with matrix multiplication.

Monads, like overloaded operators, also have some identities which you
should follow. With overloaded operators you should make sure that ```
x
= x + y
```

has the same effect as `x += y`

, regardless of what the `+`

operator actually does. The monad pattern has a similar set of rules
that need to be followed for things to make sense and compose
correctly.

# The Monad Abstraction: bind (“and then”)

Let’s continue the analogy with overloading `+=`

in C++. In C++, ```
x
+= y
```

means “add y to x”, whatever “add” means in your context
(numerical addition, matrix addition, string concatenation, etc). So
what is the basic semantic of a monad?

The basic abstraction of a monad is “bind”, written `x >>= f`

. Bind
means, loosely, “and then f”. Basically a monad lets you keep adding
functions to a chain of operations. But the monad gets to decide what
“and then f” means for that monad. For this reason, monads are often
called “programmable semicolons.” As a monad author, you get to put
your own custom logic between each step of the sequence.

(Note for the pedantic: some people would interrupt me here and argue that Monads are not about ordering/sequencing. I would respectfully disagree. Just because a monad can be implemented to be commutative doesn’t take away from the fundamentally sequence-oriented nature of the abstraction. Later functions in the sequence have access to more value bindings than earlier functions. So from a pure data dependency perspective, a sequence is established in the abstraction, even if the functions do not literally need to be evaluated in sequence.)

# How the Monad pattern works

We know now that monad is a pattern, and that the primary abstraction is bind (“and then”). Now we will look at how the pattern actually works.

The pattern is a bit tricky to wrap your head around, but I find it easiest to look at visually:

Starting from the top, we initially have a “monadic value.” A monadic
value is an instance of the monad’s type, so it basically gets to
contain whatever state the monad author wanted to put in it. I didn’t
mention how you get the first monadic value, because that can vary
based on the monad and how you’re using it. Right now I’m focusing on
characteristics that apply to *all* monads.

The monad’s user calls `bind()`

and passes two arguments: the monadic
value and a function that will return a monadic value. Note that
`bind()`

is a function that was defined by the monad’s author and is
specific to that monad. Indeed, `bind()`

is the most interesting part
of the monad! This is where the monad’s author gets to decide what
“and then” means for this monad. In particular, note that `bind()`

gets to decide how to call the passed-in function. It can call it
once, many times, or not at all! All of these options are valid and
used by at least one real monad implementation.

`bind()`

needs to return a monadic value. And notice that the
passed-in function also returns a monadic value! So one very common
option is for `bind()`

to evaluate the passed-in function and then use
its return value as the return value for `bind()`

. If `bind()`

doesn’t evaluate the function at all, it will need to make something
up for the returned monadic value. If it evaluates the function many
times, it may merge their returned values into the output monadic
value. There are many options for how to implement `bind()`

.

There’s one other important thing I haven’t mentioned yet. The
passed-in function takes an argument. The `bind()`

function gets to
decide where this argument comes from and how to obtain it. This is
again why a monad is called a “programmable semicolon.” The monad
gets to make a bunch of really interesting decisions at every stage
about how to evaluate each function of the sequence, and where its
input data comes from. *And* it gets to carry along whatever data it
wants to inside the monadic value. The monad can use this to
propagate whatever extra state is desired.

The type of each function’s argument is a type parameter of the previous monadic value. If we annotate the previous diagram with type information, it will look more like this:

# The most important monad: IO

Having established a lot of *general* information about monads, it
seems about the right time to dive into a specific monad. And of all
monads in Haskell, `IO`

is the most important, since it is used to
model *all* side effects. You come into contact with `IO`

immediately
when you write your first Haskell program, because Haskell’s `main`

function is defined as returning the type `IO ()`

.

When we want to understand a new monad, there are two key questions we want to ask ourselves:

- what does a monadic value represent in this monad?
- what semantic does
`bind()`

(“and then”) implement in this monad?

For the `IO`

monad, a monadic value `IO a`

(or `IO<a>`

in C++ notation
– `a`

is a type parameter) represents a side-effectful action
yielding a value of type `a`

, but which has not been performed yet.
And `bind()`

implements the semantic “add this function to the action
chain. If/when the function is evaluated, it will know that the
operation has been performed and will receive the result (if any) of
the operation. The function gets to return an `IO`

value deciding
what action should be performed next.”

So in Haskell, functions which *look* like they perform I/O, like
`putStrLn`

, don’t actually perform I/O but rather return a value
representing the not-yet-performed operation.

The I/O operation is not actually performed unless Haskell determines
that the monadic `IO`

value is required for something (remember that
Haskell is lazy). The easiest way to do this is to return it from
your `main`

function, since Haskell will by definition perform any I/O
returned from `main`

. So the `main`

function’s type `IO ()`

means “a
possibly side-effectful action that doesn’t return/yield anything.”
And indeed your `main`

function should perform some kind of
side-effect, otherwise the program is effectively a no-op and should
be optimized to nothing! Returning a useful `IO ()`

action from
`main`

is how you make your program do something useful.

That is why this program prints “Hello, world!” (since the `IO`

action
is returned from `main`

):

But this program does nothing (because we don’t return the `IO`

action
from `main`

):

The next example also does nothing, because even though we bind the
`IO`

value to an action, the monadic value returned from `bind()`

operation is never required, because we don’t return it from `main`

:

What does it look like to have a two stage operation, a read followed
by a write? Let’s use `getLine`

which has this type:

We’ll need to use `bind()`

to sequence two actions together, and to
get the value of the first operation for use in the second. Our
two-stage pipeline looks like this:

This example clearly illustrates using `bind()`

(ie. `>>=`

) to
sequence two `IO`

actions together! This is the fundamental idea of
the `IO`

monad. Monadic values represent unperformed I/O operations.
And `bind()`

will add a function to the chain, such that it will get
the result of the previous I/O operation and select the next operation
to be performed. But no operation in the chain will be performed
unless it is ultimately part of the I/O action returned by `main`

.

# Chaining multiple operations / “do” notation

Let’s take our previous example one step farther and have three `IO`

operations in a row, where the third one wants to use the values from
both of the first two:

This example fails to compile, because `bind()`

only gives a function
direct access to the result of the directly preceding I/O operation.
However, this limitation is easy to work around by *nesting* the
functions we’re binding:

This is a little bit mind-bending, but it works because the function
on the right of a `bind()`

call has the responsibility to return a
monadic value, which `bind()`

itself does. So with this pattern,
we’re using the pattern in a right-associative way instead of
left-associative. In other words, if you’ll allow me to briefly
regress into Python-like function notation, instead of saying:

We’re now saying:

The major advantage of the second form is that later pipeline stages
have access to *all* previous bindings (ie. both `x`

and `y`

), not
just the directly preceding one.

As a result, the second form is, functionally speaking, pretty much strictly superior to the first. However it has a drawback from a convenience perspective, which is that each stage creates an extra level of function nesting. If you indent this properly, functions with many stages will start flowing off the right edge of your screen. Because this pattern is so popular, Haskell includes a convenience notation for this function nesting:

Just like our manually written-out version, both the `x`

and `y`

bindings are available in later stages of the sequence.

To properly represent this new variation, we should update our previous diagram. The diagram gets a little more crazy with all the nesting, but the key point to notice is that since the second function is nested inside the first, it has access to the first function’s argument / variable binding.

# The Maybe monad and “return”

Let’s look at a different monad to diversify our understanding a bit.
`Maybe`

is probably the simplest monad that is actually useful, but is
also different from `IO`

in some important ways. Maybe doesn’t have
anything to do with side-effects: it is completely pure.

I said before that the two important questions to ask when
encountering a new monad are: (1) what does a monadic value represent,
and (2) what semantic does `bind()`

implement?

For `Maybe`

the answers to these questions are extremely simple. A
monadic value `Maybe a`

represents either a value of type `a`

or
`Nothing`

, the lack of any value:

And `bind()`

implements the semantic: “if there is a value, evaluate
the function (with that value), otherwise don’t evaluate the function
at all.”

So when you chain a bunch of functions together with the `Maybe`

monad, it will evaluate all of them in sequence until one of them
returns `Nothing`

. If any function in the sequence returns `Nothing`

then the subsequent functions won’t be evaluated at all and the chain
will return `Nothing`

.

But how do you get the whole thing started? With the `IO`

monad, we
were obtaining `IO`

values from I/O library functions like `getLine`

.
In the case of Maybe though, we want to provide our own value to start
the process. For this we could construct such a value by using the
`Just`

constructor directly:

But as a general rule, monads are required to provide a standard
function called `return`

that does just this. The job of `return`

(in
any monad) is to construct a monadic value out of a plain value. The
implementation details of a monadic value are not public, but the
`return`

function is a public interface for building a trivial monadic
value that simply wraps the given value.

It’s a little weird that the function is called `return`

when you can
use it to start a monadic pipeline. But it makes more sense when you
realize that the function is also useful for returning a monadic value
from a function, for example:

The monadic
laws (which I won’t get into in this article) guarantee that
`return`

doesn’t do anything too fancy, but just wraps the value and
yields it unchanged to the next function in the sequence.

# Conclusion

There are many more monads than these two to explore, but this article
has gotten quite long already. I hope you have found this article
useful in your understanding of monads. If you can take one thing
away from this article, remember the two questions to ask yourself
when encountering a new monad: (1) what does a monadic value
represent, and (2) what semantic does `bind()`

implement?