Monad transformers

標籤 Tags: english

2020-05-14 14:47

In denotational semantics and functional programming, the terms monad morphism, monad layering, monad constructor, and monad transformer have by now accumulated 20 years of twisted history. The exchange between Eric Kidd and sigfpe about the probability monad prompted me to investigate this history.

Chapter 4 of Eugenio Moggi’s ”An Abstract View of Programming Languages” (Edinburgh LFCS technical report ECS-LFCS-90-113, 1990) defines a monad morphism as you’d expect in the category of monads: a natural transformation between two monads as functors that commutes with the two monads’ unit (return) and bind (>>=) operations. The two monads don’t even need to be over the same category of types, but Haskell has only one category of types (pace type classes), so Moggi focuses on the simplified case, where the functor U is an identity. Further, Moggi cares only about strong monads, which Haskell programmers like Wadler call a monad for short. Roughly, a strong monad respects the Cartesian structure (in particular of environments) of the category of types. Moggi thus defines a simplified strong monad morphism, which relates two strong monads over the same category of types. For short, Haskell programmers (like me henceforth) call it a monad morphism, of type forall a. m a -> n a where m and n are the two (strong) monads being related.

Moggi continues with “some examples of monad constructions” (emphasis mine). He defines “several endofunctors on Obj(Mon(C))”, where Obj(Mon(C)) is a category with monads for objects but only trivial (identity) morphisms. In other words, he defines several maps from monads to monads, including what Haskell calls ErrorT, StateT, ReaderT, WriterT, and ContT.

Endofunctors on Obj(Mon(C)) have very little structure. One of them exchanges the Identity monad with the State Int monad while leaving all other monads alone. You can’t even define this exchange as a Haskell type constructor of kind (*->*)->(*->*). Moggi thus considers several ways to add structure to a given endofunctor F on Obj(Mon(C)).

”Can F be extended to an endofunctor F over Mon(C)?” That is, does every monad morphism M → N yield a monad morphism FM → FN such that blah commutes? For the Haskell monad maps listed above, the answer is yes except for ContT.
”Is there a natural transformation from Id_Mon(C) to F?” That is, does every monad M yield a monad morphism M → FM such that blah commutes? For the Haskell monad maps listed above, the answer is all yes.
Can we lift the operations (such as changing the state or producing output) of each monad M into FM? That is, as F adds new operations to a monad, can it preserve the old operations of the monad?

The last question is crucial for Moggi’s goal, to make denotational semantics modular. Steele, Espinosa, and Liang, Hudak, and Jones turn Moggi’s goal into a puzzle in functional programming, that of building a modular interpreter. Liang et al. define a monad transformer as a monad-to-monad map that satisfies item 2 above, in other words, a natural transformation from Id_Mon(C) to an endofunctor on Obj(Mon(C)). The monad morphism M → FM in their definition is known as lift in Haskell’s MonadTrans class.

Of items 1 and 3 above, the latter has received by far more attention in the literature. After all, monads are useful only to the extent that they provide useful operations. Filinski defines a layering of a monad N over another monad M as a function family M(NA) → NA (for each type A) such that blah commutes. In Haskell notation, a layering defines a polymorphic function

layer :: forall a. m (n a) -> n a

where m and n are the two monads. Filinski explains that monad morphisms and monad layerings are equivalent:

lift = layer . liftM return
layer = join . lift

Filinski then encodes layered monads in terms of delimited control. Lüth and Ghani propose another approach to combining monads and their operations using coproducts.

So, that’s what I know of the story of monad morphisms, layerings, constructors, and transformers in the literature.

Three asides. First, Cartwright and Felleisen pursue modular denotational semantics with side effects but not (directly) monads. Second, monads generalize to endofunctor-enriched categories or parameterized monads, a more general notion of computation.

Finally, although the probability monad factors into the list monad and the writer monad transformer (the latter for the monoid of probabilities with multiplication as the binary operation), we need to distinguish the list monad from the set monad: the distribution “True 0.2, True 0.2, False 0.6” is equivalent not to “True 0.2, False 0.6” but to “True 0.4, False 0.6”. This equivalence arises only from the probability monad as a whole. As long as we only apply the monad to types with equality, we can reduce distributions using such equivalences to compute more efficiently. Then the monad generalizes from probabilities to any semiring (with a multiplicative identity), while remaining an instance of MonadWriter and (assuming the semiring also has an additive identity) MonadPlus. This generalization is used in speech recognition and other applications in natural-language processing.