In denotational semantics and functional programming, the terms monad morphism, monad layering, monad constructor, and monad transformer have by now accumulated 20 years of twisted history. The exchange between Eric Kidd and sigfpe about the probability monad prompted me to investigate this history.
Chapter 4 of Eugenio Moggi’s “An
Abstract View of Programming Languages” (Edinburgh LFCS
technical report ECSLFCS90113, 1990) defines a monad
morphism as you’d expect in the category of monads: a natural
transformation between two monads as functors that commutes with
the two monads’ unit (return
) and bind
(>>=
) operations. The two monads don’t even need
to be over the same category of types, but Haskell has only one
category of types (pace type
classes), so Moggi focuses on the simplified case,
where the functor U is an identity. Further, Moggi cares
only about strong monads, which Haskell programmers like
Wadler call a monad for short. Roughly, a strong monad respects the
Cartesian structure (in particular of environments) of the category
of types. Moggi thus defines a simplified strong monad
morphism, which relates two strong monads over the same
category of types. For short, Haskell programmers (like me
henceforth) call it a monad morphism, of type forall a. m a
> n a
where m
and n
are the
two (strong) monads being related.
Moggi continues with “some examples of monad
constructions” (emphasis mine). He defines “several
endofunctors on Obj(Mon(C))”, where Obj(Mon(C)) is a category with
monads for objects but only trivial (identity) morphisms. In other
words, he defines several maps from monads to monads, including
what Haskell calls ErrorT
, StateT
,
ReaderT
, WriterT
, and
ContT
.
Endofunctors on Obj(Mon(C)) have very little structure. One of
them exchanges the Identity
monad with the
State
Int
monad while leaving all other
monads alone. You can’t even define this exchange as a Haskell type
constructor of kind (*>*)>(*>*)
. Moggi
thus considers several ways to add structure to a given endofunctor
F on Obj(Mon(C)).

“Can F be extended to an endofunctor F over Mon(C)?” That is, does every monad morphism M → N yield a monad morphism FM → FN such that blah commutes? For the Haskell monad maps listed above, the answer is yes except for
ContT
. 
“Is there a natural transformation from Id_{Mon(C)} to F?” That is, does every monad M yield a monad morphism M → FM such that blah commutes? For the Haskell monad maps listed above, the answer is all yes.

Can we lift the operations (such as changing the state or producing output) of each monad M into FM? That is, as F adds new operations to a monad, can it preserve the old operations of the monad?
The last question is crucial for Moggi’s goal, to make
denotational semantics modular. Steele, Espinosa,
and Liang, Hudak,
and Jones turn Moggi’s goal into a puzzle in functional
programming, that of building a modular interpreter. Liang et al.
define a monad transformer as a monadtomonad map that
satisfies item 2 above, in other words, a natural transformation
from Id_{Mon(C)} to an endofunctor on Obj(Mon(C)). The
monad morphism M → FM in their definition is
known as lift
in Haskell’s MonadTrans
class.
Of items 1 and 3 above, the latter has received by far more attention in the literature. After all, monads are useful only to the extent that they provide useful operations. Filinski defines a layering of a monad N over another monad M as a function family M(NA) → NA (for each type A) such that blah commutes. In Haskell notation, a layering defines a polymorphic function
layer :: forall a. m (n a) > n a
where m
and n
are the two monads.
Filinski explains that monad morphisms and monad layerings are
equivalent:
lift = layer . liftM return
layer = join . lift
Filinski then encodes layered monads in terms of delimited control. Lüth and Ghani propose another approach to combining monads and their operations using coproducts.
So, that’s what I know of the story of monad morphisms, layerings, constructors, and transformers in the literature.
Three asides. First, Cartwright and Felleisen pursue modular denotational semantics with side effects but not (directly) monads. Second, monads generalize to endofunctorenriched categories or parameterized monads, a more general notion of computation.
Finally, although the probability monad factors into the list
monad and the writer monad transformer (the latter for the monoid
of probabilities with multiplication as the binary operation), we
need to distinguish the list monad from the set monad: the
distribution “True 0.2, True 0.2, False 0.6” is equivalent not to
“True 0.2, False 0.6” but to “True 0.4, False 0.6”. This
equivalence arises only from the probability monad as a whole. As
long as we only
apply the monad to types with equality, we can reduce
distributions using such equivalences to compute more efficiently.
Then the monad generalizes from probabilities to any semiring (with a
multiplicative identity), while remaining an instance of
MonadWriter
and (assuming the semiring also has an
additive identity) MonadPlus
. This generalization is
used in speech
recognition and other
applications in naturallanguage processing.