As the creator of this puzzle at ITA Software, I’m baffled at this exposition. Not the content, which I followed with bemusement, but the motivation.

Hi! Thanks for your talk, by the way. Air travel is such a concrete problem that connects and justifies so many disciplines, someone ought to write a popular-science book about it.

1.

Haskell and its cousins have not been widely adopted despite their good points, and I suspect expositions like this are a big part of their image problem. It’s hard to convince the masses to take functional languages seriously when their advocates manage to hide widely-understood concepts like ‘addition’ and ‘multiplication’ and dynamic programming behind generating functions, to no obvious benefit [1]. Haskell programs written by non-advocates look like they would in every other language. Haskell programs written by its promoters tend to look like something out of Russell & Whitehead - they’re little more “literate” than hexdump output is.

I don’t think of myself as a Haskell advocate, and advocating Haskell is certainly not our goal in writing these posts. Rather, we wanted to use the puzzle to introduce some mathematical concepts. Believe it or not, this way of thinking about the puzzle is the easiest we’ve seen. But it really shouldn’t surprise you that different people prefer different ways to think about the same problem: one person’s dynamic programming is another’s memoization—or is it tabling, or learning, or lazy evaluation? In any case, I guess I fit your generalization about Haskell non-advocates, since if I were to code the solution in any other language, say Java, it would look like the Haskell code here: define interfaces `Seminearring` and `Character`; write a grammar generic in these interfaces; implement the interfaces.
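To make that concrete, here is a minimal sketch of what I mean. The names `Seminearring` and `Character` are the interfaces from our posts, but the toy grammar and the two implementations below are illustrative only, not the actual puzzle code:

```haskell
-- The interface: a semiring-like structure without additive inverses.
class Seminearring a where
  zero, one    :: a
  (<+>), (<.>) :: a -> a -> a

-- The interface for injecting terminal symbols.
class Seminearring a => Character a where
  char :: Char -> a

infixl 6 <+>
infixl 7 <.>

-- A toy grammar written once, generically against the interfaces:
-- the language { "a", "ab" }.
word :: Character a => a
word = char 'a' <.> (one <+> char 'b')

-- One implementation: count the strings generated.
newtype Count = Count Integer deriving (Eq, Show)

instance Seminearring Count where
  zero = Count 0
  one  = Count 1
  Count x <+> Count y = Count (x + y)
  Count x <.> Count y = Count (x * y)

instance Character Count where
  char _ = Count 1

-- Another implementation: enumerate the strings themselves.
newtype Strings = Strings [String] deriving (Eq, Show)

instance Seminearring Strings where
  zero = Strings []
  one  = Strings [""]
  Strings xs <+> Strings ys = Strings (xs ++ ys)
  Strings xs <.> Strings ys = Strings [x ++ y | x <- xs, y <- ys]

instance Character Strings where
  char c = Strings [[c]]

main :: IO ()
main = do
  print (word :: Count)    -- Count 2
  print (word :: Strings)  -- Strings ["a","ab"]
```

The grammar `word` is defined once; choosing a result type chooses the interpretation.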

There’s nothing in the algorithm used here that can’t be explained in a straightforward way using commonly understood concepts. If you asked a student to write a program to multiply two numbers, would you want to see a “literate” solution `f(x,y) = e^(ln x + ln y)`, containing footnotes about range restrictions on x and y and arguments that this is a natural way for a statistical physicist to think of multiplication, sprinkled with commentary on the partition function, and perhaps a final statement that modern compilers are clever enough to optimize the above code to the `IMUL` instruction?

I agree with your statement, and we would have written a very different series of posts if our goal had been to teach the average programmer how to solve the puzzle and ones like it. But that was not our goal; neither was it to promote Haskell. As for your question, the answer is that it depends on which class or project the student is in.

2.

This problem, like most of the ‘algorithmic’ problems ITA Software uses for recruiting, is designed so that there’s an easy-to-implement slow solution but plenty of room for a graduating senior to show they aren’t satisfied with mediocrity and are either clever enough or well-trained enough to do something about it. We don’t expect the average programmer to find our problems easy, but certainly they should be close to trivial for professors of computer science or mathematics.

It’s surprising to me that two professors at well-known universities would take it upon themselves to publish a detailed account of how to solve somebody else’s undergraduate-level take-home assignment while the assignment is still outstanding. I think if I were a professor, I’d be pretty upset if a fellow researcher grabbed problem sets for my undergraduate course off my course web site, wrote up the answers with source code, and distributed them in forums my students regularly visit.

Though I suppose if they did, I should be happy they obfuscated their work.

If I were a professor and didn’t want other people to solve my homework assignments, I wouldn’t put them on the Web without a due date. Though if I did and were shown how some other people would have approached the problem, I’d appreciate the connection even if it were not my preferred perspective. Certainly I wouldn’t complain about obfuscation on one hand and distribution on the other.

I hope this response makes you a bit more bemused at our content and a bit less baffled at our motivation.

```
Carl de Marcken
Chief Scientist
ITA Software
```

[1] I like generating functions and algebra too, when they’re used in a way that adds to understanding or leads to a better algorithm. But here I argue they serve no more purpose than using a matrix multiplication subroutine to multiply scalars.

Our approach naturally solves generalizations of the puzzle that you’d have no trouble coming up with (for example, by considering the second derivative). Surely you’re not just interested in solving the original puzzle, or the best algorithm would be to return a hard-coded answer.

I am confused about why one would want to say that what are normally thought of as morphisms or functions should themselves be monoids. For example, the count of the number of letters is here its own semiring. But isn’t the more natural way to represent it as a function from expression graphs to integers, where, say, `ExpGraph` is a data type of trees that have `*` nodes, `+` nodes, and string leaves, understood to represent a set of strings? Similarly, why do we need to add or multiply tries? Isn’t it more natural to say there is a space of tries and a function defined on expression graphs that takes values in tries?

I guess I am not seeing what the added structure, like associativity and distributivity, on tries and derivatives is buying you. I mean, yes, it works, but could you not do it by defining functions from expression graphs to integers for Count and for Volume, rather than saying these things themselves are structures?

Hi! If I understand your question correctly, it generalizes from grammars to other domain-specific languages: isn’t it more natural to represent an interpreter as a function from expressions to values rather than as a collection of operations on values?
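Concretely, I take the alternative you describe to be something like the following sketch (the type and function names are hypothetical, and it glosses over unions of overlapping string sets):

```haskell
-- A tree of + and * nodes with string leaves, understood to
-- represent a set of strings.
data ExpGraph = Leaf String
              | Plus  ExpGraph ExpGraph  -- union of the two sets
              | Times ExpGraph ExpGraph  -- pairwise concatenation
  deriving Show

-- Count as a plain function from expression graphs to Integers
-- (assuming the unions are disjoint).
count :: ExpGraph -> Integer
count (Leaf _)    = 1
count (Plus  x y) = count x + count y
count (Times x y) = count x * count y

main :: IO ()
main = print (count (Times (Leaf "a") (Plus (Leaf "") (Leaf "b"))))  -- 2
```
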

The most immediate practical benefit of our representation is that Haskell’s lazy evaluation automatically processes each node of the expression graph once and no more. This memoization is not crucial for this problem—in fact, looking back at our code (it’s been a while…), I see that we need to define `ten1` … `ten6` as variables local to the definition of `ten9` in order to take advantage of lazy evaluation. But even that change is still much easier than representing sharing explicitly in a data type of expression graphs using node IDs and such. (A nice way to represent sharing explicitly is higher-order abstract syntax; it is easier to write interpreters for higher-order abstract syntax in our way than in the “more natural” way.)

The most important property that our representation enforces, then, is not associativity or distributivity but compositionality: the interpretation (i.e., semantic value) of an expression is determined by the interpretation of its parts and their mode of combination. In other words, we represent each compositional interpreter as a fold on the expression graph. (MapReduce also uses compositionality, as well as associativity, but not distributivity.)

This domain-specific language has only one type. In another DSL with a more sophisticated type system, our representation of interpreters makes it easier to express type safety of the object language in the type system of the metalanguage. It also helps modularity. The benefits and precedents of this representation are described in a paper by Jacques Carette, Oleg Kiselyov, and myself and a paper by Christian Hofer, Klaus Ostermann, Tillmann Rendel, and Adriaan Moors.
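The effect of naming subresults like `ten1` … `ten9` can be seen in miniature with a hypothetical example (not the word-numbers code). Both functions below compute 2^n by adding a value to itself at each level; only the first names that value with a local binding, so it is computed once per level instead of twice, and the function runs in linear rather than exponential time:

```haskell
sharedPow2, unsharedPow2 :: Int -> Integer

-- The local binding `d` is one shared value, evaluated once per level.
sharedPow2 0 = 1
sharedPow2 n = let d = sharedPow2 (n - 1) in d + d

-- Without the binding, each level recomputes its subresult twice:
-- exponentially many calls in total.
unsharedPow2 0 = 1
unsharedPow2 n = unsharedPow2 (n - 1) + unsharedPow2 (n - 1)

main :: IO ()
main = print (sharedPow2 64)  -- 18446744073709551616, instantly;
                              -- unsharedPow2 64 would take ages
```

Local bindings like `ten1` … `ten6` inside `ten9` play the same role: they turn what would be repeated subcomputations into shared values.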

These are reasons for a computer scientist like me to prefer to express an interpreter as a collection of operations on values. A mathematician like Dylan might state a different reason: expressions are just the initial object in the category of values.

(2009-02-27)