This sentence is false

functional programming, software, and emacs.

Functional Pearl: Trees

[This post is Literate Haskell. If you copy the entire post into a file with a .lhs extension, you can load up that file in your favorite Haskell Interpreter, and it should work. Then you can test the functions mentioned herein.]

Given a binary tree (you know, a thing that’s either a leaf with some data or two branches, both of which are binary trees), can you compute a new binary tree with exactly the same structure, but with every leaf’s value replaced by the minimum value in the tree?

If you’re a programmer, of course you can. One simple pass to find the minimum value in the tree, and another to replace each leaf with the minimum value. Here is the data type and an implementation of this idea in Haskell.


> data Tree a = Leaf a          -- The leaf holds data of any type.
>             | Branch (Tree a) (Tree a) deriving (Show)

> replaceMin2 t = let m = findMin t
>                 in propagate t m
>   where
>     findMin (Leaf m) = m
>     findMin (Branch left right) = findMin left `min` findMin right

>     propagate (Leaf _) m = Leaf m
>     propagate (Branch left right) m = Branch (propagate left m)
>                                              (propagate right m)

Simple, and clear. Now, can you think of a way to do it in one pass? You might consider making one pass in which (1) for each leaf, replace its value with a mutable cell that can hold a value, returning the value of the leaf, then (2) for each branch, recursively compute the min on the left and right. At the end, you set the cell you gave to all the leaves to the value of the min you computed in this single pass, so that each leaf gets the value at once.

This is a good idea. However, this will change the type of the new tree. Instead of having a tree of integers, you’ll have a tree of cells-pointing-to-integers, and you’ll need another pass to get rid of them.

Here is a solution. One pass. Ponder it.


> replaceMin :: Tree Int -> Tree Int
> replaceMin t = let (t', m) = rpMin (t, m) in t'

> rpMin :: (Tree Int, Int) -> (Tree Int, Int)
> rpMin (Leaf a, m) = (Leaf m, a)
> rpMin (Branch l r, m) = let (l', ml) = rpMin (l, m)
>                             (r', mr) = rpMin (r, m)
>                    in (Branch l' r', ml `min` mr)

Note how in replaceMin the variable m is both bound to the result of calling rpMin and as an argument to rpMin. Totally insane.

That Doesn’t Work

I know; I said the same thing. The contract of rpMin is that it consumes a tuple of the tree and the actual minimum value of the tree (!), and returns (1) the tree with the minimum value stuck in the right places, and (2) the actual minimum value. If you’re used to imperative languages, or even functional languages without having used lazy evaluation, this contract is utter and complete nonsense. In fact, it is strong grounds for doubting a programmer’s competence.

The reason it is nonsense is because call-by-value parameter passing is most programmer’s default mindset. When a function is applied to its arguments, those arguments are computed down to some value of the appropriate type, and only then is the function invoked. This is the function application semantics of many programming languages, such as Java, C, Perl, etc.

The function replaceMin depends upon a strange and wonderful semantics called call-by-need semantics. In call-by-need semantics, when a function is invoked, the arguments are not evaluated. Only when, and if, the body of the function demands one of the arguments are they actually turned into a value. This fact makes replaceMin possible. Since nothing in rpMin actually tests the value of m (e.g. by comparing it to 0 or something), but simply uses it, we are allowed to refer to a value which cannot even by computed until rpMin is finished.

This means the algorithm given above essentially implements the cells idea suggested before, with at least two advantages: (1) it’s pure, meaning there is no way to step on our toes by assigning the wrong thing at the wrong time; (2) it really requires only one traversal, and doesn’t change the type of our function. It computes a new Tree of ints, just like the old one.

Acknowledgements

The function replaceMin and helper comes from http://www.cse.ogi.edu/pacsoft/projects/rmb/, in particular under “Other Artifacts”, linked to by the bullet point, “Slides for a talk on the recursive do-notation.” I have omitted a link to the PDF on purpose, so you’ll see all the cool stuff those people have done.

Update 18 Feb 2008. As one person mentioned in the comments, this is quite obviously call-by-need, not call-by-name, semantics. I have changed it in the post to avoid further confusion.

16 February 2008 Posted by dbueno | haskell | , , | 9 Comments