Functional Pearl: Trees
[This post is Literate Haskell. If you copy the entire post into a file with a .lhs extension, you can load up that file in your favorite Haskell Interpreter, and it should work. Then you can test the functions mentioned herein.]
Given a binary tree (you know, a thing that’s either a leaf with some data or two branches, both of which are binary trees), can you compute a new binary tree with exactly the same structure, but with every leaf’s value replaced by the minimum value in the tree?
If you’re a programmer, of course you can. One simple pass to find the minimum value in the tree, and another to replace each leaf with the minimum value. Here is the data type and an implementation of this idea in Haskell.
> data Tree a = Leaf a -- The leaf holds data of any type. > | Branch (Tree a) (Tree a) deriving (Show) > replaceMin2 t = let m = findMin t > in propagate t m > where > findMin (Leaf m) = m > findMin (Branch left right) = findMin left `min` findMin right > propagate (Leaf _) m = Leaf m > propagate (Branch left right) m = Branch (propagate left m) > (propagate right m)
Simple, and clear. Now, can you think of a way to do it in one pass? You might consider making one pass in which (1) for each leaf, replace its value with a mutable cell that can hold a value, returning the value of the leaf, then (2) for each branch, recursively compute the min on the left and right. At the end, you set the cell you gave to all the leaves to the value of the min you computed in this single pass, so that each leaf gets the value at once.
This is a good idea. However, this will change the type of the new tree. Instead of having a tree of integers, you’ll have a tree of cells-pointing-to-integers, and you’ll need another pass to get rid of them.
Here is a solution. One pass. Ponder it.
> replaceMin :: Tree Int -> Tree Int > replaceMin t = let (t', m) = rpMin (t, m) in t' > rpMin :: (Tree Int, Int) -> (Tree Int, Int) > rpMin (Leaf a, m) = (Leaf m, a) > rpMin (Branch l r, m) = let (l', ml) = rpMin (l, m) > (r', mr) = rpMin (r, m) > in (Branch l' r', ml `min` mr)
Note how in replaceMin the variable m is both bound to the result of calling rpMin and as an argument to rpMin. Totally insane.
That Doesn’t Work
I know; I said the same thing. The contract of rpMin is that it consumes a tuple of the tree and the actual minimum value of the tree (!), and returns (1) the tree with the minimum value stuck in the right places, and (2) the actual minimum value. If you’re used to imperative languages, or even functional languages without having used lazy evaluation, this contract is utter and complete nonsense. In fact, it is strong grounds for doubting a programmer’s competence.
The reason it is nonsense is because call-by-value parameter passing is most programmer’s default mindset. When a function is applied to its arguments, those arguments are computed down to some value of the appropriate type, and only then is the function invoked. This is the function application semantics of many programming languages, such as Java, C, Perl, etc.
The function replaceMin depends upon a strange and wonderful semantics called call-by-need semantics. In call-by-need semantics, when a function is invoked, the arguments are not evaluated. Only when, and if, the body of the function demands one of the arguments are they actually turned into a value. This fact makes replaceMin possible. Since nothing in rpMin actually tests the value of m (e.g. by comparing it to 0 or something), but simply uses it, we are allowed to refer to a value which cannot even by computed until rpMin is finished.
This means the algorithm given above essentially implements the cells idea suggested before, with at least two advantages: (1) it’s pure, meaning there is no way to step on our toes by assigning the wrong thing at the wrong time; (2) it really requires only one traversal, and doesn’t change the type of our function. It computes a new Tree of ints, just like the old one.
Acknowledgements
The function replaceMin and helper comes from http://www.cse.ogi.edu/pacsoft/projects/rmb/, in particular under “Other Artifacts”, linked to by the bullet point, “Slides for a talk on the recursive do-notation.” I have omitted a link to the PDF on purpose, so you’ll see all the cool stuff those people have done.
Update 18 Feb 2008. As one person mentioned in the comments, this is quite obviously call-by-need, not call-by-name, semantics. I have changed it in the post to avoid further confusion.
9 Comments »
Leave a comment
-
Archives
- July 2009 (1)
- March 2009 (2)
- July 2008 (1)
- May 2008 (1)
- April 2008 (1)
- March 2008 (1)
- February 2008 (1)
- October 2007 (1)
- August 2007 (1)
- July 2007 (2)
- April 2007 (4)
- August 2006 (1)
-
Categories
-
RSS
Entries RSS
Comments RSS
I think it is call-by-need, not call-by-name…
How does the runtime system actually converts those operations to assembly? The CPU isn’t lazy, so someone has to manage those abstractions. So maybe in the end someone has to travel the tree two times anyway, but maybe the second time in a less coherent way, so it may end slower.
Stefano,
I believe one traversal is guaranteed as a consequence of the semantics (I’d like to hear otherwise, if not).
No reduction (evaluation of a term) happens unless some operation demands it. In the case above, when you “call” replaceMin, nothing inside happens. But consider what happens when you traverse the resulting tree. Suppose the tree has at least few nodes, so that its root is a branch.
When you probe the root (i.e. the (Branch l r) case inside rpMin is evaluated just as far as need be), then you’ll get back Branch, and the left and right subtrees will be suspended computations, waiting until you ask what they are before computing it.
Suppose the left child of the root of the result tree is a Leaf node, and you demand it, and its value. First, as is clear from the definition of rpMin, *every leaf node gets the same object*. That is, every leaf node is changed to `m’, the as-yet-to-be-computed minimum element of the tree. So when you ask for the first leaf node in the tree, it forces the runtime to compute its value by doing the tree traversal. After the traversal, since each leaf in the tree shares the same data, each Leaf computation is replaced by the minimum value. So next time you visit (another) leaf, it won’t do any more computation.
Is this argument convincing enough?
It’s a tree of “Ints”, not a tree of “ints”.
In (standard) Haskell, all type are reference types.
A Haskell Int is in fact a cell containing an int, so your tree started out as a tree of cells, and that’s why to don’t need to change the tree to cells and back.
This solution would not be a one-pass solution if the value at each node was actually a raw (unshared) integer value, and not a reference to a heap value.
This solution is not really “one-pass”. It’s “one-pass, plus a level of indirection at each node, plus some machinery for scheduling computations”. While interesting, this is not necessarily simpler or more efficient than a “two-pass plus the operational semantics of C”
What *is* cool about this solution (and lazy programming in Haskell general) is that it compactly encodes a distributed algorithm in a network-agnostic way, and so the code can move from a single-thread to a broad network using just one same disributed-computation engine (the Haskell runtime and lilbraries), without recoding the algorithm as the network scale changes.
It’s nice to see my 25 year old example showing up again.
No front flip yet. Or back flip. Being the responsible adult that I am, I decided that I would postpone the flipping until after I got back from McKinley. The last thing I need to be doing when I’m supposed to be training, is recovering from a back/neck injury.
I think this example is from the paper “Using Circular Programs to Eliminate Multiple Traverals of Data” by Richard Bird (Acta Informatica, 21:239–250, 1984).
@augustss: Care to let us know where your example is?
Also, I think what’s going on is being missed by what is disparagingly called “an imperative mind-set”. The key to understanding the result is that the tree is not being edited, but the old tree’s shape is being used to construct a completely different (that is, a new) tree.
There is no second pass, because there is instead, a second tree. We are not alchemists, taking a lead brick, and waving a wand over it to “change” it to gold. We are craftsmen (and -women!), tracing out the dimensions of the lead brick, casting some gold in the same shape, then tossing the lead brick aside.
Just like the movie disclaimers, No Trees Were Harmed During the Making of this Production.