WARNING: This article is an interesting diversion and contains little practical value. It is an example of mind-stretching language torture and is intended for entertainment purposes only. If you use any of the C# or VB code presented in this article in production code, you will lose your job and shame your family! You have been warned.
Lately, I've been revisiting some fun, old bits of Scheme code to see how they might be implemented in C#. I find this to be a rewarding exercise because it always stretches my mind a bit. Today, I'm playing with the following theoretical implementations of the three standard functions used to build data structures in Scheme:
If you've never seen Scheme code before, that probably looks like a bunch of gobblety-gook. Don't worry, we'll dig into these shortly. First, here are a few facts and ground rules for the uninitiated:
With these in mind, let's see how cons, car and cdr are actually used in Scheme.
Cons has a long history in functional programming and has even become part of the jargon of that world. In essence, cons constructs (duh!) pairs of values. Consider the following use of cons:
The result of (cons 37 5) is an ordered pair holding 37 as the first value and 5 as the second. The comment that appears on the line after the expression is output from Edwin, the lovable and interactive Emacs-like editor distributed with MIT/GNU Scheme.
If we want to do something with our pair, we need to bind it to a name.
Now that our pair is named, we can view its contents.
Retrieving the values from the pair is the job of Car and Cdr.
As you can see, car retrieves the first element from the pair, and cdr retrieves the second. Using car and cdr, we can add the values held by the pair.
Neat!
It's a small step from constructing pairs to building singly-linked lists. The idea is that the first element of a pair will hold a value from the list, and the second element will hold the next pair. It's convenient to think of the resulting structure like so:
While that may be easy to visualize, our list actually has the recursive structure illustrated below:
Since the structure is recursive, we can think of each cell as a list containing a value and another list. Let me restate that. Each cell, by itself, is a list. For example, the second cell is a list containing the value 2 and the list represented by the third cell. The third cell is a list containing the value three and the empty list—that is, a list containing no elements.
Let's construct the list illustrated above using nested cons calls.
Extracting values from the list is simple. We just have to combine calls to car and cdr.
Using cons to build lists is a bit awkward. Thankfully, Scheme provides another function, list, to "cons up" lists.
At this point, you should have a basic understanding of how cons, car and cdr (along with the helper function, list) are used to build lists. At runtime, these functions are efficiently implemented as primitives. We can see this when we type cons into Edwin:
OK, hold onto your seats.
Let's drill into the definition of cons from the beginning of this article.
Knowing a little bit more Scheme than we did before, it's obvious that the above code binds the name cons to something. In fact, the name cons is bound to a function that takes two parameters, x and y. The body of this function is (lambda (m) (m x y)). What the heck is that? Well, it's an anonymous function—that is, a function with no name (exactly like a lambda expression from C# 3.0 and VB 9). That's right, this definition of cons doesn't return some special data structure that holds two values—it returns a function.
Is your mind warping yet? Yes? No? Maybe? Well, read on.
The anonymous function returned by cons takes one parameter. That parameter, m, is yet another function. The body of the function returned from cons calls m, passing x and y as the arguments. So, cons really just returns a function that knows how to call functions passed to it with x and y. Insane? A little.
Let's go ahead and define this version of cons in Edwin.
So far, everything looks pretty much the same. However, the values of cons and the-answer are quite different than before.
cons is no longer a primitive-procedure because we've rebound its name to our function. Notice that the-answer is also a procedure. In fact, it's a procedure with no name because cons now returns an anonymous function.
OK, let's poke this with a stick. We know that the anonymous function represented by the-answer knows how to call functions passed to it with the values that were originally passed into cons. So, we should be able to call the-answer with any function, for instance, the addition operator (which is just another function).
Sweet! Passing the addition operator into the-answer caused the originally consed values to be added together. So, what happens if we try to use car to get the first value from the-answer?
Holy crap! We broke Scheme! Seriously, we did. The current implementations of car and cdr don't know how to handle values returned from our newly defined cons. We need to redefine those as well if we want them to work.
OK, take a deep breath and look at the definitions of car and cdr.
If it's not obvious how car and cdr work, take another look. In essence, car and cdr know how to call the function returned by cons with functions that retrieve the first and second elements respectively. Defining these in Edwin makes everything work again.
I should reemphasize that the above definitions are not how cons, car and cdr are actually implemented in Scheme. They are theoretical. In fact, these definitions are one way that cons, car and cdr can be implemented in the lambda calculus.
If your head isn't spinning at this point, I haven't done my job. To be fair, I've been demonstrating all of this in a language that's probably new to most of you. To correct that, here are working versions of cons, car and cdr, based on the definitions above, in C# and VB. Both programs compile and run properly with Visual Studio 2008. First, the C#:
And here's the VB version:
So, what's the big deal? The big deal is that we are building data out of thin air. Nowhere did we define a class or struct with fields to hold the data. The values passed to Cons continue to exist entirely through the power of closure. (For more on closures in C#, see the article, "What's in a Closure?".)
Now for the bad news. The C# and VB definitions of Cons, Car and Cdr above can't be used to build lists—only ordered pairs. Remember that building a list with Cons produces a recursive data structure. So, the second argument passed to Cons must be another ConsCell<T>. Let's try that.
Ugh! That doesn't compile. The problem is that Cdr's lambda expression is expected to return a value of type T but returns a value of type ConsCell<T> instead. We can hack around this with a dirty trick.
I dislike this hack primarily because it requires a lambda expression with multiple statements, and VB doesn't support that. In addition, I don't like returning p from the lambda expression and throwing its value away. However, this gives us the hint that we need to find a cleaner approach.
It requires more hacks to make this work in VB. In their current incarnation, VB lambda expressions simply aren't all that robust. First, they don't support multiple statements (that's a little tricky syntactically to achieve in VB). The second (and more cumbersome) problem is that VB lambda expressions don't work very well with delegates that return Void. VB lambdas must always return a value. (These differences are mainly due to the fact that C# and VB have very different destinies.)
With hacks in place, the VB version looks like so:
Now that Cons, Car and Cdr are defined, it's a simple matter to define a List method to help "cons up" lists.
We can even write recursive methods that process our lists.
Oh, and the client code? It's looking good.
Success! We've built meaningful data using only functions in both C# and VB. It required some dirty tricks to get around static-typing and VB's skimpy lambda expression support, but we did it. We built data out of air.
I need to reiterate that you should not use any of this in production code. I have presented it for your enjoyment. If you learned something from this article (i.e. you were able to wrap your head around how this works), you might consider taking steps into the world of functional programming. Try a language like ML, F#, Haskell, Common Lisp or Scheme. Research the various lambda calculi or first-order logic to see where the functional programming ideas were born. Mainstream development has just started to test the academic waters of functional programming in languages like C# and VB. It can't hurt to get a little wet.
I must make mention of my beautiful editor and trophy wife. She took the time and shed the tears necessary to understand the concepts presented here and to improve my clunky, caveman-like writing style. Remember, if she gets it, you'd better.
Page rendered at Tuesday, January 06, 2009 12:05:59 AM (Eastern Standard Time, UTC-05:00)