Tuesday, January 01, 2008

WARNING: This article is an interesting diversion and contains little practical value. It is an example of mind-stretching language torture and is intended for entertainment purposes only. If you use any of the C# or VB code presented in this article in production code, you will lose your job and shame your family!

You have been warned.

Lately, I've been revisiting some fun, old bits of Scheme code to see how they might be implemented in C#. I find this to be a rewarding exercise because it always stretches my mind a bit. Today, I'm playing with the following theoretical implementations of the three standard functions used to build data structures in Scheme:

(define (cons x y)
  (lambda (m) (m x y)))
(define (car z)
  (z (lambda (p q) p)))
(define (cdr z)
  (z (lambda (p q) q)))

If you've never seen Scheme code before, that probably looks like a bunch of gobblety-gook. Don't worry, we'll dig into these shortly. First, here are a few facts and ground rules for the uninitiated:

  1. Scheme is one of the two main dialects of Lisp (the other is Common Lisp).
  2. Scheme is based heavily on the untyped lambda calculus.
  3. Variables are dynamically typed, so you won't see any type annotations.
  4. There are no operator precedence rules. Instead, expressions are explicitly delimited by parentheses.
  5. Operators appear in prefix form instead of the infix form that we're all used to (i.e. + x y instead of x + y).
  6. Functions are values. In other words, functions are treated as first-class objects of the language in the same way that, say, numbers are.
  7. define is used to bind a value to a name.
  8. lambda is used to declare a function.

With these in mind, let's see how cons, car and cdr are actually used in Scheme.

Cons, Car and Cdr: The List Builders

Cons has a long history in functional programming and has even become part of the jargon of that world. In essence, cons constructs (duh!) pairs of values. Consider the following use of cons:

(cons 37 5)
;Value 11: (37 . 5)

The result of (cons 37 5) is an ordered pair holding 37 as the first value and 5 as the second. The comment that appears on the line after the expression is output from Edwin, the lovable and interactive Emacs-like editor distributed with MIT/GNU Scheme.

If we want to do something with our pair, we need to bind it to a name.

(define the-answer (cons 37 5))
;Value: the-answer

Now that our pair is named, we can view its contents.

the-answer
;Value 12: (37 . 5)

Retrieving the values from the pair is the job of Car and Cdr.

(car the-answer)
;Value: 37

(cdr the-answer)
;Value: 5

As you can see, car retrieves the first element from the pair, and cdr retrieves the second. Using car and cdr, we can add the values held by the pair.

(+ (car the-answer) (cdr the-answer))
;Value: 42

Neat!

It's a small step from constructing pairs to building singly-linked lists. The idea is that the first element of a pair will hold a value from the list, and the second element will hold the next pair. It's convenient to think of the resulting structure like so:

List Diagram 1

While that may be easy to visualize, our list actually has the recursive structure illustrated below:

List Diagram 2

Since the structure is recursive, we can think of each cell as a list containing a value and another list. Let me restate that. Each cell, by itself, is a list. For example, the second cell is a list containing the value 2 and the list represented by the third cell. The third cell is a list containing the value three and the empty list—that is, a list containing no elements.

Let's construct the list illustrated above using nested cons calls.

(define my-list (cons 1 (cons 2 (cons 3 nil))))
;Value: my-list

my-list
;Value 13: (1 2 3)

Extracting values from the list is simple. We just have to combine calls to car and cdr.

(car (cdr my-list))
;Value: 2

Using cons to build lists is a bit awkward. Thankfully, Scheme provides another function, list, to "cons up" lists.

(define my-other-list (list 4 5 6))
;Value: my-other-list

my-other-list
;Value 14: (4 5 6)

(car (cdr my-other-list))
;Value: 5

Cons, Car and Cdr: In Theory

At this point, you should have a basic understanding of how cons, car and cdr (along with the helper function, list) are used to build lists. At runtime, these functions are efficiently implemented as primitives. We can see this when we type cons into Edwin:

cons
;Value: #[primitive-procedure cons]

OK, hold onto your seats.

Let's drill into the definition of cons from the beginning of this article.

(define (cons x y)
  (lambda (m) (m x y)))

Knowing a little bit more Scheme than we did before, it's obvious that the above code binds the name cons to something. In fact, the name cons is bound to a function that takes two parameters, x and y. The body of this function is (lambda (m) (m x y)). What the heck is that? Well, it's an anonymous function—that is, a function with no name (exactly like a lambda expression from C# 3.0 and VB 9). That's right, this definition of cons doesn't return some special data structure that holds two values—it returns a function.

Is your mind warping yet? Yes? No? Maybe? Well, read on.

The anonymous function returned by cons takes one parameter. That parameter, m, is yet another function. The body of the function returned from cons calls m, passing x and y as the arguments. So, cons really just returns a function that knows how to call functions passed to it with x and y. Insane? A little.

Let's go ahead and define this version of cons in Edwin.

(define (cons x y)
  (lambda (m) (m x y)))
;Value: cons

(define the-answer (cons 37 5))
;Value: the-answer

So far, everything looks pretty much the same. However, the values of cons and the-answer are quite different than before.

cons
;Value 11: #[compound-procedure 11 cons]

the-answer
;Value 12: #[compound-procedure 12]

cons is no longer a primitive-procedure because we've rebound its name to our function. Notice that the-answer is also a procedure. In fact, it's a procedure with no name because cons now returns an anonymous function.

OK, let's poke this with a stick. We know that the anonymous function represented by the-answer knows how to call functions passed to it with the values that were originally passed into cons. So, we should be able to call the-answer with any function, for instance, the addition operator (which is just another function).

(the-answer +)
;Value: 42

Sweet! Passing the addition operator into the-answer caused the originally consed values to be added together. So, what happens if we try to use car to get the first value from the-answer?

(car the-answer)
;The object #[compound-procedure 12], passed as the first argument to car, is not the correct type.
;To continue, call RESTART with an option number:
; (RESTART 2) => Specify an argument to use in its place.
; (RESTART 1) => Return to read-eval-print level 1.
;Start debugger? (y or n):

Holy crap! We broke Scheme! Seriously, we did. The current implementations of car and cdr don't know how to handle values returned from our newly defined cons. We need to redefine those as well if we want them to work.

OK, take a deep breath and look at the definitions of car and cdr.

(define (car z)
  (z (lambda (p q) p)))
(define (cdr z)
  (z (lambda (p q) q)))

If it's not obvious how car and cdr work, take another look. In essence, car and cdr know how to call the function returned by cons with functions that retrieve the first and second elements respectively. Defining these in Edwin makes everything work again.

(define (car z)
  (z (lambda (p q) p)))
;Value: car

(car the-answer)
;Value: 37

(define (cdr z)
  (z (lambda (p q) q)))
;Value: cdr

(cdr the-answer)
;Value: 5

I should reemphasize that the above definitions are not how cons, car and cdr are actually implemented in Scheme. They are theoretical. In fact, these definitions are one way that cons, car and cdr can be implemented in the lambda calculus.

If your head isn't spinning at this point, I haven't done my job. To be fair, I've been demonstrating all of this in a language that's probably new to most of you. To correct that, here are working versions of cons, car and cdr, based on the definitions above, in C# and VB. Both programs compile and run properly with Visual Studio 2008. First, the C#:

using System;

namespace ConsCarCdr
{
  class Program
  {
    delegate T ConsCell<T>(Func<T, T, T> f);

    static ConsCell<T> Cons<T>(T x, T y)
    {
      return m => m(x, y);
    }

    static T Car<T>(ConsCell<T> z)
    {
      return z((p, q) => p);
    }

    static T Cdr<T>(ConsCell<T> z)
    {
      return z((p, q) => q);
    }

    static void Main()
    {
      var theAnswer = Cons(37, 5);

      Console.WriteLine(Car(theAnswer) + Cdr(theAnswer));
    }
  }
}

And here's the VB version:

Module Program
  Delegate Function ConsCell(Of T)(ByVal f As Func(Of T, T, T)) As T

  Function Cons(Of T)(ByVal x As T, ByVal y As T) As ConsCell(Of T)
    Return Function(m) m(x, y)
  End Function

  Function Car(Of T)(ByVal z As ConsCell(Of T)) As T
    Return z(Function(p, q) p)
  End Function

  Function Cdr(Of T)(ByVal z As ConsCell(Of T)) As T
    Return z(Function(p, q) q)
  End Function

  Sub Main()
    Dim theAnswer = Cons(37, 5)

    Console.WriteLine(Car(theAnswer) + Cdr(theAnswer))
  End Sub
End Module

So, what's the big deal? The big deal is that we are building data out of thin air. Nowhere did we define a class or struct with fields to hold the data. The values passed to Cons continue to exist entirely through the power of closure. (For more on closures in C#, see the article, "What's in a Closure?".)

Now for the bad news. The C# and VB definitions of Cons, Car and Cdr above can't be used to build lists—only ordered pairs. Remember that building a list with Cons produces a recursive data structure. So, the second argument passed to Cons must be another ConsCell<T>. Let's try that.

delegate T ConsCell<T>(Func<T, ConsCell<T>, T> f);

static ConsCell<T> Cons<T>(T x, ConsCell<T> y)
{
  return m => m(x, y);
}

static T Car<T>(ConsCell<T> z)
{
  return z((p, q) => p);
}

static ConsCell<T> Cdr<T>(ConsCell<T> z)
{
  return z((p, q) => q);
}

Ugh! That doesn't compile. The problem is that Cdr's lambda expression is expected to return a value of type T but returns a value of type ConsCell<T> instead. We can hack around this with a dirty trick.

static ConsCell<T> Cdr<T>(ConsCell<T> z)
{
  ConsCell<T> result = null;
  z((p, q) => { result = q; return p; });
  return result;
}

I dislike this hack primarily because it requires a lambda expression with multiple statements, and VB doesn't support that. In addition, I don't like returning p from the lambda expression and throwing its value away. However, this gives us the hint that we need to find a cleaner approach.

delegate void Chooser<T1, T2>(T1 x, T2 y);

delegate void ConsCell<T>(Chooser<T, ConsCell<T>> f);

static ConsCell<T> Cons<T>(T x, ConsCell<T> y)
{
  return m => m(x, y);
}

static T Car<T>(ConsCell<T> z)
{
  T result = default(T);
  z((p, q) => result = p);
  return result;
}

static ConsCell<T> Cdr<T>(ConsCell<T> z)
{
  ConsCell<T> result = null;
  z((p, q) => result = q);
  return result;
}

It requires more hacks to make this work in VB. In their current incarnation, VB lambda expressions simply aren't all that robust. First, they don't support multiple statements (that's a little tricky syntactically to achieve in VB). The second (and more cumbersome) problem is that VB lambda expressions don't work very well with delegates that return Void. VB lambdas must always return a value. (These differences are mainly due to the fact that C# and VB have very different destinies.)

With hacks in place, the VB version looks like so:

Delegate Function Chooser(Of T1, T2)(ByVal x As T1, ByVal y As T2) As Integer

Delegate Sub ConsCell(Of T)(ByVal f As Chooser(Of T, ConsCell(Of T)))

Function Cons(Of T)(ByVal x As T, ByVal y As ConsCell(Of T)) As ConsCell(Of T)
  Return Function(m) m(x, y)
End Function

Function Assign(Of T)(ByRef left As T, ByVal right As T) As Integer
  left = right
  Return 0
End Function

Function Car(Of T)(ByVal z As ConsCell(Of T)) As T
  Dim result As T = Nothing
  z(Function(p, q) Assign(result, p))
  Return result
End Function

Function Cdr(Of T)(ByVal z As ConsCell(Of T)) As ConsCell(Of T)
  Dim result As ConsCell(Of T) = Nothing
  z(Function(p, q) Assign(result, q))
  Return result
End Function

Now that Cons, Car and Cdr are defined, it's a simple matter to define a List method to help "cons up" lists.

static ConsCell<T> List<T>(params T[] items)
{
  ConsCell<T> result = null;
  for (int i = items.Length - 1; i >= 0; i--)
    result = Cons(items[i], result);

  return result;
}

We can even write recursive methods that process our lists.

static void PrintList<T>(ConsCell<T> list)
{
  if (list == null)
    return;
  else
  {
    Console.WriteLine(Car(list));
    PrintList(Cdr(list));
  }
}

Oh, and the client code? It's looking good.

var numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

PrintList(numbers);

Success! We've built meaningful data using only functions in both C# and VB. It required some dirty tricks to get around static-typing and VB's skimpy lambda expression support, but we did it. We built data out of air.

I need to reiterate that you should not use any of this in production code. I have presented it for your enjoyment. If you learned something from this article (i.e. you were able to wrap your head around how this works), you might consider taking steps into the world of functional programming. Try a language like ML, F#, Haskell, Common Lisp or Scheme. Research the various lambda calculi or first-order logic to see where the functional programming ideas were born. Mainstream development has just started to test the academic waters of functional programming in languages like C# and VB. It can't hurt to get a little wet.

I must make mention of my beautiful editor and trophy wife. She took the time and shed the tears necessary to understand the concepts presented here and to improve my clunky, caveman-like writing style. Remember, if she gets it, you'd better. :-)

posted on Tuesday, January 01, 2008 1:33:25 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]

kick it on DotNetKicks.com