Monday, February 12, 2007

kick it on DotNetKicks.com

Previous: What's in a Closure?


Today I want to revisit our recursive Fibonacci number function from my previous article and see if there might be some ways to improve it by drawing on another functional programming technique. For reference, here's the Fibonacci function:

delegate int FibonacciCalculator(int n);

int Fibonacci(int n)
{
  int[] results = new int[n + 1];

  FibonacciCalculator calculator = null;
  calculator = delegate(int x)
  {
    if (x < 2)
      return x;
    else
    {
      if (results[x] == 0)
        results[x] = calculator(x - 1) + calculator(x - 2);

      return results[x];
    }
  };

  return calculator(n);
}

If you stuck with me through the article on closures, you know that a closure is created by this function because the local "results" variable is used within an anonymous method even though it is declared outside of it. So, the C# compiler will turn the "results" variable into a field on a compiler-generated helper class. What's not as obvious is that "calculator" is also a local variable declared outside of the anonymous method but used inside. If you missed it, look again. "calculator" isn't a method call. In reality, it's a variable of the FibonacciCalculator delegate type which happens to reference the method that we're calling from (and hence, resulting in recursion). This might seem like a minor detail but it's extremely important. The compiler actually generates code that looks something like this:

delegate int FibonacciCalculator(int n);

sealed class FibonacciClosure
{
  public int[] results;
  public FibonacciCalcualtor calculator;

  public int CalculatorMethod(int x)
  {
    if (x < 2)
      return x;
    else
    {
      if (results[x] == 0)
        results[x] = calculator(x - 1) + calculator(x - 2);

      return results[x];
    } 
  }
}

int Fibonacci(int n)
{
  FibonacciClosure closure = new FibonacciClosure();
 
  closure.results = new int[n + 1];
  closure.calculator = new FibonacciCalculator(closure.CalculatorMethod);
 
  return closure.calculator(n);
}

Looking at the compiler-generated code actually helps clarify this. If you look at "CalculatorMethod", you'll notice that it is not calling itself recursively. Instead it is calling the "calculator" delegate and the effect is that it calls itself recursively. This has importance that is actually a bit mind-bending but we'll get to that in a moment.

First, in the original article, I referred to the technique of storing the results from our function for retrieval later as caching. I used that term generally but there is a more specific term: Memoization. Now, a good object-oriented programmer would recognize that this "memoization" is an idea that should be reused. Perhaps some sort of abstract class could be created to solve this problem? The class could maintain the lookup table and handle when the function is called and when results from the lookup table are returned. That would work, right? Well, it might work but it sounds like a lot of extra effort to me and it would be awkward to reuse. In reality, there is a much more elegant (and reusable) solution to this problem. The answer is to use another closure. As usual, we'll stick with C# 2.0 for now.

static FibonacciCalculator Memoize(FibonacciCalculator function)
{
  Dictionary<int, int> results = new Dictionary<int, int>();

  return delegate(int key)
  {
    int value;
    if (results.TryGetValue(key, out value))
      return value;

    value = function(key);
    results.Add(key, value);
    return value;
  };
}

Look carefully at what this method does. It takes a delegate of type "FibonacciCalculator" and returns a new delegate of the same type. The new delegate is declared as an anonymous method which uses a Dictionary to store and retrieve the results of the passed function. This new delegate is a closure because it references the local variables "results" and "function". That means that the scope of those referenced variables will be promoted and will exist outside the Memoize function (see my previous article if your need a refresher). With that in mind, here's how our Fibonacci function changes:

int Fibonacci(int n)
{
  FibonacciCalculator calculator = null;
  calculator = delegate(int x)
  {
    if (x < 2)
      return x;
    else
      return calculator(x - 1) + calculator(x - 2);
  };

  calculator = Memoize(calculator);

  return calculator(n);
}

This is a bit mind-bending if you're seeing a technique like this for the first time so let's tear it apart a bit. For those less-familiar with functional programming, it can be easier to see when expressed in the classes that are generated by the compiler:

delegate int FibonacciCalculator(int n);

sealed class MemoizeClosure
{
  public FibonacciCalculator function;
  public Dictionary<int, int> results;

  public int MemoizedFunction(int key)
  {
    int value;
    if (!results.TryGetValue(key, out value))
    {
      value = function(key);
      results.Add(key, value); 
    }

    return value;
  }
}

FibonacciCalculator Memoize(FibonacciCalculator function)
{
  MemoizeClosure closure = new MemoizeClosure();

  closure.function = function;
  closure.results = new Dictionary<int, int>();

  return new FibonacciCalculator(closure.MemoizedFunction);
}

sealed class FibonacciClosure
{
  public FibonacciCalculator calculator;

  public int FibonacciFunction(int x)
  {
    if (x < 2)
      return x;
    else
      return calcualtor(x - 1) + calculator(x - 2);
  }
}

int Fibonacci(int n)
{
  FibonacciClosure closure = new FibonacciClosure()
 
  closure.calculator = new FibonacciCalculator(closure.FibonacciFunction);

  closure.calculator = Memoize(closure.calculator);
}

Again, read the code carefully to see what's actually happening. The idea is that we've created a function which wraps our Fibonacci function with the memoization feature. This works because our Fibonacci function uses a delegate ("calculator") for recursion instead of calling itself directly. The delegate gets assigned to the new "memoized" function so the Fibonacci function actually calls that instead.

If you run this code, you will find that it is almost as fast as the optimized Fibonacci function from the previous article. The only differences are:

  1. A slightly longer time to get the first result because the extra closure and Dictionary have to be instantiated.
  2. The added indirection and Dictionary slow the algorithm down slightly but the difference doesn't register at the millisecond level.

Regardless of these extremely minor consequences, we now have a reusable Memoize function that can be used in other situations. Err... ok, it's not that reusable yet because it is tied to our FibonacciCalculator type. I held off on using generics until now because they can look bizarre to the uninitiated but we can make this reusable by creating our own generic delegate type. Instead of FibonacciCalculator, we could use this:

delegate TResult Func<TArg, TResult>(TArg arg);

(Note that several generic delegates just like will be included with the .NET Framework 3.5 and used by the C# 3.0 compiler.)

Our Memoize function can become generic itself and use the new delegate like this:

static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function)
{
  Dictionary<TArg, TResult> results = new Dictionary<TArg, TResult>();

  return delegate(TArg key)
  {
    TResult value;
    if (results.TryGetValue(key, out value))
      return value;

    value = function(key);
    results.Add(key, value);
    return value;
  };
}

And a small change is made to our Fibonacci function:

int Fibonacci(int n)
{
  Func<int, int> calculator = null;
  calculator = delegate(int x)
  {
    if (x < 2)
      return x;
    else
      return calculator(x - 1) + calculator(x - 2);
  };

  calculator = Memoize(calculator);

  return calculator(n);
}

Now we have a reusable memoization function that could be used with any function that takes one argument and returns a result. And, the solution allows us to capture the elegance of the original Fibonacci algorithm and achieve reasonable performance without the added baggage of explicitly declaring and maintaining storage.

If you are interested in other applications of this technique, Wes Dyer has some excellent blog posts on memoization here and here. Wes has some very cool ways to use this technique to implement the Singleton and Multiton design patterns.

Good luck and happy coding!


Previous: What's in a Closure?

kick it on DotNetKicks.com

posted on Monday, February 12, 2007 8:28:24 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]

kick it on DotNetKicks.com