Next: What's in a Closure?
After a bit of hiatus, I am long overdue to get some code up on this blog. To give myself some direction, this is the start of an informal series that will attempt to shed some light on the functional programming ideas that have been sneaking into the C# world. I've got a lot that I want to write about so stay tuned...
As an exercise (and because I'm a fan of cliché), I've been toying with Fibonacci numbers lately. I've been playing simply with the classic recursive Fibonacci equation. Please note that there are much faster algorithms to use to calculate Fibonacci numbers that don't use recursion. But, those tend to lack the simple elegance of the classic. My goal here is not to teach recursion (Fibonacci numbers aren't very good for that purpose anyway) but to explore some other possibilities.
In case you're not familiar with them, the basic idea is that any number after the first two starting numbers is the sum of the previous two numbers in the sequence. Huh? OK, maybe the idea is better represented by the following equation (stolen shamelessly from Wikipedia):
So, if n is 0, the answer is 0. If n is 1, the answer is 1. If n is 2, the answer is the Fibonacci of 1 plus the Fibonacci of 0, or 1. And so on. In sequence, Fibonacci numbers get big really quickly:
Here is the equation in C# code:
That's pretty simple and works exactly as it should. But, there's one serious problem: it's extraordinarily slow. The problem with the Fibonacci equation is that the Fibonacci of every preceding number must be calculated (often several times) in order to calculate the Fibonacci of n. The following diagram illustrates the problem. It shows the calculations that are necessary to calculate the Fibonacci of 4.
As you can see, the number of calculations increases exponentially. To determine how serious the problem is, let's write a simple application to measure the performance of our Fibonacci function:
The results start to get interesting around the 25th iteration:
At the risk of belaboring this, let's try a different approach. With a small adjustment to our Fibonacci function, we can measure the number of steps that it takes instead of measuring time. After this change our application looks something like this:
Here are the results starting at the 25th iteration:
The number of steps taken to calculate a Fibonacci number is staggering! But, if you look carefully, you'll notice a pattern. It turns out that the number of steps grows in a Fibonacci sequence. That is, the number of steps needed to calculate a Fibonacci number is the sum of the number of steps that it took to calculate the previous two Fibonacci numbers. This same pattern also appears in our performance test measuring time (though not as precisely). If you're familiar with Big O Notation, the performance can be represented as O(fibonacci(n)). If that looks like Greek to you, just know that the performance gets much worse as the value of n increases.
If you're still with me (and still remember the title of this article), you probably already know what the solution is: caching. Instead of calculating a Fibonacci number every time it's needed, we should store the result of the calculation so that it can be retrieved later. This can be done easily by employing a System.Collections.Generic.Dictionary:
If you run the performance tests on this, you will get some amazing results:
For all of my Big O buddies out there, our algorithm is now O(1). However, it comes at a cost:
There are several possible solutions to these issues but the solution that I propose is to use a closure.
A closure is a function that is bound to the environment in which it is declared. If that doesn't make sense, read on. In C# 2.0, we can implement closures using anonymous methods. Consider this code carefully:
In this code, "a" is a delegate of type Action that is assigned to an anonymous method which simply prints the value of the local variable "x" to the console. The interesting bit is that "x" is actually declared outside of the anonymous method. To make this possible, "a" represents a closure that is bound to the local variable "x" because "x" is part of the environment (the parenting method body) in which "a" is declared.
Please note that I did not say that "a" is "bound to the value of 'x'". I said that "a" is "bound to the variable of 'x'". If "a" were bound to the value of "x", this code would print 0 to the console because that's the value assigned to "x" when "a" is declared. However, because it is bound to the variable "x", 1 will be output to the console because "x" is reassigned before "a" is called. This binding is persisted even if "x" goes out of scope:
The above code prints 1 to the console instead of zero even though "x" is out of scope when "a" is called.
I intend to write a future article about the C# compiler magic that makes closures possible but, for the moment, we will remain blissfully ignorant of the internals. In addition, if this information seems trivial or unimportant, please realize that closures are very important to the functional programming constructs coming in C# 3.0. In fact, many practices of functional programming (e.g. currying) are made possible by closures.
Getting back to our Fibonacci function, we can use a closure to solve the three problems that I mentioned earlier. Here is a working implementation:
Make sure that you read this method carefully to take everything in:
There are a couple of other minor performance optimizations that can be done to speed things up even further but I left them out for clarity. Here is my final method with optimizations added in:
Page rendered at Sunday, September 07, 2008 9:29:56 PM (Eastern Standard Time, UTC-05:00)