
Previous: Fibonacci Numbers, Caching and Closures - Next: Using Automatic Memoization
This article continues my series on functional programming notions. One thing
that you will notice as this series progresses is that I am trying to resist
the temptation to use C# 3.0 syntax. A trend that I've noticed is that many are using
C# 3.0 to explain functional programming and, while there's nothing terribly
wrong with this, it can potentially confuse the reader. Few of us (with the
exception of
Bill
Wagner) are truly comfortable with C# 3.0 yet. So, to understand these
explanations, we have to first unravel the C# 3.0 syntax to get at the subtleties
of the meat. Therefore, I am attempting to present this information in C#
2.0. It is arguable that many developers have yet to become proficient with C#
2.0 but I think its slightly more verbose syntax will
help to make the concepts clear and to prepare the reader for the conciseness of
C# 3.0. Later on, I will introduce C# 3.0 syntax but I'll try not to do it until the
concepts are explained.
In my previous article, I
promised that I would look a bit more deeply into the internals of closures. But
first, let's refresh our memories as to what exactly a closure is.
What's A Closure?
A closure
is a function that is bound to the environment in which it is declared. Thus,
the function can reference elements from the environment within it's body. In
the case of a C# 2.0 anonymous method, the environment to which it is bound is
its parenting method body. This means that local variables from the parenting
method body can be referenced within the anonymous method's body. So, this code
prints 0 to the console as expected:
delegate void
Action();
static void Main(string[]
args)
{
int x = 0;
Action a = delegate { Console.WriteLine(x);
};
a();
}
Most developers don't have any problem with the code above. A local variable
"x" is declared and initialized to 0. Then, a new delegate "a" of type Action is
declared and assigned to an anonymous method that writes "x" to the console.
Finally, "a" is called and the value of "x" (0) is printed to the console. The
rub occurs when the code is changed like this:
delegate void
Action();
static void Main(string[]
args)
{
int x = 0;
Action a = delegate { Console.WriteLine(x);
};
x = 1;
a();
}
Now, "x" is reassigned to a value of 1 before "a" is called. What will be
output to the console?
(NOTE: This has actually been the cause of misunderstanding and
controversy. There has even been confusion by some at Microsoft itself about
this issue. You can read about it
here
and
here. Make sure that you read the comments on these blog posts
to get the full flavor of the debate. If you're looking for a blog post by a
.NET legendary figure that ends the debate,
look no
further.)
It turns out that the answer is 1, not 0. The reason for this is that the
anonymous method is a closure and is bound to its parenting method body and the
local variables in it. The important distinction is that it is bound to
variables, not to values. In other words, the value of "x" is not copied in when
"a" is declared. Instead, a reference to "x" is used so that "a" will always use
the most recent value of "x". In fact, this reference to "x" will be persisted
even if "x" goes out of scope. Consider this code:
delegate void
Action();
static Action GetAction()
{
int x = 0;
Action a = delegate { Console.WriteLine(x);
};
x = 1;
return a;
}
static void Main(string[]
args)
{
Action a = GetAction();
a();
}
That will still print 1 to the console even though "x" is out of scope by the
time that "a" is called. So, how is this achieved? Well, the good news is that
this is handled through compiler magic. There isn't any runtime support for
closures. That means that you could use the same techniques to create a closure
without using an anonymous method. In fact, it is
so simple that you could even do it with C# 1.0.
How's It Work?
I hinted at how closures work earlier when said "a reference to 'x' is used
so that 'a' will always use the most recent value of 'x'". The key here is that
a reference is used and not a value. The variable "x" is promoted from the stack to the heap in
some way. And that promotion is made in such a way that the scope of "x" can be
increased from its local scope. Oh, and this is done without boxing "x" to a
reference type.
To make all of this possible, the C# compiler generates a special helper
class. "x" becomes a field of that class and the anonymous method assigned to
"a" becomes an instance method of that class. In code, it looks something like
this:
delegate void
Action();
sealed class
ActionClosure
{
public int x;
public void
AnonMethod()
{
Console.WriteLine(x);
}
}
static Action GetAction()
{
ActionClosure closure = new ActionClosure();
closure.x = 0;
Action a = new Action(closure.AnonMethod);
closure.x = 1;
return a;
}
static void Main(string[]
args)
{
Action a = GetAction();
a();
}
The "GetAction" method is really where the magic happens:
- At the beginning of method, an instance of the "ActionClosure" class is
created. (Note that I chose to use the names "ActionClosure" and "AnonMethod"
for clarity. In reality, the compiler generates names to prevent name
collision.)
- All references to the local variable "x" in the "GetAction" method have
been replaced with references to the "x" field on the "ActionClosure"
instance.
- The delegate "a" is now assigned to a new delegate instance for "AnonMethod"
on the "ActionClosure" instance.
The cool thing to me is that this code will compile under C# 1.0 and work the
same as the code using an anonymous method. It's all compiler magic and
completely transparent to a C# 2.0 programmer.
Why Do I Care?
The biggest sell for closures is that they can dramatically improve code and
make it more robust. Many developers have voiced a mistrust of C# 2.0 anonymous
methods because of their potential for abuse. After all, it's not too hard to
imagine anonymous methods turning quickly into spaghetti code. But in my
experience, many developers have dismissed them simply because they don't
understand the power behind them: closures.
Let's look at an example:
static List<string>
g_Names = new List<string>(
new string[] {
"Bill Gates",
"Dustin Campbell",
"Dustin's Trophy Wife",
"Foster the Dog"
});
static void
Print(List<string> items)
{
foreach (string
item in items)
Console.WriteLine(item);
}
static void Main(string[]
args)
{
Print(g_Names);
}
This application simply creates a list of names and then outputs them to the
console. It works perfectly well.
Now, let's assume that we need to add the ability to retrieve a list of names
that starts with some particular text. This is pretty easy to implement because
there is a handy method on List<T> called FindAll that simply takes a Predicate
delegate and produces a new List<T> containing all of the items that the
Predicate returns true for. We can add this new function like so:
static string
g_StartingText;
static bool
NameStartsWith(string name)
{
return name.StartsWith(g_StartingText);
}
static List<string>
GetNamesStartingWith(string startingText)
{
g_StartingText = startingText;
return g_Names.FindAll(NameStartsWith);
}
Everything is working fine until our client calls and says that there is a
new requirement for this function: it must be thread-safe. In other words, the
function must produce valid results even if it is called by multiple threads.
This is problematic because, while one thread is finding all names starting with
"D", another thread could change "g_StartingText" to something else and bad
results would be returned.
One possibility might be tempting to place a lock on "g_StartingText". This
would certainly make the function thread-safe but it has some drawbacks. The
biggest issue with this approach is that threads will not be able to access this
function concurrently. If a thread acquires the lock, all other threads must
wait until that thread is finished. In other words, this method becomes a
potential bottleneck because only one thread can access it at a time and if
there any additional processors on the machine they won't be used.
The solution is to use an anonymous method to create a closure and remove the
shared state:
static List<string>
GetNamesStartingWith(string startingText)
{
return g_Names.FindAll(
delegate(string
name)
{
return
name.StartsWith(startingText);
});
}
Even with the verbosity of anonymous methods, the code
has been greatly reduced. And, assuming that "g_Names" will never be modified,
this function could run concurrently on multiple threads and multiple cores
without any synchronization.
Admittedly, my example is highly-contrived but it's not too hard to imagine
the same situation in larger scale applications.
Closures are critical to functional programming. Without closures, several techniques like currying and
memoization (more on
these in future articles) don't work. And, without understanding the subtleties
of closures, you won't be able to use the features of C# 3.0 properly. I don't
need to defend the value of functional programming in C# because others have done that
extremely
well.
We will be building on this knowledge in future articles so understanding
closures is important. Things will quickly start to make your head spin if you
don't understand this concept.
Until next time...
Previous: Fibonacci Numbers, Caching and Closures - Next: Using Automatic Memoization
