Tuesday, September 25, 2007
UPDATE (Sep. 28, 2007): This article is really academic in nature on the topic of closures and how they fit into LINQ query expressions. It contains a highly-contrived example that is not representative of quality LINQ code. For more information, take a look at this post.

To me, one of the most interesting aspects of LINQ query expressions is that they produce lexical closures (for a detailed look at closures in C#, see my article on the topic). To illustrate this point, consider the following code:

static void Main()
{
  var filter = "Compare";
 
  var query = from m in typeof(String).GetMethods()
              where m.Name.Contains(filter)
              select new { m.Name, ParameterCount = m.GetParameters().Length };
 
  foreach (var item in query)
    Console.WriteLine(item);
 
  Console.WriteLine();
  Console.WriteLine("--- press any key to continue ---");
  Console.ReadKey();
}

This query retrieves all of the public methods on System.String whose names contain the text represented by the "filter" variable (in this case "Compare"). When compiled and run, the output is what you might guess:

{ Name = Compare, ParameterCount = 2 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 4 }
{ Name = Compare, ParameterCount = 5 }
{ Name = Compare, ParameterCount = 6 }
{ Name = Compare, ParameterCount = 7 }
{ Name = Compare, ParameterCount = 6 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareOrdinal, ParameterCount = 2 }
{ Name = CompareOrdinal, ParameterCount = 5 }

--- press any key to continue ---

That behavior should be perfectly natural to any C# developer. Here's where things get a little tricky:

var filter = "Compare";

var query = from m in typeof(String).GetMethods()
            where m.Name.Contains(filter)
            select new { m.Name, ParameterCount = m.GetParameters().Length };

filter = "IndexOf";

foreach (var item in query)
  Console.WriteLine(item);

Can you guess what that code will output to the console?

Your answer to that question depends on your understanding of what closures are and how they work. A closure is produced when a variable whose scope extends beyond the current lexical block is bound to that block. That's a bit of a mouthful, isn't it? Allow me to clarify what I mean with a simple example.

delegate void Action();
 
static void Main()
{
  int x = 0;
 
  Action a = delegate { Console.WriteLine(x); };
 
  x = 1;
 
  a();
}

In the code above, an anonymous delegate ("a") references a variable ("x") that is declared outside of the anonymous delegate's body. This implies a lexical closure, and the variable "x" is bound to the method body of "a." The important point is that "a" is bound to the variable "x" and not its value. In other words, the value that "a" writes to the console depends upon the value of "x" at the time of its execution. Because 1 is assigned to "x" immediately before "a" is executed, 1 is output to the console.

Precisely the same thing happens in our query expression. A closure is produced for the lambda expression of the "where" clause because it references the "filter" variable, which is declared outside of the query expression. The closure binds to the variable "filter"—not its value. So, changing the value of "filter" after the query expression is defined will change the results returned by the query. In fact, if you run that code, you'll get this:

{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOfAny, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOfAny, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOfAny, ParameterCount = 1 }
{ Name = IndexOfAny, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 4 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOfAny, ParameterCount = 1 }
{ Name = LastIndexOfAny, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 4 }

--- press any key to continue ---

Let's try to exploit this closure in a more practical way.

var filter = String.Empty;

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            select m.Name;

foreach (var item in query)
{
  Console.WriteLine(item);
  filter = item;
}

This slightly different query expression returns the names of all of the public methods on System.String that don't match the value of the variable "filter." By modifying "filter" in each iteration of the foreach loop, we are effectively filtering out all duplicate method names. This works as advertised, but there's one potential bug: it is assumed that all overloads of a method are grouped together. If there are overloads of, say, String.CompareTo that aren't adjacent in the source array, the filtering won't work properly. What we really need to do is sort the array using the "orderby" query operator.

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            orderby m.Name
            select m.Name;

WHOOPS! That doesn't work. When we execute that query, all of the method names are output to the console, including duplicates. Our modifications to the "filter" variable in the foreach loop are completely ignored. Why is that?

The reason is that "orderby" forces the entire query to be evaluated when the first element is requested. This behavior is unavoidable and breaks the normal delayed evaluation of a query expression. However, we can still make the closure work properly by ensuring that the sort happens before filtering.

var query = from m in typeof(String).GetMethods()
            orderby m.Name
            where m.Name != filter
            select m.Name;

Now we get the output that we want:

Clone
Compare
CompareOrdinal
CompareTo
Concat
Contains
Copy
CopyTo
EndsWith
Equals
Format
get_Chars
get_Length
GetEnumerator
GetHashCode
GetType
GetTypeCode
IndexOf
IndexOfAny
Insert
Intern
IsInterned
IsNormalized
IsNullOrEmpty
Join
LastIndexOf
LastIndexOfAny
Normalize
op_Equality
op_Inequality
PadLeft
PadRight
Remove
Replace
Split
StartsWith
Substring
ToCharArray
ToLower
ToLowerInvariant
ToString
ToUpper
ToUpperInvariant
Trim
TrimEnd
TrimStart

--- press any key to continue ---

The moral here is to be careful. Exploiting closures in query expressions can be powerful but tricky to get right.

posted on Tuesday, September 25, 2007 6:53:23 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]

kick it on DotNetKicks.com