Friday, September 28, 2007

Recently, I presented an example of how closures can cause headaches when used in the context of LINQ expressions:

static class Program
{
  static void Main()
  {
    var filter = String.Empty;
 
    var query = from m in typeof(String).GetMethods()
                orderby m.Name
                where m.Name != filter
                select m.Name;
 
    foreach (var item in query)
    {
      Console.WriteLine(item);
      filter = item;
    }
  }
}

I want to state clearly that the example above is academic and not representative of how anybody should be writing LINQ code. This was implied by the intentionally alarmist title of my post ("LINQ Closures May Be Hazardous to Your Health!"), but some readers missed the point. Let's take a closer look at what, exactly, is wrong with this LINQ code and how it should be properly written.

First of all, the example code exhibits several nasty smells:

  1. It isn't portable. There's no guarantee that other LINQ providers will actually support closures.
  2. It isn't maintainable. The code is an obvious maintenance headache—especially if it will be handled by more than one person.
  3. It isn't declarative. LINQ syntax is designed to be declarative, but this code mixes in imperative logic.
  4. It isn't flexible. The closure voodoo inhibits potential optimizations that might occur with future technologies like Parallel LINQ.

In addition to the negative consequences listed above, the closure exploited in the sample is completely unnecessary! Closures can sometimes be powerful, but this isn't really the place to exploit them. In fact, writing code like that betrays a lack of knowledge of LINQ's standard query operators. LINQ already provides an easy way to ensure that duplicate values are removed from a query expression: the Distinct operator.

Distinct is one of several query operators designed to perform set operations on queries (the other operators are Except, Intersect and Union). Using Distinct in place of the closure solves all of the aforementioned problems and, as a bonus, makes the code more concise.

static void Main()
{
  var query = (from m in typeof(String).GetMethods()
              orderby m.Name
              select m.Name).Distinct();
 
  foreach (var item in query)
    Console.WriteLine(item);
}

Not surprisingly, Visual Basic takes this a step further by adding a new "Distinct" keyword.

Sub Main()
  Dim query = From m In GetType(String).GetMethods() _
              Order By m.Name _
              Select m.Name _
              Distinct
 
  For Each item In query
    Console.WriteLine(item)
  Next
End Sub

However, this new keyword only gives me a mild case of VB Envy. I really like the fact that Distinct is syntax highlighted, but I much prefer how the parentheses better delineate the query expression in the C# version.
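Returning to the set operators mentioned earlier, here is a minimal sketch of all four on small in-memory arrays. The sample data and the class name are made up for illustration, and the order shown in the comments is what LINQ to Objects yields in practice, not something every LINQ provider promises.

```csharp
using System;
using System.Linq;

static class SetOperatorsSketch
{
  static void Main()
  {
    // Hypothetical sample data, chosen only for illustration.
    var first = new[] { "Compare", "Concat", "Compare", "Copy" };
    var second = new[] { "Copy", "Clone" };

    // Distinct: duplicates removed (Compare, Concat, Copy)
    Console.WriteLine(String.Join(", ", first.Distinct().ToArray()));

    // Except: elements of first that are not in second (Compare, Concat)
    Console.WriteLine(String.Join(", ", first.Except(second).ToArray()));

    // Intersect: elements present in both (Copy)
    Console.WriteLine(String.Join(", ", first.Intersect(second).ToArray()));

    // Union: all elements of both, without duplicates (Compare, Concat, Copy, Clone)
    Console.WriteLine(String.Join(", ", first.Union(second).ToArray()));
  }
}
```

Note that Except, Intersect and Union all treat their inputs as sets, so the duplicate "Compare" never appears twice in any of the results.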

I hope this clears up any confusion that my other post might have caused. LINQ syntax is designed to be simple and declarative. Don't let your code get too fancy, and you'll reap the benefits of LINQ.

posted on Friday, September 28, 2007 9:47:20 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]

 Tuesday, September 25, 2007
UPDATE (Sep. 28, 2007): This article is really academic in nature on the topic of closures and how they fit into LINQ query expressions. It contains a highly contrived example that is not representative of quality LINQ code. For more information, take a look at this post.

To me, one of the most interesting aspects of LINQ query expressions is that they produce lexical closures (for a detailed look at closures in C#, see my article on the topic). To illustrate this point, consider the following code:

static void Main()
{
  var filter = "Compare";
 
  var query = from m in typeof(String).GetMethods()
              where m.Name.Contains(filter)
              select new { m.Name, ParameterCount = m.GetParameters().Length };
 
  foreach (var item in query)
    Console.WriteLine(item);
 
  Console.WriteLine();
  Console.WriteLine("--- press any key to continue ---");
  Console.ReadKey();
}

This query retrieves all of the public methods on System.String whose names contain the text represented by the "filter" variable (in this case "Compare"). When compiled and run, the output is what you might guess:

{ Name = Compare, ParameterCount = 2 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 4 }
{ Name = Compare, ParameterCount = 5 }
{ Name = Compare, ParameterCount = 6 }
{ Name = Compare, ParameterCount = 7 }
{ Name = Compare, ParameterCount = 6 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareOrdinal, ParameterCount = 2 }
{ Name = CompareOrdinal, ParameterCount = 5 }

--- press any key to continue ---

That behavior should be perfectly natural to any C# developer. Here's where things get a little tricky:

var filter = "Compare";

var query = from m in typeof(String).GetMethods()
            where m.Name.Contains(filter)
            select new { m.Name, ParameterCount = m.GetParameters().Length };

filter = "IndexOf";

foreach (var item in query)
  Console.WriteLine(item);

Can you guess what that code will output to the console?

Your answer to that question depends on your understanding of what closures are and how they work. A closure is produced when a variable whose scope extends beyond the current lexical block is bound to that block. That's a bit of a mouthful, isn't it? Allow me to clarify what I mean with a simple example.

delegate void Action();
 
static void Main()
{
  int x = 0;
 
  Action a = delegate { Console.WriteLine(x); };
 
  x = 1;
 
  a();
}

In the code above, an anonymous delegate ("a") references a variable ("x") that is declared outside of the anonymous delegate's body. This implies a lexical closure, and the variable "x" is bound to the method body of "a." The important point is that "a" is bound to the variable "x" and not its value. In other words, the value that "a" writes to the console depends upon the value of "x" at the time of its execution. Because 1 is assigned to "x" immediately before "a" is executed, 1 is output to the console.

Precisely the same thing happens in our query expression. A closure is produced for the lambda expression of the "where" clause because it references the "filter" variable, which is declared outside of the query expression. The closure binds to the variable "filter"—not its value. So, changing the value of "filter" after the query expression is defined will change the results returned by the query. In fact, if you run that code, you'll get this:

{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOfAny, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOfAny, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOfAny, ParameterCount = 1 }
{ Name = IndexOfAny, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 4 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOfAny, ParameterCount = 1 }
{ Name = LastIndexOfAny, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 4 }

--- press any key to continue ---
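If the goal is to pin down the results as they were when the query was defined, one option is to force evaluation before the variable changes. The sketch below assumes a call to ToList() is acceptable for that purpose; the class and variable names are only for illustration.

```csharp
using System;
using System.Linq;

static class CaptureSketch
{
  static void Main()
  {
    var filter = "Compare";

    // Deferred: the lambda reads "filter" each time the query is enumerated.
    var deferred = from m in typeof(String).GetMethods()
                   where m.Name.Contains(filter)
                   select m.Name;

    // Evaluated now: ToList() runs the query while filter is still "Compare".
    var snapshot = deferred.ToList();

    filter = "IndexOf";

    // Enumerated after the change, so this name contains "IndexOf".
    Console.WriteLine(deferred.First());

    // Captured before the change, so this name contains "Compare".
    Console.WriteLine(snapshot.First());
  }
}
```

The closure itself is unchanged here. The only difference is when the lambda runs relative to the assignment.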

Let's try to exploit this closure in a more practical way.

var filter = String.Empty;

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            select m.Name;

foreach (var item in query)
{
  Console.WriteLine(item);
  filter = item;
}

This slightly different query expression returns the names of all of the public methods on System.String that don't match the value of the variable "filter." By modifying "filter" in each iteration of the foreach loop, we are effectively filtering out all duplicate method names. This works as advertised, but there's one potential bug: it is assumed that all overloads of a method are grouped together. If there are overloads of, say, String.CompareTo that aren't adjacent in the source array, the filtering won't work properly. What we really need to do is sort the array using the "orderby" query operator.

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            orderby m.Name
            select m.Name;

WHOOPS! That doesn't work. When we execute that query, all of the method names are output to the console, including duplicates. Our modifications to the "filter" variable in the foreach loop are completely ignored. Why is that?

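The reason is that "orderby" has to consume its entire input before it can yield the first sorted element. Execution is still deferred until the foreach loop starts, but as soon as the first element is requested, the "where" clause ahead of the sort runs against every method while "filter" is still String.Empty. By the time the loop body assigns to "filter," the filtering is already finished. This buffering is unavoidable for a sort, and it breaks the element-by-element delayed evaluation we were relying on. However, we can still make the closure work properly by ensuring that the sort happens before filtering.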

var query = from m in typeof(String).GetMethods()
            orderby m.Name
            where m.Name != filter
            select m.Name;

Now we get the output that we want:

Clone
Compare
CompareOrdinal
CompareTo
Concat
Contains
Copy
CopyTo
EndsWith
Equals
Format
get_Chars
get_Length
GetEnumerator
GetHashCode
GetType
GetTypeCode
IndexOf
IndexOfAny
Insert
Intern
IsInterned
IsNormalized
IsNullOrEmpty
Join
LastIndexOf
LastIndexOfAny
Normalize
op_Equality
op_Inequality
PadLeft
PadRight
Remove
Replace
Split
StartsWith
Substring
ToCharArray
ToLower
ToLowerInvariant
ToString
ToUpper
ToUpperInvariant
Trim
TrimEnd
TrimStart

--- press any key to continue ---
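The buffering behavior of "orderby" is easy to observe in isolation. Here is a small sketch that uses a hypothetical Source() iterator (not part of the original sample) to log when each element is pulled from the input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderByBufferingSketch
{
  // Hypothetical helper: yields each name and logs the moment it is pulled.
  static IEnumerable<string> Source()
  {
    foreach (var name in new[] { "b", "a", "c" })
    {
      Console.WriteLine("pulled " + name);
      yield return name;
    }
  }

  static void Main()
  {
    var sorted = from s in Source()
                 orderby s
                 select s;

    // Nothing has been pulled yet; the query runs when the loop starts.
    foreach (var s in sorted)
      Console.WriteLine("got " + s);
  }
}
```

All three "pulled" lines are printed before the first "got" line, which shows that everything ahead of the sort has already run by the time the loop body executes for the first time.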

The moral here is to be careful. Exploiting closures in query expressions can be powerful but tricky to get right.

posted on Tuesday, September 25, 2007 6:53:23 AM (Pacific Standard Time, UTC-08:00)  #    Comments [1]

 Monday, September 24, 2007
Recently, I've been using Lutz Roeder's indispensable .NET Reflector to explore how C# 3.0 LINQ query expressions are compiled. To make this easy, the .NET Reflector supports a useful "optimization" setting that specifies which version of the .NET Framework the disassembler should draw features from for code generation. Changing the setting is pretty easy. Just select "Options..." from the "View" menu to display the Options dialog. Then, modify the "Optimization" value on the "Disassembler" page.

[Image: Reflector options for optimizing disassembled code to a specific .NET Framework]

With the disassembly optimization set to .NET Framework 3.5, here's how a simple query expression looks:

[Image: LINQ code disassembled with Reflector and optimized for .NET 3.5]

That's pretty cool, but it doesn't really give any insight into the compiler magic happening under the hood. To get a better picture of this, the optimization setting should be changed to ".NET 2.0." Once this is done, the disassembler no longer generates query syntax, and it uses anonymous methods. This makes it plain to see which extension methods are compiled for the different clauses of a query expression. In addition, the method calls are hyperlinked, making it easy to dig deeper.

[Image: LINQ code disassembled with Reflector and optimized for .NET 2.0]

While this is all very helpful, I do have a few complaints:

  1. I should be able to change the disassembler options on the fly. It'd be great if the Disassembler window sported a toolbar for modifying its options. The current user experience requires me to open the options dialog, make the change, click OK and wait while the .NET Reflector unloads and reloads all of the assemblies that are open. In fact, if I open the options dialog, make no changes and click OK, Reflector will still unload and reload everything. At the risk of inviting comment abuse from Reflector devotees [1], I have to say that this strikes me as a pretty lame UI cop-out.
  2. The .NET 2.0 optimization isn't accurate because it still emits extension method call syntax, which is a C# 3.0 feature. I'm a bit torn by this because this inaccuracy actually makes it easier to understand the code. If this is changed/fixed, there should be an additional option that hides query syntax and shows the underlying method calls with lambda expressions instead of anonymous methods. That way, Reflector could display this LINQ expression:
var query = from m in typeof(String).GetMethods()
            orderby m.Name
            select m.Name;

Like this:

var query = typeof(String).GetMethods().OrderBy(m => m.Name).Select(m => m.Name);
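For comparison, a strictly C# 2.0-style rendering would use neither lambdas nor extension method syntax. The following is a hand-written sketch of what that might look like, with the query operators called as ordinary static methods and anonymous methods in place of lambdas. It is not actual Reflector output, and the class name is made up.

```csharp
using System;
using System.Linq;
using System.Reflection;

static class ExpandedQuerySketch
{
  static void Main()
  {
    // Hand-written equivalent of the query above; not actual Reflector output.
    // OrderBy and Select are invoked as plain static methods on Enumerable.
    var query = Enumerable.Select(
      Enumerable.OrderBy(
        typeof(String).GetMethods(),
        delegate(MethodInfo m) { return m.Name; }),
      delegate(MethodInfo m) { return m.Name; });

    foreach (var name in query)
      Console.WriteLine(name);
  }
}
```

The nesting reads inside-out, which is a good illustration of why the extension method syntax is easier on the eyes.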

Regardless of these issues, which I hope are addressed (are you reading this, Roeder?!?), the .NET Reflector is a life-changing tool. If it isn't already a part of your developer's toolbox, you should go download it right now.

[1] I'm one of them.

posted on Monday, September 24, 2007 9:46:09 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]

 Friday, August 17, 2007
Several weeks ago, I posted this bit of code that shows how we might use a C# 3.0 query expression to calculate the sum of the squares of the even numbers in an array of integers.
static void Main()
{
  var numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

  var sum = (from n in numbers
             where (n % 2) == 0
             select n * n).Sum();

  Console.WriteLine("Sum: {0}", sum);
}

Translating this sample into Visual Basic 9.0 produces almost identical code.

Sub Main()
  Dim numbers() = New Integer() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

  Dim total = (From n In numbers _
               Where (n Mod 2) = 0 _
               Select n * n).Sum()

  Console.WriteLine("Sum: {0}", total)
End Sub

However, this translation is a bit naive because Visual Basic 9.0 actually provides syntax for more of the standard query operators than C# 3.0 does. While we have to call the "Sum" query operator explicitly in C#, Visual Basic allows us to use it directly in the query.

Sub Main()
  Dim numbers() = New Integer() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

  Dim total = Aggregate n In numbers _
              Where (n Mod 2) = 0 _
              Select n * n _
              Into Sum()

  Console.WriteLine("Sum: {0}", total)
End Sub

In fact, Visual Basic even allows us to create our own aggregate functions and use them directly in query expressions.

<Extension()> _
Function Product(ByVal source As IEnumerable(Of Integer)) As Integer
  Dim result = 1
  For Each n In source
    result *= n
  Next
  Return result
End Function

Sub Main()
  Dim numbers() = New Integer() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

  Dim total = Aggregate n In numbers _
              Where (n Mod 2) = 0 _
              Select n _
              Into Product()

  Console.WriteLine("Product: {0}", total)
End Sub

Here we get the product of the even numbers in the array. (I removed the expression to square each even number because it produced an OverflowException.)

I should point out that the Visual Basic "Aggregate" keyword introduces a behavioral difference. A standard "From" query expression uses delayed evaluation. That is, the results aren't actually computed until they are accessed through, say, a "For Each" loop. However, an "Aggregate" query expression forces the results to be evaluated immediately. In contrast, C# 3.0 query expressions always produce results that are evaluated lazily. [1]

[1] A bold statement that will be completely recanted if any reader can find an example that proves otherwise. [2]
[2] Please, prove me wrong. Seriously. I'm interested in this stuff. [3]
[3] This footnote motif is clearly ripped off from Raymond Chen.
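To see the same distinction on the C# side, here is a small sketch with made-up sample values. The query expression itself is deferred, while the explicit call to Sum() runs immediately, much like the Visual Basic "Aggregate" form.

```csharp
using System;
using System.Linq;

static class DeferredSumSketch
{
  static void Main()
  {
    var numbers = new[] { 1, 2, 3, 4 };

    // Deferred: nothing is computed when the query is defined.
    var squares = from n in numbers
                  where (n % 2) == 0
                  select n * n;

    // Immediate: Sum() enumerates the query right away (4 + 16 = 20).
    var sum = squares.Sum();

    // Change the source after both the query and the sum were defined.
    numbers[0] = 6;

    Console.WriteLine(sum);           // 20, computed before the change
    Console.WriteLine(squares.Sum()); // 56, i.e. 36 + 4 + 16, computed after it
  }
}
```

The first value is fixed at the moment Sum() was called. The second reflects the modified array because the query is re-evaluated each time it is enumerated.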

posted on Friday, August 17, 2007 7:46:31 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]

 Wednesday, June 27, 2007
This time, I briefly look at how to use methods available to C# 3.0 that are equivalent to Filter, Map and Reduce.
posted on Wednesday, June 27, 2007 11:54:58 AM (Pacific Standard Time, UTC-08:00)  #    Comments [6]
