Friday, September 28, 2007

Recently, I presented an example of how closures can cause headaches when used in the context of LINQ expressions:

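using System;
using System.Linq;

static class Program
{
  static void Main()
  {
    // Captured by the where clause below; the query reads it lazily.
    var filter = String.Empty;
 
    var query = from m in typeof(String).GetMethods()
                orderby m.Name
                where m.Name != filter
                select m.Name;
 
    foreach (var item in query)
    {
      Console.WriteLine(item);
      // Mutating the captured variable changes what the query filters out next.
      filter = item;
    }
  }
}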

I want to state clearly that the example above is academic and not representative of how anybody should be writing LINQ code. This was implied by the intentionally alarmist title of my post ("LINQ Closures May Be Hazardous to Your Health!"), but some readers missed the point. Let's take a closer look at what, exactly, is wrong with this LINQ code and how it should be properly written.

First of all, the example code exhibits several nasty smells:

  1. It isn't portable. There's no guarantee that other LINQ providers will actually support closures.
  2. It isn't maintainable. The query's result depends on a variable that the consuming loop mutates, which is an obvious maintenance headache, especially if the code will be handled by more than one person.
  3. It isn't declarative. LINQ syntax is designed to be declarative, but this code mixes in imperative logic.
  4. It isn't flexible. The closure voodoo inhibits potential optimizations that might occur with future technologies like Parallel LINQ.

In addition to the negative consequences listed above, the closure exploited in the sample is completely unnecessary! Closures can sometimes be powerful, but this isn't really the place to exploit them. In fact, writing code like that betrays a lack of knowledge of LINQ's standard query operators. LINQ already provides an easy way to ensure that duplicate values are removed from a query expression: the Distinct operator.

Distinct is one of several query operators designed to perform set operations on queries (the other operators are Except, Intersect and Union). Using Distinct in place of the closure solves all of the aforementioned problems and, as a bonus, makes the code more concise.

static void Main()
{
  var query = (from m in typeof(String).GetMethods()
              orderby m.Name
              select m.Name).Distinct();
 
  foreach (var item in query)
    Console.WriteLine(item);
}
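
For comparison, the other set operators mentioned above follow the same pattern. This is only a small sketch over two made-up string arrays (the names SetOperatorsDemo, Left and Right are mine), and the outputs in the comments assume LINQ to Objects, which in practice yields elements in order of first appearance.

```csharp
using System;
using System.Linq;

static class SetOperatorsDemo
{
    // Illustrative data only; any two sequences of the same element type would do.
    public static readonly string[] Left = { "Concat", "Copy", "Copy", "Join" };
    public static readonly string[] Right = { "Copy", "Join", "Trim" };

    static void Main()
    {
        // Removes duplicates: Concat, Copy, Join
        Console.WriteLine(string.Join(", ", Left.Distinct()));

        // Elements of Left that are not in Right: Concat
        Console.WriteLine(string.Join(", ", Left.Except(Right)));

        // Elements present in both: Copy, Join
        Console.WriteLine(string.Join(", ", Left.Intersect(Right)));

        // Elements present in either, without duplicates: Concat, Copy, Join, Trim
        Console.WriteLine(string.Join(", ", Left.Union(Right)));
    }
}
```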

Not surprisingly, Visual Basic takes this a step further by adding a new "Distinct" keyword.

Sub Main()
  Dim query = From m In GetType(String).GetMethods() _
              Order By m.Name _
              Select m.Name _
              Distinct
 
  For Each item In query
    Console.WriteLine(item)
  Next
End Sub

However, this new keyword only gives me a mild case of VB Envy. I really like the fact that Distinct is syntax highlighted, but I much prefer how the parentheses better delineate the query expression in the C# version.
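
If the mix of query syntax and a trailing method call bothers you, the C# query can also be written entirely with method calls. The following is just one possible sketch; the class MethodSyntaxDemo and the helper DistinctMethodNames are hypothetical names of my own. It applies Distinct before sorting, which should yield the same names while sorting fewer items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MethodSyntaxDemo
{
    // Hypothetical helper: the distinct public method names of a type, sorted by name.
    public static IEnumerable<string> DistinctMethodNames(Type type)
    {
        return type.GetMethods()
                   .Select(m => m.Name)      // project to names first
                   .Distinct()               // drop overload duplicates
                   .OrderBy(name => name);   // then sort what remains
    }

    static void Main()
    {
        foreach (var name in DistinctMethodNames(typeof(String)))
            Console.WriteLine(name);
    }
}
```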

I hope this clears up any confusion that my other post might have caused. LINQ syntax is designed to be simple and declarative. Don't let your code get too fancy, and you'll reap the benefits of LINQ.

posted on Friday, September 28, 2007 9:47:20 AM (Pacific Standard Time, UTC-08:00)
