Recently, I presented an example of how closures
can cause headaches when used in the context of LINQ expressions:
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        var filter = String.Empty;
        var query = from m in typeof(String).GetMethods()
                    orderby m.Name
                    where m.Name != filter
                    select m.Name;

        foreach (var item in query)
        {
            Console.WriteLine(item);
            filter = item; // changes what the where clause compares against
        }
    }
}
I want to state clearly that the example above is academic and
not representative of how anybody should write LINQ code. This was
implied by the intentionally alarmist title of my post ("LINQ Closures May Be Hazardous to
Your Health!"), but some readers missed the point. Let's take a closer look at
what, exactly, is wrong with this LINQ code and how it should be
written.
First of all, the example code exhibits several nasty smells:
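- It isn't portable. There's no guarantee that other LINQ providers will
support closures the same way. A provider that translates the query into SQL,
for example, typically reads the captured variable once, when the query is
translated, rather than each time an element is produced.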
- It isn't maintainable. The code is an obvious maintenance
headache, especially if more than one person will work on it.
- It isn't declarative. LINQ syntax is designed to be declarative, but
this code mixes in imperative logic.
- It isn't flexible. The closure voodoo inhibits potential optimizations
that might occur with future technologies like Parallel LINQ.
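The maintainability problem is easy to demonstrate. The sketch below runs the same closure trick over a small, hypothetical array of names (standing in for GetMethods(), whose results vary by framework version), once with orderby before where as in the original and once with the two clauses swapped. In LINQ to Objects, OrderBy reads its entire source before yielding the first element, so when where comes first the predicate runs while filter is still empty and no duplicates are removed. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ClauseOrderDemo
{
    // Hypothetical sample data; the duplicates play the role of overloaded method names.
    static readonly string[] Names = { "b", "a", "b", "c", "a" };

    // Same shape as the original: orderby, then a where clause that reads 'filter'.
    // Where is evaluated lazily, so each comparison sees the value set by the loop body.
    public static List<string> OrderByThenWhere()
    {
        var filter = string.Empty;
        var query = from n in Names
                    orderby n
                    where n != filter
                    select n;

        var results = new List<string>();
        foreach (var item in query)
        {
            results.Add(item);
            filter = item; // mutates the captured variable mid-enumeration
        }
        return results;
    }

    // The clauses swapped. OrderBy pulls every element through the where clause
    // on the first MoveNext, while 'filter' is still empty, so the later
    // updates to 'filter' have no effect on the result.
    public static List<string> WhereThenOrderBy()
    {
        var filter = string.Empty;
        var query = from n in Names
                    where n != filter
                    orderby n
                    select n;

        var results = new List<string>();
        foreach (var item in query)
        {
            results.Add(item);
            filter = item;
        }
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderByThenWhere())); // a, b, c
        Console.WriteLine(string.Join(", ", WhereThenOrderBy())); // a, a, b, b, c
    }
}
```

Nothing in the syntax warns that swapping two clauses changes the result, which is exactly the kind of trap a second maintainer could fall into.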
In addition to the negative consequences listed above, the closure exploited
in the sample is completely unnecessary! Closures can be powerful, but this
isn't the place to exploit them. In fact, writing code like that betrays a lack
of familiarity with LINQ's standard query operators.
LINQ already provides an easy way to remove duplicate values from a query
expression: the Distinct operator. Distinct is one of several standard query
operators that perform set operations on sequences (the others are Except,
Intersect and Union). Using Distinct in place of the closure solves all of the
aforementioned problems and, as a bonus, makes the code more concise.
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        var query = (from m in typeof(String).GetMethods()
                     orderby m.Name
                     select m.Name).Distinct();

        foreach (var item in query)
            Console.WriteLine(item);
    }
}
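Since the other set operators came up above, here is a quick sketch of all four applied to two small integer arrays (the arrays and the class name are made up for illustration). The output comments reflect LINQ to Objects, where each operator returns elements without duplicates and, in practice, in the order they are first encountered.

```csharp
using System;
using System.Linq;

static class SetOperatorsDemo
{
    // Illustrative inputs; First contains a duplicate on purpose.
    public static readonly int[] First = { 1, 2, 2, 3 };
    public static readonly int[] Second = { 2, 3, 4 };

    static void Main()
    {
        // Distinct: removes duplicates from a single sequence.
        Console.WriteLine(string.Join(", ", First.Distinct()));        // 1, 2, 3
        // Union: distinct elements that appear in either sequence.
        Console.WriteLine(string.Join(", ", First.Union(Second)));     // 1, 2, 3, 4
        // Intersect: distinct elements that appear in both sequences.
        Console.WriteLine(string.Join(", ", First.Intersect(Second))); // 2, 3
        // Except: distinct elements of First that do not appear in Second.
        Console.WriteLine(string.Join(", ", First.Except(Second)));    // 1
    }
}
```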
Not surprisingly, Visual Basic goes a step further by adding a Distinct
keyword to its query syntax.
Sub Main()
    Dim query = From m In GetType(String).GetMethods() _
                Order By m.Name _
                Select m.Name _
                Distinct

    For Each item In query
        Console.WriteLine(item)
    Next
End Sub
However, this new keyword only gives me a mild case of VB Envy. I like that
Distinct is syntax-highlighted, but I prefer the way the parentheses in the C#
version delineate the query expression.
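The parentheses are needed because C# has no query keyword for Distinct, so the query expression must be wrapped before the Distinct method can be called on it. Written entirely in method syntax, the same pipeline is simply a chain of calls. One caveat: the documentation describes Distinct as returning an unordered sequence. LINQ to Objects happens to keep the first occurrence of each element in its original order, so the C# query above prints sorted names, but other providers need not preserve that order. Calling Distinct first and OrderBy last makes sorting the final step. This is a sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MethodSyntaxDemo
{
    // The post's query over a sequence of names, in method syntax:
    // sort first, then remove duplicates.
    public static IEnumerable<string> SortThenDistinct(IEnumerable<string> names)
    {
        return names.OrderBy(n => n).Distinct();
    }

    // Remove duplicates first, then sort, so the ordering is applied last
    // and does not depend on how Distinct treats element order.
    public static IEnumerable<string> DistinctThenSort(IEnumerable<string> names)
    {
        return names.Distinct().OrderBy(n => n);
    }

    static void Main()
    {
        var names = typeof(string).GetMethods().Select(m => m.Name);
        foreach (var name in DistinctThenSort(names))
            Console.WriteLine(name);
    }
}
```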
I hope this clears up any confusion that my other post might have caused.
LINQ syntax is designed to be simple and declarative. Don't let your code get
too fancy, and you'll reap the benefits of LINQ.