Friday, September 28, 2007

Recently, I presented an example of how closures can cause headaches when used in the context of LINQ expressions:

using System;
using System.Linq;

static class Program
{
  static void Main()
  {
    var filter = String.Empty;
 
    var query = from m in typeof(String).GetMethods()
                orderby m.Name
                where m.Name != filter
                select m.Name;
 
    foreach (var item in query)
    {
      Console.WriteLine(item);
      filter = item;
    }
  }
}

I want to state clearly that the example above is academic and not representative of how anybody should be writing LINQ code. This was implied by the intentionally alarmist title of my post ("LINQ Closures May Be Hazardous to Your Health!"), but some readers missed the point. Let's take a closer look at what, exactly, is wrong with this LINQ code and how it should be written.

First of all, the example code exhibits several nasty smells:

  1. It isn't portable. There's no guarantee that other LINQ providers will honor changes to a captured variable made while a query is being enumerated.
  2. It isn't maintainable. The code is an obvious maintenance headache—especially if it will be handled by more than one person.
  3. It isn't declarative. LINQ syntax is designed to be declarative, but this code mixes in imperative logic.
  4. It isn't flexible. The closure voodoo inhibits potential optimizations that might occur with future technologies like Parallel LINQ.

In addition to the negative consequences listed above, the closure exploited in the sample is completely unnecessary! Closures can sometimes be powerful, but this isn't really the place to exploit them. In fact, writing code like that betrays a lack of knowledge of LINQ's standard query operators. LINQ already provides an easy way to ensure that duplicate values are removed from a query expression: the Distinct operator.

Distinct is one of several query operators designed to perform set operations on queries (the others are Except, Intersect and Union). Using Distinct in place of the closure solves all of the aforementioned problems and, as a bonus, makes the code more concise.

static void Main()
{
  var query = (from m in typeof(String).GetMethods()
              orderby m.Name
              select m.Name).Distinct();
 
  foreach (var item in query)
    Console.WriteLine(item);
}
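One caveat worth noting: the .NET documentation for Enumerable.Distinct describes its result as an unordered sequence, so the version above relies on LINQ to Objects happening to yield items in the order it first sees them. A minimal sketch that sidesteps the question by removing duplicates first and sorting afterward (the DistinctMethodNames helper is a hypothetical name used only for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
  // Hypothetical helper: the distinct public method names of System.String.
  // Duplicates are removed first and the result is sorted last, so the final
  // order never depends on how Distinct happens to yield its results.
  public static IEnumerable<string> DistinctMethodNames()
  {
    return (from m in typeof(String).GetMethods()
            select m.Name).Distinct().OrderBy(name => name);
  }

  static void Main()
  {
    foreach (var item in DistinctMethodNames())
      Console.WriteLine(item);
  }
}
```

The trade-off is small: sorting after Distinct means fewer elements to sort, and the ordering guarantee comes from OrderBy rather than from an implementation detail.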

Not surprisingly, Visual Basic takes this a step further by adding a new "Distinct" keyword.

Sub Main()
  Dim query = From m In GetType(String).GetMethods() _
              Order By m.Name _
              Select m.Name _
              Distinct
 
  For Each item In query
    Console.WriteLine(item)
  Next
End Sub

However, this new keyword only gives me a mild case of VB envy. I like that Distinct is syntax-highlighted, but I prefer the way the parentheses delineate the query expression in the C# version.

I hope this clears up any confusion that my other post might have caused. LINQ syntax is designed to be simple and declarative. Don't let your code get too fancy, and you'll reap the benefits of LINQ.

posted on Friday, September 28, 2007 9:47:20 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, September 25, 2007
Jomo Fisher is a C# team member who works on LINQ to SQL. Recently, Jomo has been exploring F# by compiling bits of code and seeing what the F# compiler generates. So far, I've found this "Adventures in F#" series to be quite enjoyable, and I recommend it to any interested readers. Here is a list of the articles posted to date:
I should point out that this series is not an introduction to functional programming or F#. Instead, it is more technical in nature and asks the question, "How does the compiler do that?"
posted on Tuesday, September 25, 2007 7:54:10 AM (Pacific Standard Time, UTC-08:00)

UPDATE (Sep. 28, 2007): This article is purely academic; it explores closures and how they fit into LINQ query expressions. It contains a highly contrived example that is not representative of quality LINQ code. For more information, take a look at this post.

To me, one of the most interesting aspects of LINQ query expressions is that they produce lexical closures (for a detailed look at closures in C#, see my article on the topic). To illustrate this point, consider the following code:

static void Main()
{
  var filter = "Compare";
 
  var query = from m in typeof(String).GetMethods()
              where m.Name.Contains(filter)
              select new { m.Name, ParameterCount = m.GetParameters().Length };
 
  foreach (var item in query)
    Console.WriteLine(item);
 
  Console.WriteLine();
  Console.WriteLine("--- press any key to continue ---");
  Console.ReadKey();
}

This query retrieves all of the public methods on System.String whose names contain the text represented by the "filter" variable (in this case "Compare"). When compiled and run, the output is what you might guess:

{ Name = Compare, ParameterCount = 2 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 3 }
{ Name = Compare, ParameterCount = 4 }
{ Name = Compare, ParameterCount = 5 }
{ Name = Compare, ParameterCount = 6 }
{ Name = Compare, ParameterCount = 7 }
{ Name = Compare, ParameterCount = 6 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareTo, ParameterCount = 1 }
{ Name = CompareOrdinal, ParameterCount = 2 }
{ Name = CompareOrdinal, ParameterCount = 5 }

--- press any key to continue ---

That behavior should be perfectly natural to any C# developer. Here's where things get a little tricky:

var filter = "Compare";

var query = from m in typeof(String).GetMethods()
            where m.Name.Contains(filter)
            select new { m.Name, ParameterCount = m.GetParameters().Length };

filter = "IndexOf";

foreach (var item in query)
  Console.WriteLine(item);

Can you guess what that code will output to the console?

Your answer to that question depends on your understanding of what closures are and how they work. A closure is produced when a variable whose scope extends beyond the current lexical block is bound to that block. That's a bit of a mouthful, isn't it? Allow me to clarify what I mean with a simple example.

delegate void Action();
 
static void Main()
{
  int x = 0;
 
  Action a = delegate { Console.WriteLine(x); };
 
  x = 1;
 
  a();
}

In the code above, an anonymous method ("a") references a variable ("x") that is declared outside of the anonymous method's body. This creates a lexical closure: the variable "x" is bound to the body of "a." The important point is that "a" is bound to the variable "x," not to its value. In other words, the value that "a" writes to the console depends on the value of "x" at the time "a" executes. Because 1 is assigned to "x" immediately before "a" is executed, 1 is output to the console.
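It can help to picture what the compiler does here: it hoists "x" into a field of a hidden class and turns the anonymous method into an instance method of that class, so every reference to "x" becomes a read or write of the same field. The following is a hand-written approximation under that assumption, not actual compiler output. The names Closure and Run are illustrative, and Body returns the value it prints so the result can be checked:

```csharp
using System;

static class Program
{
  // Hand-written stand-in for the class the compiler generates to hold
  // captured variables (the real one has a compiler-generated name).
  sealed class Closure
  {
    public int x;   // the captured variable lives here as a field

    public int Body()
    {
      Console.WriteLine(x);
      return x;
    }
  }

  public static int Run()
  {
    var closure = new Closure();
    closure.x = 0;                 // corresponds to: int x = 0;

    Func<int> a = closure.Body;    // corresponds to: Action a = delegate { Console.WriteLine(x); };

    closure.x = 1;                 // corresponds to: x = 1;

    return a();                    // reads the field at call time, so it sees 1
  }

  static void Main()
  {
    Run();                         // prints 1
  }
}
```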

Precisely the same thing happens in our query expression. A closure is produced for the lambda expression of the "where" clause because it references the "filter" variable, which is declared outside of the query expression. The closure binds to the variable "filter"—not its value. So, changing the value of "filter" after the query expression is defined will change the results returned by the query. In fact, if you run that code, you'll get this:

{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOfAny, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOfAny, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOfAny, ParameterCount = 1 }
{ Name = IndexOfAny, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 1 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 2 }
{ Name = IndexOf, ParameterCount = 3 }
{ Name = IndexOf, ParameterCount = 4 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOfAny, ParameterCount = 1 }
{ Name = LastIndexOfAny, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 1 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 2 }
{ Name = LastIndexOf, ParameterCount = 3 }
{ Name = LastIndexOf, ParameterCount = 4 }

--- press any key to continue ---
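If you want a query to use the value a variable had when the query was defined, one option is to build the query inside a method and pass the value in as a parameter. The lambda then captures the method's parameter, which the caller has no way to reassign. A minimal sketch (MethodsContaining is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
  // The where-clause lambda captures the parameter "filter", not the
  // caller's local variable, so later assignments in Main have no effect.
  public static IEnumerable<string> MethodsContaining(string filter)
  {
    return from m in typeof(String).GetMethods()
           where m.Name.Contains(filter)
           select m.Name;
  }

  static void Main()
  {
    var filter = "Compare";
    var query = MethodsContaining(filter);

    filter = "IndexOf";          // does not affect the query defined above

    foreach (var item in query)
      Console.WriteLine(item);   // still lists only the Compare* methods
  }
}
```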

Let's try to exploit this closure in a more practical way.

var filter = String.Empty;

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            select m.Name;

foreach (var item in query)
{
  Console.WriteLine(item);
  filter = item;
}

This slightly different query expression returns the names of all of the public methods on System.String that don't match the value of the variable "filter." By modifying "filter" in each iteration of the foreach loop, we are effectively filtering out all duplicate method names. This works as advertised, but there's one potential bug: it is assumed that all overloads of a method are grouped together. If there are overloads of, say, String.CompareTo that aren't adjacent in the source array, the filtering won't work properly. What we really need to do is sort the array using the "orderby" query operator.

var query = from m in typeof(String).GetMethods()
            where m.Name != filter
            orderby m.Name
            select m.Name;

WHOOPS! That doesn't work. When we execute that query, all of the method names are output to the console, including duplicates. Our modifications to the "filter" variable in the foreach loop are completely ignored. Why is that?

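The reason is that "orderby" must read its entire input before it can yield the first element, so the "where" clause that precedes it is evaluated against every method before the body of the foreach loop ever runs. Sorting can't be done lazily, so this buffering is unavoidable, and it breaks the normal deferred evaluation of a query expression. However, we can still make the closure work properly by ensuring that the sort happens before the filtering.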

var query = from m in typeof(String).GetMethods()
            orderby m.Name
            where m.Name != filter
            select m.Name;

Now we get the output that we want:

Clone
Compare
CompareOrdinal
CompareTo
Concat
Contains
Copy
CopyTo
EndsWith
Equals
Format
get_Chars
get_Length
GetEnumerator
GetHashCode
GetType
GetTypeCode
IndexOf
IndexOfAny
Insert
Intern
IsInterned
IsNormalized
IsNullOrEmpty
Join
LastIndexOf
LastIndexOfAny
Normalize
op_Equality
op_Inequality
PadLeft
PadRight
Remove
Replace
Split
StartsWith
Substring
ToCharArray
ToLower
ToLowerInvariant
ToString
ToUpper
ToUpperInvariant
Trim
TrimEnd
TrimStart

--- press any key to continue ---
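The buffering behavior behind the "orderby" surprise can be observed directly. The sketch below counts how many times the filter predicate has run by the time the first result is available, for both clause orders. The CallsBeforeFirstResult method and its four-element input are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Linq;

static class Program
{
  // Counts predicate calls made before the first element of the query is
  // produced. sortFirst = false mirrors "where ... orderby ...";
  // sortFirst = true mirrors "orderby ... where ...".
  public static int CallsBeforeFirstResult(bool sortFirst)
  {
    var source = new[] { "c", "a", "b", "a" };
    var calls = 0;

    Func<string, bool> predicate = s => { calls++; return true; };

    var query = sortFirst
      ? source.OrderBy(s => s).Where(predicate)
      : source.Where(predicate).OrderBy(s => s);

    using (var e = query.GetEnumerator())
    {
      e.MoveNext();   // request only the first element
    }
    return calls;
  }

  static void Main()
  {
    Console.WriteLine(CallsBeforeFirstResult(sortFirst: false)); // prints 4: every element filtered up front
    Console.WriteLine(CallsBeforeFirstResult(sortFirst: true));  // prints 1: filter runs lazily after the sort
  }
}
```

With the filter ahead of the sort, OrderBy pulls every element through the predicate before it can yield anything, which is why changes made to "filter" inside the loop arrive too late.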

The moral here is to be careful. Exploiting closures in query expressions can be powerful but tricky to get right.

posted on Tuesday, September 25, 2007 6:53:23 AM (Pacific Standard Time, UTC-08:00)

 Monday, September 24, 2007
Recently, I've been using Lutz Roeder's indispensable .NET Reflector to explore how C# 3.0 LINQ query expressions are compiled. To make this easy, the .NET Reflector supports a useful "optimization" setting that specifies which version of the .NET Framework the disassembler should target when generating code. Changing the setting is pretty easy. Just select "Options..." from the "View" menu to display the Options dialog. Then, modify the "Optimization" value on the "Disassembler" page.

Reflector options for optimizing disassembled code to a specific .NET Framework

With the disassembly optimization set to .NET Framework 3.5, here's how a simple query expression looks:

LINQ code disassembled with Reflector and optimized for .NET 3.5

That's pretty cool, but it doesn't really give any insight into the compiler magic happening under the hood. To get a better picture, change the optimization setting to ".NET 2.0." Once this is done, the disassembler no longer generates query syntax; instead, it shows the underlying method calls, using anonymous methods in place of lambda expressions. This makes it plain to see which extension methods the different clauses of a query expression compile to. In addition, the method calls are hyperlinked, making it easy to dig deeper.

LINQ code disassembled with Reflector and optimized for .NET 2.0

While this is all very helpful, I do have a few complaints:

  1. I should be able to change the disassembler options on the fly. It'd be great if the Disassembler window sported a toolbar for modifying its options. The current user experience requires me to open the Options dialog, make the change, click OK and wait while the .NET Reflector unloads and reloads all of the assemblies that are open. In fact, if I open the Options dialog, make no changes and click OK, Reflector will still unload and reload everything. At the risk of inviting comment abuse from Reflector devotees¹, I have to say that this strikes me as a pretty lame UI cop-out.
  2. The .NET 2.0 optimization isn't accurate because it still generates extension method call syntax, which C# 2.0 doesn't support. I'm a bit torn by this because the inaccuracy actually makes the code easier to understand. If this is ever fixed, there should be an additional option that hides query syntax and shows the underlying method calls with lambda expressions instead of anonymous methods. That way, Reflector could display this LINQ expression:
var query = from m in typeof(String).GetMethods()
            orderby m.Name
            select m.Name;

Like this:

var query = typeof(String).GetMethods().OrderBy(m => m.Name).Select(m => m.Name);

Regardless of these issues, which I hope are addressed (are you reading this, Roeder?!?), the .NET Reflector is a life-changing tool. If it isn't already part of your developer toolbox, you should go download it right now.

¹ I'm one of them.

posted on Monday, September 24, 2007 9:46:09 AM (Pacific Standard Time, UTC-08:00)

 Friday, September 21, 2007
I had the privilege of attending Bill Wagner's "C# 3.0: Think More, Type Less" talk last night at the Greater Lansing User Group .net. As usual, Bill explained each of the new features coming in C# 3.0 with the ease of a true Jedi Master. If you have an opportunity to see this talk, I highly recommend it.

posted on Friday, September 21, 2007 9:04:24 AM (Pacific Standard Time, UTC-08:00)

 Thursday, September 20, 2007
This is the sixth article in my series on functional programming ideas using C#. As promised, we're digging into partial application to explore more practical applications of currying.
posted on Thursday, September 20, 2007 7:43:07 AM (Pacific Standard Time, UTC-08:00)

 Wednesday, September 19, 2007
Recently, a reader inquired as to how I format the source code samples on my blog. After writing up the set of steps that I normally run through, I decided that I should post them here so that 1) interested readers might benefit, and 2) I won't forget my own process.

First of all, I should point out that there are several tools available to help format source code for the web, and Brig Lamoreaux has a good review of several here. My personal tool of choice is CopySourceAsHtml. It does a great job of getting code out of Visual Studio with accurate syntax highlighting. With a few tweaks to the HTML, it can produce exactly what I need.

Here is the exact process that I go through to create code samples that look equally good on the web and in RSS feeds.

Step 1: Write Your Code

Some important tips:

  • The code should be concise and well formatted. Most readers skim code, so it should be clear enough to get the general idea from a brief overview.
  • Make sure the code compiles! It can be quite embarrassing to be contacted by a reader who, after copying and pasting the code, found that it didn't compile.

Step 2: Copy As HTML

Select the code sample in Visual Studio, and decrease the indent (Shift+Tab) until the code is aligned at column 1. Next, select "Copy As HTML..." from the editor's context menu. At this point, you'll be presented with the "Copy As HTML" dialog.

The first time that the dialog is displayed, some options need to be set. Fortunately, the dialog remembers your settings so that you don't need to change them next time. On the "General" tab, uncheck everything except for "Embed styles."

Step 2a: Copy As HTML (General tab)

Next, switch to the "File Style" tab to add additional CSS styles to the <div> tag that will surround the HTML-formatted code sample. Here are the styles that I use for my blog:

border: 1px dotted rgb(221, 221, 221);
margin: 4px;
padding: 4px;
font-family: Consolas,'Courier New',Courier,monospace;
font-size: small;
color: rgb(0, 0, 0);
background-color: rgb(255, 255, 255);

Step 2b: Copy As HTML (File Style tab)

Finally, click the OK button to copy the code to the clipboard.

Step 3: Massage the HTML

Once pasted in the HTML source editor of your choice, the code sample will render like this:

Sub Main()
  Dim publicationdate = Date.Today
  Dim isbn = 42
  Dim price = 0.99D
  Dim firstName = "Dustin"
  Dim lastName = "Campbell"
 
  Dim book = <book publicationdate=<%= publicationdate %> ISBN=<%= isbn %>>
               <title>F#: For The Excessively Nerdy</title>
               <price><%= price %></price>
               <author>
                 <first-name><%= firstName %></first-name>
                 <last-name><%= lastName %></last-name>
               </author>
             </book>
End Sub

Unfortunately, CopySourceAsHtml wraps every line in the code sample with <pre style="margin: 0px;"></pre> tags. These tags override some of the CSS styles that we specified for the <div> tag. Thankfully, this is easily corrected with two replace operations:

  1. Replace all instances of <pre style="margin: 0px;"> with blank text.
  2. Replace all instances of </pre> with <br /> to preserve the line breaks.

The syntax coloring is achieved by using <span> tags. Occasionally, a space will appear between two uses of a <span> tag. For example, in the code sample above, "End Sub" is actually represented like this:

<span style="color: blue;">End</span> <span style="color: blue;">Sub</span>

Some RSS readers make the mistake of removing the space in between these <span> tags, causing the words to run together. To fix this potential problem, just replace all instances of "</span> <span" with "</span>&nbsp;<span".
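If you do this often, the three replacements are easy to script. Here is a minimal sketch, assuming the <pre> tags look exactly as described above; HtmlCleanup.Massage is a hypothetical helper, not part of CopySourceAsHtml:

```csharp
using System;

static class HtmlCleanup
{
  // Applies the three text replacements described above, in order.
  public static string Massage(string html)
  {
    return html
      // 1. Drop the per-line <pre> opening tags...
      .Replace("<pre style=\"margin: 0px;\">", "")
      // 2. ...and turn the closing tags into line breaks.
      .Replace("</pre>", "<br />")
      // 3. Protect the space between adjacent <span> tags from RSS readers.
      .Replace("</span> <span", "</span>&nbsp;<span");
  }

  static void Main()
  {
    var line = "<pre style=\"margin: 0px;\"><span style=\"color: blue;\">End</span> "
             + "<span style=\"color: blue;\">Sub</span></pre>";

    // prints <span style="color: blue;">End</span>&nbsp;<span style="color: blue;">Sub</span><br />
    Console.WriteLine(Massage(line));
  }
}
```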

When finished, the code sample should render like this:

Sub Main()
  Dim publicationdate = Date.Today
  Dim isbn = 42
  Dim price = 0.99D
  Dim firstName = "Dustin"
  Dim lastName = "Campbell"
 
  Dim book = <book publicationdate=<%= publicationdate %> ISBN=<%= isbn %>>
               <title>F#: For The Excessively Nerdy</title>
               <price><%= price %></price>
               <author>
                 <first-name><%= firstName %></first-name>
                 <last-name><%= lastName %></last-name>
               </author>
             </book>
End Sub

Really, it's not that much effort. Once the CopySourceAsHtml options are set to your liking, it is a simple matter to copy, paste and make a few modifications to get the desired result. Most of the work is in writing the code sample.

posted on Wednesday, September 19, 2007 1:39:54 PM (Pacific Standard Time, UTC-08:00)

There's been some interest recently in the new XML literal feature coming to Visual Basic 9. If you're not familiar with this feature, the idea is that you can embed XML directly into VB code like this:
Sub Main()
  Dim publicationdate = Date.Today
  Dim isbn = 42
  Dim price = 0.99D
  Dim firstName = "Dustin"
  Dim lastName = "Campbell"

  Dim book = <book publicationdate=<%= publicationdate %> ISBN=<%= isbn %>>
               <title>F#: For The Excessively Nerdy</title>
               <price><%= price %></price>
               <author>
                 <first-name><%= firstName %></first-name>
                 <last-name><%= lastName %></last-name>
               </author>
             </book>
End Sub

That compiles to something similar to this:

Public Sub Main()
  Dim publicationdate As Date = Date.Today
  Dim isbn As Integer = 42
  Dim price As Decimal = 0.99D
  Dim firstName As String = "Dustin"
  Dim lastName As String = "Campbell"
 
  Dim book = New XElement("book", _
                          New XAttribute("publicationdate", publicationdate), _
                          New XAttribute("ISBN", isbn), _
                          New XElement("title", "F#: For The Excessively Nerdy"), _
                          New XElement("price", price), _
                          New XElement("author", _
                                       New XElement("first-name", firstName), _
                                       New XElement("last-name", lastName)))
End Sub
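For comparison, a C# developer can build the same element by calling the System.Xml.Linq constructors directly, in the same "functional construction" style shown above. This is a hand-written sketch, not compiler output, and the BuildBook method is an illustrative name:

```csharp
using System;
using System.Xml.Linq;

static class Program
{
  // Builds the same <book> element as the VB XML literal, using the
  // XElement/XAttribute constructors (strings convert implicitly to XName).
  public static XElement BuildBook(DateTime publicationdate, int isbn,
                                   decimal price, string firstName, string lastName)
  {
    return new XElement("book",
             new XAttribute("publicationdate", publicationdate),
             new XAttribute("ISBN", isbn),
             new XElement("title", "F#: For The Excessively Nerdy"),
             new XElement("price", price),
             new XElement("author",
               new XElement("first-name", firstName),
               new XElement("last-name", lastName)));
  }

  static void Main()
  {
    var book = BuildBook(DateTime.Today, 42, 0.99m, "Dustin", "Campbell");
    Console.WriteLine(book);
  }
}
```

It works, but the nesting of constructor calls is exactly what the VB literal hides.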

To me, this is a great example of what syntactic sugar should be all about: making tasks easier for developers. The VB team has gone to great pains to expose the new APIs in System.Xml.Linq in the most intuitive way possible. As a C# guy, I'm shamefully filled with deep feelings of VB envy.

Since I spend most of my time working on a wholly remarkable refactoring tool, you might be wondering what sort of refactoring support we have in store for these snazzy new XML literals. How about Extract Method?

Here's the preview hint that is displayed for Extract Method when the XML literal is selected:

Extract Method on XML Literal

And here's the successfully refactored code after applying Extract Method:

Private Function GetBook(ByVal publicationdate As Date, ByVal isbn As Integer, _
                         ByVal price As Decimal, ByVal firstName As String, _
                         ByVal lastName As String) As XElement
 
  Return <book publicationdate=<%= publicationdate %> ISBN=<%= isbn %>>
           <title>F#: For The Excessively Nerdy</title>
           <price><%= price %></price>
           <author>
             <first-name><%= firstName %></first-name>
             <last-name><%= lastName %></last-name>
           </author>
         </book>
End Function
 
Sub Main()
  Dim publicationdate = Date.Today
  Dim isbn = 42
  Dim price = 0.99D
  Dim firstName = "Dustin"
  Dim lastName = "Campbell"
 
  Dim book = GetBook(publicationdate, isbn, price, firstName, lastName)
End Sub

Notice the pieces that Refactor! must have in place to get this right:

  • The refactoring must be smart enough to understand how the XML literal is transformed into XElements, XAttributes and XNames under the hood.
  • The refactoring must identify any dependent variables that are referenced within the embedded expressions of the XML literal.
  • The refactoring must infer the types of the dependent variables in order to declare the parameters of the new method.

We still have some work to do before Visual Studio 2008 reaches the RTM stage, but it looks like things are shaping up nicely.

posted on Wednesday, September 19, 2007 6:21:21 AM (Pacific Standard Time, UTC-08:00)

 Thursday, September 06, 2007
Last night, I managed to catch the Counting Crows just as their summer tour was wrapping up. It was a fantastic show, full of energy and nostalgia. They played a few bits from their new album (due out in November), and performed several songs from the past. I was pleased that their set list was made up of songs they really seemed to enjoy playing (e.g. "Perfect Blue Buildings") rather than just laboriously running through their hits. For me, it was definitely a musical walk down memory lane. For a couple of hours, I was taken back to the post-grunge era of the mid-nineties when guys like Dave Matthews ruled the college rock world.

posted on Thursday, September 06, 2007 3:29:55 AM (Pacific Standard Time, UTC-08:00)
