This recent blog post caused quite a stir on the F# mailing list. The post presents two solutions for Project Euler Problem 14: one in C# and the other in F#. The C# version clearly is hand-optimized for speed (and is indeed very fast), but the F# solution isn't. Instead, the F# code appears to be written with elegance and brevity in mind. Robert Pickering presented a challenge to create a faster F# solution, and the F# mailing list (which had been dormant for a couple of weeks) literally exploded with ideas.
Before we go too far afield, below are the instructions for Project Euler Problem 14.
The following iterative sequence is defined for the set of positive integers:n → n/2 (n is even)n → 3n + 1 (n is odd)Using the rule above and starting with 13, we generate the following sequence:13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.Which starting number, under one million, produces the longest chain?NOTE: Once the chain starts the terms are allowed to go above one million.
As you can see, the problem boils down to generating one million sequences and counting the number of terms in each to find the longest. It's not rocket science, but there are many ways to optimize the solution. However, I'm not interested in creating the most optimal F# code to compete with the C# version. That's already been done by many of the gurus. Instead, I'd like to know if we can write C# code that more closely matches the algorithm used by the F# solution. Let's see how C# matches up when playing F#'s game.
Below is the F# solution that's caused so much debate.
If you've been following my F# articles, the code above shouldn't contain anything new. Pattern matching, tuples, option types, list comprehensions, functions and the forward pipe operator should all be reasonably familiar. The only item that truly requires explanation is the Seq.unfold function.
In F#, seq<'a> is an alias for the .NET interface, IEnumerable<T>, which is implemented by every collection in the Base Class Library and is the centerpiece of LINQ to Objects. In fact, in the F# libraries, seq<'a> is defined very simply as...
See? Simple.
Seq.unfold is a library function that creates a seq<'a>, given a generator function. The resulting seq<'a> is lazily evaluated. That is, when a value in the seq<'a> is requested, the generator function is called to calculate it. In practice, it's similar to a C# iterator. The signature of Seq.unfold looks like so:
The generator function is usually the part that trips people up. When called, the generator is passed a bit of state. This state is initialized by the second argument passed to Seq.unfold. To confuse matters further, the generator returns an option type containing a tuple. This option+tuple result represents three pieces of information:
Note that the state and the result aren't required to be of the same type. This means that you can use Seq.unfold to generate some fairly flexible sequences. For example, we could produce the counting numbers as a sequence of strings.
The above code produces an infinite sequence. That's OK because the sequence is lazy. However, we could introduce starting and stopping values by declaring a function and modifying the generator.
That function can be called like so (using the F# Interactive Environment):
Now that we have a better understanding of how Seq.unfold works, consider the generator function from the F# solution shown earlier.
This very elegantly represents the rules of the sequence as defined by Project Euler.
n → n/2 (n is even)n → 3n + 1 (n is odd)
To create a C# solution that maps to the F# solution, we'll need to define an Unfold function. To do that, we need to create Options and Tuples. Below are the interfaces of the Option and Tuple types we'll use in C#.
I don't want to derail our discussion by delving too deeply into them. If you're interested, you can download the source and explore them at your leisure. For our purposes, you can trust that they work as expected. One point of interest is the additional Option.Some<T>() and Tuple.New<T1, T2>() helper methods. These are in place to allow us to remove some code-clutter by taking advantage of the C# compiler's type inference for generic type parameters.
Defining Unfold as a C# iterator is fairly straightforward—that is, once you get past the scary nested generic type of the generator parameter.
Using Unfold, we can write C# code that determines the length of one of the Project Euler sequences, given a starting value.
That code is remarkably similar to the original F# code (restated below).
OK. The bulk of the work is complete. The rest of the solution can be coded with a query expression.
Again, the symmetry between the C# query expression above and the original F# code below is striking.
Where do we stand in terms of performance? Well, the C# code that we just wrote takes about 16 seconds to find the correct answer on my machine, but the F# solution comes in at around 12 seconds. The C# solution from the original blog post executes in about .047 seconds on my machine, but Robert Pickering's final F# solution takes .064 seconds. Is there a clear winner? In my opinion, no, not really.
So, what have we learned? When one takes the time to implement the same algorithm in each language, F# and C# really have a similar performance profile. However, the F# compiler definitely is tuned for elegance and beauty.
Instead of comparing apples to oranges, try comparing apples to apples and oranges to oranges. A performance comparison between two algorithms written in two different languages is interesting on some level, but it isn't really a fair comparison.
Download the source code for this article
Page rendered at Friday, August 29, 2008 1:26:14 AM (Eastern Standard Time, UTC-05:00)