Thursday, April 03, 2008

 WeigtingApplesAndOranges

This recent blog post caused quite a stir on the F# mailing list. The post presents two solutions for Project Euler Problem 14: one in C# and the other in F#. The C# version clearly is hand-optimized for speed (and is indeed very fast), but the F# solution isn't. Instead, the F# code appears to be written with elegance and brevity in mind. Robert Pickering presented a challenge to create a faster F# solution, and the F# mailing list (which had been dormant for a couple of weeks) literally exploded with ideas.

Before we go too far afield, below are the instructions for Project Euler Problem 14.

The following iterative sequence is defined for the set of positive integers:

nn/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

As you can see, the problem boils down to generating one million sequences and counting the number of terms in each to find the longest. It's not rocket science, but there are many ways to optimize the solution. However, I'm not interested in creating the most optimal F# code to compete with the C# version. That's already been done by many of the gurus. Instead, I'd like to know if we can write C# code that more closely matches the algorithm used by the F# solution. Let's see how C# matches up when playing F#'s game.

Below is the F# solution that's caused so much debate.

#light

let seqLength n =
  n |> Seq.unfold
    (fun x -> match x with
             
| x when x = 0L      -> None
              | x when x = 1L      -> Some(1L, 0L)
              | x when x % 2L = 0L -> Some(x, x / 2L)
              | _                  -> Some(x, 3L*x + 1L))
    |> Seq.length

[for i in 1L..1000000L -> (i, seqLength i)]
  |> List.fold1_left(fun (ai,al) (i,l) -> if l > al then (i,l) else (ai,al))
  |> (fun (x,y) -> x ) |> print_any

If you've been following my F# articles, the code above shouldn't contain anything new. Pattern matching, tuples, option types, list comprehensions, functions and the forward pipe operator should all be reasonably familiar. The only item that truly requires explanation is the Seq.unfold function.

In F#, seq<'a> is an alias for the .NET interface, IEnumerable<T>, which is implemented by every collection in the Base Class Library and is the centerpiece of LINQ to Objects. In fact, in the F# libraries, seq<'a> is defined very simply as...

type seq<'a> = IEnumerable<'a>

See? Simple.

Seq.unfold is a library function that creates a seq<'a>, given a generator function. The resulting seq<'a> is lazily evaluated. That is, when a value in the seq<'a> is requested, the generator function is called to calculate it. In practice, it's similar to a C# iterator. The signature of Seq.unfold looks like so:

val unfold : generator: ('b -> ('a * 'b) option) -> 'b -> seq<'a>

The generator function is usually the part that trips people up. When called, the generator is passed a bit of state. This state is initialized by the second argument passed to Seq.unfold. To confuse matters further, the generator returns an option type containing a tuple. This option+tuple result represents three pieces of information:

  1. The option type indicates whether or not the sequence should continue. I.e., returning None signals the end of the sequence.
  2. The first part of the tuple is the current item in the sequence.
  3. The second part of the tuple is the bit of state that will be passed to the generator the next time it's called.

Note that the state and the result aren't required to be of the same type. This means that you can use Seq.unfold to generate some fairly flexible sequences. For example, we could produce the counting numbers as a sequence of strings.

1 |> Seq.unfold (fun i -> Some(i.ToString("#,#"), i + 1))

The above code produces an infinite sequence. That's OK because the sequence is lazy. However, we could introduce starting and stopping values by declaring a function and modifying the generator.

let count (start : int) stop =
  start |> Seq.unfold (fun i -> if i > stop then None
                                else Some(i.ToString("#,#"), i + 1))

That function can be called like so (using the F# Interactive Environment):

> count 1000 1003;;

val it : seq = seq ["1,000"; "1,001"; "1,002"; "1,003"]

Now that we have a better understanding of how Seq.unfold works, consider the generator function from the F# solution shown earlier.

(fun x -> match x with
          | x when x = 0L      -> None
          | x when x = 1L      -> Some(1L, 0L)
          | x when x % 2L = 0L -> Some(x, x / 2L)
          | _                  -> Some(x, 3L*x + 1L))

This very elegantly represents the rules of the sequence as defined by Project Euler.

nn/2 (n is even)
n → 3n + 1 (n is odd)

To create a C# solution that maps to the F# solution, we'll need to define an Unfold function. To do that, we need to create Options and Tuples. Below are the interfaces of the Option and Tuple types we'll use in C#.

public sealed class Option<T>
{
  public bool IsNone { get; }
  public bool IsSome { get; }
  public T Value { get; }

  public static Option<T> None { get; }
  public static Option<T> Some(T value) { }
}

public static class Option
{
  public static Option<T> Some<T>(T value) { }
}

public sealed class Tuple<T1, T2>
{
  public T1 Fst { get; }
  public T2 Snd { get; }
}

public static class Tuple
{
  public static Tuple<T1, T2> New<T1, T2>(T1 fst, T2 snd) { }
}

I don't want to derail our discussion by delving too deeply into them. If you're interested, you can download the source and explore them at your leisure. For our purposes, you can trust that they work as expected. One point of interest is the additional Option.Some<T>() and Tuple.New<T1, T2>() helper methods. These are in place to allow us to remove some code-clutter by taking advantage of the C# compiler's type inference for generic type parameters.

Defining Unfold as a C# iterator is fairly straightforward—that is, once you get past the scary nested generic type of the generator parameter.

public static IEnumerable<TResult> Unfold<T, TResult>(
  Func<T, Option<Tuple<TResult, T>>> generator, T start)
{
  var next = start;

  while (true)
  {
    var res = generator(next);
    if (res.IsNone)
      yield break;

    yield return res.Value.Fst;

    next = res.Value.Snd;
  }
}

Using Unfold, we can write C# code that determines the length of one of the Project Euler sequences, given a starting value.

public static long SeqLength(long start)
{
  return Unfold(x =>
    {
      if (x == 0L)
        return Option<Tuple<long, long>>.None;
      else if (x == 1L)
        return Option.Some(Tuple.New(1L, 0L));
      else if ((x % 2L) == 0L)
        return Option.Some(Tuple.New(x, x / 2L));
      else
        return Option.Some(Tuple.New(x, (3L * x) + 1L));
    }, start).Count();
}

That code is remarkably similar to the original F# code (restated below).

let seqLength n =
  n |> Seq.unfold
    (fun x -> match x with
             
| x when x = 0L      -> None
              | x when x = 1L      -> Some(1L, 0L)
              | x when x % 2L = 0L -> Some(x, x / 2L)
              | _                  -> Some(x, 3L*x + 1L))
    |> Seq.length

OK. The bulk of the work is complete. The rest of the solution can be coded with a query expression.

public static long GetLongestSequence()
{
  return (from i in Enumerable.Range(1, 1000000)
          select new { Start = i, Length = SeqLength(i) })
          .Aggregate((x, y) => y.Length > x.Length ? y : x)
          .Start;
}

Again, the symmetry between the C# query expression above and the original F# code below is striking.

[for i in 1L..1000000L -> (i, seqLength i)]
  |> List.fold1_left(fun (ai,al) (i,l) -> if l > al then (i,l) else (ai,al))
  |> (fun (x,y) -> x ) |> print_any

Where do we stand in terms of performance? Well, the C# code that we just wrote takes about 16 seconds to find the correct answer on my machine, but the F# solution comes in at around 12 seconds. The C# solution from the original blog post executes in about .047 seconds on my machine, but Robert Pickering's final F# solution takes .064 seconds. Is there a clear winner? In my opinion, no, not really.

So, what have we learned? When one takes the time to implement the same algorithm in each language, F# and C# really have a similar performance profile. However, the F# compiler definitely is tuned for elegance and beauty.

Instead of comparing apples to oranges, try comparing apples to apples and oranges to oranges. A performance comparison between two algorithms written in two different languages is interesting on some level, but it isn't really a fair comparison.

Download the source code for this article

posted on Thursday, April 03, 2008 4:40:31 AM (Pacific Standard Time, UTC-08:00)  #    Comments [3]

kick it on DotNetKicks.com