Wednesday, September 17, 2008

Prime numbers.

If there’s one mathematical curiosity that appears more often than any other in the Project Euler problems, it’s prime numbers. To be fair, we've dealt with primes before, but problem seven is the first that requires a prime number generator as part of its solution.

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?

I must admit that I’ve been riffing on this problem for quite a while now. There are so many variations to prime number generation that I’ve had difficulty choosing one with the right balance of elegance and efficiency. Because I’ll be reusing my generator for future problems, I must be certain that it’s fast enough. However, as always, my primary goal is to produce the most beautiful solution that I can.

Spoiler alert! I'm revealing the problem solution early.

primes |> Seq.nth 10000

Trivial, right? Of course, the challenge of this problem is in declaring the magic behind primes. We have a couple of options available to us, but since there’s a great deal that can be learned from using brute force to solve a problem, we’ll first try the…

The Naïve Approach

The most straightforward way to generate prime numbers is to simply test every number, starting from 2—the first prime—with some primality test. Perhaps something like the code below.

{ 2L..System.Int64.MaxValue } |> Seq.filter isPrime

For the primality test, we can test the candidate prime against every smaller number. If it is evenly divisible by any smaller number other than 1, it isn't prime.

let isPrime n =
  { 2L..n-1L } |> Seq.for_all (fun x -> n%x <> 0L)

Putting the pieces together gives us a real working solution.

let isPrime n =
  { 2L..n-1L } |> Seq.for_all (fun x -> n%x <> 0L)

let primes =
  { 2L..System.Int64.MaxValue } |> Seq.filter isPrime

primes |> Seq.nth 10000

Are we finished? Hardly! While simple, this prime number generator takes a whopping 30 seconds on my hefty-yet-quickly-obsoleting machine! That might fall within Project Euler's "one-minute rule," but we can certainly do better.

Optimizing Naïvely

The obvious optimization is to reduce the set of numbers that isPrime tests. Observe that¹ any composite number has a factor, other than 1, that is no larger than its square root. Armed with this knowledge, we can improve the primality test by testing just the natural numbers from 2 through the square root of n.
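To see why the square root is enough, suppose n is composite, so n = ab for some integers with 1 < a ≤ b. Then

```latex
a^2 \le ab = n \quad\Longrightarrow\quad a \le \sqrt{n}
```

so if no number from 2 through √n divides n evenly, n cannot be written as such a product and must be prime.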

let isPrime n =
  let limit = int64 (sqrt (float n))
  { 2L..limit } |> Seq.for_all (fun x -> n%x <> 0L)

That's a huge improvement! Finding the 10001st prime now only takes about .25 seconds.
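The code in this post targets the F# September 2008 CTP, and several library functions it uses were renamed in later F# releases (for example, Seq.for_all is now Seq.forall, and Seq.nth has been superseded by Seq.item). For readers on a current compiler, here is a minimal sketch of the square-root-bounded generator in today's syntax; it is a port under that assumption, not the code timed above.

```fsharp
// Trial division bounded by the square root: a composite n must have a divisor d
// with 2 <= d <= sqrt n, so larger candidates never need to be checked.
let isPrime (n: int64) =
    let limit = int64 (sqrt (float n))
    // Note: this returns true for n < 2, so only call it with n >= 2.
    seq { 2L .. limit } |> Seq.forall (fun d -> n % d <> 0L)

// Lazily test every integer from 2 upward; only as many as needed are evaluated.
let primes = seq { 2L .. System.Int64.MaxValue } |> Seq.filter isPrime

// The 10001st prime sits at zero-based index 10000.
let answer = primes |> Seq.item 10000
```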

We can do better yet. If 2 is the only even prime, why are we bothering to test any other even numbers? We can save more time by testing only the odds.

let odds limit =
  Seq.unfold (fun n ->
    if n > limit then None
    else Some(n,n+2L)) 3L

let isPrime n =
  let limit = int64 (sqrt (float n))
  odds limit |> Seq.for_all (fun x -> n%x <> 0L)

let primes =
  seq { yield 2L
        yield! odds System.Int64.MaxValue |> Seq.filter isPrime }

That brings the solution down to approximately .2 seconds.

NeRd Note
Curious about the use of yield and yield! in the example above? The difference between these keywords is simple yet powerful. yield returns a single element of a sequence expression, much like the yield return of C# iterators. yield!, on the other hand, returns every element of another sequence as part of the sequence being built.² This is an extraordinarily powerful feature that offers a lot of flexibility. In part 2, we'll use yield! to produce an elegant recursive sequence expression.
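A tiny illustration of the difference, written in current F# syntax (the names flattened and nested are just for this example):

```fsharp
// yield emits exactly one element; yield! splices in every element of another sequence.
let flattened = seq { yield 1; yield! [2; 3]; yield 4 }   // elements: 1, 2, 3, 4

// Plain yield with a list emits the list itself as a single element,
// so the result is a sequence of lists rather than a sequence of ints.
let nested = seq { yield [2; 3] }                          // one element: [2; 3]
```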

Before moving on, let's make one last optimization to our naïve algorithm. We can take advantage of the fact that every prime, after 2 and 3, is of the form 6k ± 1. By reducing the set of numbers used by our primality test to those of this form, we can eke out a tiny bit more speed.

let inline next k i =
  if i = -1L then (k,1L)
  else ((k+1L),-1L)

let candidates limit =
  Seq.unfold (fun (k,i) ->
    let v = 6L*k + i
    if (v > limit) then None
    else Some(v, (next k i))) (1L,-1L)

let isPrime n =
  let limit = int64 (sqrt (float n))
  candidates limit |> Seq.for_all (fun x -> n%x <> 0L)

let primes =
  seq { yield! [2L;3L]
        yield! candidates System.Int64.MaxValue |> Seq.filter isPrime }

Using the prime number generator above, our solution takes around .15 seconds. Not too shabby! To be fair, this problem deals with reasonably small prime numbers. Future Project Euler problems (like problem ten) will benefit from a more efficient algorithm. Next time we’ll take a look at another well-known algorithm for generating prime numbers.
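As an aside, the (k, i) state pair is only one way to walk the 6k ± 1 numbers. Another common trick, sketched below, starts at 5 and alternately adds 2 and 4 (5, 7, 11, 13, 17, 19, ...). This is an illustrative variant rather than the generator timed above, and it assumes a current F# compiler (Seq.forall and Seq.item in place of the CTP's Seq.for_all and Seq.nth).

```fsharp
// Hypothetical variant: walk the 6k - 1, 6k + 1 numbers (5, 7, 11, 13, 17, 19, ...)
// by alternately adding 2 and 4; the state is (current value, next step).
let candidates (limit: int64) =
    Seq.unfold (fun (v, step) ->
        if v > limit then None
        else Some(v, (v + step, 6L - step))) (5L, 2L)

// Trial division by 6k +/- 1 divisors only. This is valid only for n that is not
// divisible by 2 or 3, which holds for every number the primes sequence passes in.
let isPrime (n: int64) =
    let limit = int64 (sqrt (float n))
    candidates limit |> Seq.forall (fun d -> n % d <> 0L)

let primes =
    seq {
        yield! [2L; 3L]
        yield! (candidates System.Int64.MaxValue |> Seq.filter isPrime)
    }
```

Carrying the step in the state removes the need for a separate next helper; both approaches produce the same candidate sequence.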

 

¹ "Observe that"? Clearly, I've been reading too many academic papers lately.
² F#'s yield! is similar to the stream-flattening concept of . Perhaps this would be a useful extension to the C# language? Mads? What do you think?

posted on Wednesday, September 17, 2008 4:33:05 PM (Eastern Standard Time, UTC-05:00)

 Monday, September 01, 2008

NOTE: I have shamefully stolen the title of this post from the custom T-shirt sported by Amanda Laucher at Tech Ed Developers 2008. The cleverness is all hers.

I’m a bit late to the blog posting party, but the F# Team shipped the F# September 2008 CTP on Friday. Kudos to Don, Luke, Chris, Brian, Jomo and the gang! Note that this isn’t yet another research release¹, but an honest-to-goodness preview of F# as a fully supported .NET language. In addition to the CTP release, the F# Developer Center is now open for business!

The CTP is jam-packed with hotness including…

  • The new F# Project System. Brian has the details here, here, here, here, and here.
  • Sweet scripting support. Check out Jomo’s “Zero to Execute in Ten Seconds” for the nitty-gritty.
  • Units of Measure. That’s right: amidst all of the effort to pull the CTP together, the F# Team managed to include a groundbreaking new feature! Read the first part of Andrew Kennedy’s series on this killer language feature here.²

This is just a taste of what’s in the CTP. Read the detailed (and I do mean detailed) release notes for more info.

¹ YARR: pirate speak.
² I have a certain amount of affection for the Units of Measure language feature after seeing its power firsthand while we competed in this year’s ICFP Programming Contest.

posted on Monday, September 01, 2008 10:01:51 AM (Eastern Standard Time, UTC-05:00)

 Friday, August 01, 2008

Victory

After two full months with no new posts, I’m finally coming up for air. The past two months have been some of the busiest of my life. Below are a few of the things I’ve been doing.

  • Relocating to the Seattle area. At the ALT.NET Open Spaces, Seattle, John Lam warned me not to underestimate the stress of relocation. I did my best to heed his warning, but the challenges of moving a family 2,000 miles away from friends and familiarity are many. The initial months cramped in a small, temporary apartment were brutal. However, I’m happy to report that I’m writing this from our new home, surrounded by boxes yet to be unpacked. Now, if only our house back in Toledo, Ohio would sell…
  • Starting a new job at Microsoft. Getting used to the rapid pace at Microsoft takes a lot of effort. In fact, new hires generally aren’t expected to be fully productive until after their first six months. I have to admit, though, that I really like it. It’s exciting to be working shoulder to shoulder with so many people who are passionate about creating the best developer tools that they can.
  • Attending Tech Ed Developers 2008. After living in Seattle for just two weeks, I spent a week in Orlando at Tech Ed Developers 2008. Thankfully, I only presented one session, “Hardcore Reflection.”
  • Participating in the ICFP 2008 Programming Contest. I joined Luke Hoban, Brian McNamara and Chris Smith for a full weekend of F# coding for this year’s ICFP Programming Contest. Brian has the full details here.
  • Taking on more responsibility at work. I joined Microsoft as a Program Manager on the IDE in the Visual Studio Managed Languages group (which includes C#, Visual Basic, IronRuby, IronPython and F#). I work with two other program managers, Karen Liu and DJ Park, to drive the development experience of the IDE. At first, my primary area of responsibility was the debugger. However, due to some shuffling around, I have taken the role of the program manager for the Visual Basic IDE. Being passionate about programming languages and developer tools, I am enormously excited about this new responsibility.

Since I am the new Visual Basic IDE program manager, some of you might be wondering what’s going to happen to this blog. The answer is, very little. You might see a bit more Visual Basic code, but it’s always been here. The truth is, I’m a fairly multi-lingual guy. Before joining Microsoft, I was a C# MVP. Now I’m working on the Visual Basic IDE, and every morning I install the latest F# dogfooding bits. Rest assured, I still intend to post articles on C# and F#, as well as Visual Basic.

It’s all done in .NET after all.

posted on Friday, August 01, 2008 10:41:15 AM (Eastern Standard Time, UTC-05:00)

 Tuesday, May 06, 2008

Project Euler problem six is another easy one.

The sum of the squares of the first ten natural numbers is,

$1^2 + 2^2 + \dots + 10^2 = 385$

The square of the sum of the first ten natural numbers is,

$(1 + 2 + \dots + 10)^2 = 55^2 = 3025$

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

The solution to this problem boils down to a few folding operations and a map. The one-liner is below.

List.fold_left (+) 0 [1..100] * List.fold_left (+) 0 [1..100] - List.fold_left (+) 0 (List.map (fun x -> x * x) [1..100])

Pretty nasty, eh? Quite a bit of code duplication can be removed. Since they're identical, let's generalize all of the folds first by extracting them to a sum function.

let sum lst = List.fold_left (+) 0 lst

sum [1..100] * sum [1..100] - sum (List.map (fun x -> x * x) [1..100])

That already looks a lot better.

Next, we can generalize the multiplication operations. Each time multiplication occurs in the solution above, it's simply squaring a value. So, we can extract those operations into a square function.

let square x = x * x

square (sum [1..100]) - sum (List.map (fun x -> square x) [1..100])

We can simplify that even further. Because the anonymous function passed to List.map just applies the square function to its argument, we can pass square directly.

square (sum [1..100]) - sum (List.map square [1..100])

Next, let's generalize the call to List.map that produces a list of squares by moving it to a new function, squares.

let squares lst = List.map square lst

square (sum [1..100]) - sum (squares [1..100])

At this point, we have a perfectly acceptable solution. It states the problem almost like natural English: "The square of the sum of 1 to 100 minus the sum of the squares of 1 to 100." So, why are there a few more inches left in this article? Well, I'd like to take this a step further.

Thinking more abstractly, what does our solution do? It computes the difference of two calculations that are based on the same list. We can extract this general process to a new function like so:

let difference f1 f2 lst = f1 lst - f2 lst

difference (fun l -> square (sum l)) (fun l -> sum (squares l)) [1..100]

It turns out that we can simplify these anonymous functions in the same way that we did with the square function earlier. However, because there are two functions involved in each calculation, we must compose the functions together. In F#, there are two operators used to perform function composition: the >> operator, which applies the functions from left to right, and the << operator, which applies the functions from right to left. Obviously, we need the latter.

difference (square << sum) (sum << squares) [1..100]
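To make the two composition operators concrete, here is a small sketch in current F# (List.sum stands in for the fold-based sum, and the names squareOfSum, squareOfSumFwd and sumOfSquares are illustrative). It checks both halves against the ten-number example from the problem statement.

```fsharp
let sum (xs: int list) = List.sum xs
let square x = x * x

// << composes right to left: (square << sum) xs = square (sum xs)
let squareOfSum = square << sum

// >> composes left to right, so the same function can be written as sum >> square
let squareOfSumFwd = sum >> square

// Sum of the squares: square each element first, then sum
let sumOfSquares = sum << List.map square
```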

After using the forward pipe operator to move the list to the front, we're finished.

[1..100] |> difference (square << sum) (sum << squares)

"Take the numbers 1 to 100 and find the difference of the square of the sum and the sum of the squares."

Function composition is beautiful.
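For anyone running this on a current F# compiler, List.fold_left has since been renamed List.fold. Under that assumption, a sketch of the finished solution looks like this:

```fsharp
let sum lst = List.fold (+) 0 lst          // List.fold_left in the 2008 CTP
let square x = x * x
let squares lst = List.map square lst

// Apply two calculations to the same list and subtract the results.
let difference f1 f2 lst = f1 lst - f2 lst

let answer = [1..100] |> difference (square << sum) (sum << squares)
```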

posted on Tuesday, May 06, 2008 6:21:26 AM (Eastern Standard Time, UTC-05:00)


A few days ago, I presented a solution for Project Euler problem four that I didn't really like. The challenge of problem four is to write a function that determines whether a number is a palindrome, that is, whether it reads the same backward as forward. When presented with that challenge, I took an approach that I feel is a bit of a cop-out: converting the number to a string, reversing the string and comparing the result. This felt somehow wrong because I'm not really solving the problem in a mathematical way. So, I'm declaring a mulligan. Below is a new function which properly performs the math necessary to reverse a base-10 number.

let reverse n =
  let rec loop x res =
    if x = 0 then res
    else loop (x/10) (res*10 + (x%10))

  loop n 0

let isPalindrome n =
  n = reverse n
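To see how loop builds the reversed number, the sketch below repeats the two functions with comments (as far as I know, current F# accepts them unchanged) and traces reverse 1234. Trailing zeros disappear (reverse 120 is 21), which is exactly why 120 is correctly rejected as a palindrome.

```fsharp
// reverse peels digits off the right of x and appends them to the right of res.
let reverse n =
    let rec loop x res =
        if x = 0 then res
        else loop (x / 10) (res * 10 + (x % 10))
    loop n 0

let isPalindrome n = n = reverse n

// Trace of reverse 1234, shown as (x, res) for each call to loop:
// (1234, 0) -> (123, 4) -> (12, 43) -> (1, 432) -> (0, 4321)
```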

Our original list comprehension below still works properly with the new isPalindrome function.

[ for x in 100..999
    for y in 100..999
      when isPalindrome(x*y) -> x*y ] |> toLargest

This solution is twice as fast as the original string-based solution. In addition, I'd argue that the tail-recursive loop is at least four times as beautiful. :-)
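The comprehension syntax above (for ... when ... ->) comes from the 2008 CTP; current F# spells it with do, if and yield. A sketch under that assumption, with List.max standing in for toLargest and a hypothetical largestPalindrome helper that takes the factor range, checks both the 2-digit example from the problem and the 3-digit case:

```fsharp
let reverse n =
    let rec loop x res =
        if x = 0 then res
        else loop (x / 10) (res * 10 + (x % 10))
    loop n 0

let isPalindrome n = n = reverse n

// Largest palindromic product x * y with both factors drawn from lo .. hi.
let largestPalindrome lo hi =
    [ for x in lo .. hi do
        for y in lo .. hi do
            if isPalindrome (x * y) then yield x * y ]
    |> List.max
```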

posted on Tuesday, May 06, 2008 6:21:13 AM (Eastern Standard Time, UTC-05:00)

 Monday, May 05, 2008

At first glance, Project Euler problem five looks like a walk in the park:

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest number that is evenly divisible by all of the numbers from 1 to 20?

Sounds easy! The most straightforward solution is to take the sequence of all natural numbers, filter those that are evenly divisible by 1 through 20, and pop off the first element of the sequence (the head). Something like the code below would do the trick.

{ 1L .. Int64.max_int }
  |> Seq.filter (fun n ->
       [1L .. 20L] |> List.for_all (fun d -> n % d = 0L))
  |> Seq.hd

Unfortunately, that solution, while direct, falls far outside of Project Euler's "one-minute rule." It eventually calculates the correct answer but takes as much as 10 minutes on my machine!

OK, let's take a step back. What exactly is the problem asking us to find? Stating it differently, "What is the least common multiple of all of the numbers from 1 to 20?"

The least common multiple (LCM) of two numbers is the smallest number that is evenly divisible by each. Still not familiar? Think about how fractions are added. The first step in adding fractions is to find the least common denominator, which is simply the LCM of the denominators. For example, given the fractions 1/8 and 1/12, we would find the LCM of 8 and 12. Then, the fractions would be rewritten with the LCM (24) as their denominators. Once this is done, we can easily add the fractions 3/24 and 2/24 to get the answer, 5/24.

So, how should we go about calculating the LCM of two numbers? It turns out that there are many well-known possibilities. One of the most popular methods involves finding the prime factors of both numbers. It goes something like this:

Suppose we wanted to find the least common multiple of 160 and 90. First, we would write out the prime factors of each:

$160 = 2^5 \times 5^1$
$90 = 2^1 \times 3^2 \times 5^1$

The least common multiple can be computed by multiplying the highest power of each unique factor.

$\operatorname{lcm}(160, 90) = 2^5 \times 3^2 \times 5^1 = 1440$

Many people have chosen this method when working through problem five, and I was tempted to take this road as well because it would allow us to reuse our primeFactors function from problem three. However, the code would be fairly complex.

let countItems lst =
  let incrCount m i =
    match Map.tryfind i m with
    | Some(c) -> Map.add i (c+1) m
    | None -> Map.add i 1 m

  lst |> List.fold_left incrCount Map.empty |> Map.to_list

let lcm x y =
  let rec updateMap m t =
    let i,c = t
    match Map.tryfind i m with
    | Some(v) when v < c -> Map.add i c m
    | None -> Map.add i c m
    | _ -> m

  let factors =
    [x; y]
    |> List.map primeFactors
    |> List.map countItems
    |> List.fold_left (List.fold_left updateMap) Map.empty

  Map.fold (fun i c res -> res * int64 (float i ** float c)) factors 1L

Personally, I feel a sense of accomplishment at writing all of those folds, particularly the double fold near the end, which is really cool. :-) However, it's pretty far below my standard for code beauty. If you recall, I'm trying to present the most beautiful solution that I can. So, I'm rejecting this solution, even though it's efficient enough to meet Project Euler's requirements. Admittedly, there's a certain beauty in the list transformations, but there's a much better method.

The least common multiple of two numbers can be calculated quite simply using their greatest common divisor (GCD), or the largest number that divides evenly into both numbers. The GCD can be computed easily with the Euclidean algorithm. Here's how it works:

  1. Start with 2 natural numbers, x and y
  2. If y is equal to zero, the answer is x.
  3. If not, set x to y and y to x mod y at the same time (the remainder is computed from the original x).
  4. Go back to step 2.
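As a worked example, here are those steps applied to 160 and 90 from the prime-factor example above. Dividing 160 by the resulting GCD and multiplying by 90 gives the same least common multiple that the factorization method produced.

```latex
\begin{aligned}
\gcd(160, 90) &= \gcd(90, 70) = \gcd(70, 20) = \gcd(20, 10) = \gcd(10, 0) = 10 \\
\operatorname{lcm}(160, 90) &= \frac{160}{10} \times 90 = 16 \times 90 = 1440
\end{aligned}
```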

For the more visual among you, a flowchart of the Euclidean algorithm is pictured below.

[Figure: flowchart of the Euclidean algorithm]

Once we have the GCD, calculating the LCM is easy. Simply divide x by the GCD of x and y, and multiply the result by y. These two algorithms can be implemented quite beautifully in F#.

let rec gcd x y =
  if y = 0 then x
  else gcd y (x%y)

let lcm x y =
  if x = 0 || y = 0 then 0
  else (x / (gcd x y)) * y

However, the F# libraries already supply a function to calculate the GCD of two numbers. The greatest common divisor also goes by another name, highest common factor (HCF), and there is an HCF function in the Microsoft.FSharp.Math.BigInt module. It's a simple matter to rewrite lcm using BigInt.hcf.

open Microsoft.FSharp.Math

let lcm x y =
  if x = 0I || y = 0I then 0I
  else (x / (BigInt.hcf x y)) * y

With lcm in place, would you believe that our solution looks like this?

[1I..20I] |> List.reduce_left lcm

F# can produce truly beautiful code indeed!
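BigInt.hcf and List.reduce_left belong to the 2008 CTP libraries. On a current compiler, a comparable sketch can use System.Numerics.BigInteger.GreatestCommonDivisor and List.reduce; smallestMultiple is a hypothetical helper name for this example. Dividing by the GCD before multiplying keeps the intermediate value smaller than computing x * y first.

```fsharp
open System.Numerics

// Least common multiple via the GCD. BigInteger.GreatestCommonDivisor plays the role
// that BigInt.hcf played in the 2008 CTP libraries.
let lcm (x: BigInteger) (y: BigInteger) =
    if x.IsZero || y.IsZero then BigInteger.Zero
    // Divide first so the intermediate value stays as small as possible.
    else (x / BigInteger.GreatestCommonDivisor(x, y)) * y

// Hypothetical helper: the LCM of 1 .. n, combined pairwise with List.reduce.
let smallestMultiple (n: int) =
    [1 .. n] |> List.map (fun i -> BigInteger(i)) |> List.reduce lcm
```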

posted on Monday, May 05, 2008 8:12:44 AM (Eastern Standard Time, UTC-05:00)

 Friday, May 02, 2008

Yet Another Project Euler Series (YAPES) continues with problem four:

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.

Find the largest palindrome made from the product of two 3-digit numbers.

The most straightforward way to determine if a number is palindromic is to convert it to a string and compare that string with its reverse. Sound easy? It is!

One minor snag is the lack of a library function in F# for reversing strings, but that's easily defined like so:

module String =
  let rev (s : string) = new string(s.ToCharArray() |> Array.rev)

With String.rev in place, writing an isPalindrome function is trivial.

let isPalindrome n =
  let text = Int32.to_string n
  text = String.rev text
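FSharp.Core's String module has no rev function, and Int32.to_string no longer exists, so on a current compiler the same idea might be sketched as follows (reverseString is an illustrative name; string is F#'s built-in conversion function):

```fsharp
// Reverse a string by reversing its character array.
let reverseString (s: string) = System.String(Array.rev (s.ToCharArray()))

// Same string-based palindrome test as above.
let isPalindrome (n: int) =
    let text = string n
    text = reverseString text
```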

Using a list comprehension, we can generate all of the palindromes that are products of 3-digit numbers. Once we have this list, producing the result is as simple as passing it to the toLargest function that we defined for Problem Three.

[ for x in 100..999
    for y in 100..999
      when isPalindrome (x*y) -> x*y ] |> toLargest

Short and sweet—my favorite!

posted on Friday, May 02, 2008 7:49:17 AM (Eastern Standard Time, UTC-05:00)

 Thursday, May 01, 2008

Project Euler problem three is the first of many to deal with prime numbers.

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143?

Eventually, we'll need a prime number generator to solve some of the more advanced problems, but this problem can be solved efficiently without one. The number in question is small enough (just 12 digits) that the repeated-division method that many of us learned in elementary school will suffice.

Consider how we might use this process to find the prime factors of 140.

140: Is 140 evenly divisible by 2? Yes! Remember 2 and divide 140 by 2.
2 * 70: Is 70 evenly divisible by 2? Yes! Remember 2 and divide 70 by 2.
2 * 2 * 35: Is 35 evenly divisible by 2? No. How about 3? No. 4? Nope. 5? Yes! Remember 5 and divide 35 by 5.
2 * 2 * 5 * 7: And we're done!

This method isn't rocket science, but it gets the job done. In fact, it's pretty fast for reasonably small numbers. After all, we're not trying to find the factors of RSA-200. :-)

The basic algorithm is pictured as a flowchart below.

[Figure: flowchart of the prime factorization algorithm]

The following F# function implements our algorithm.

let primeFactors n =
  let inline isFactor n d = n % d = 0L

  let rec nextFactor n d =
    let x = if d = 2L then 3L else d+2L
    if isFactor n x then x else nextFactor n x

  let rec findFactors n d acc =
    if isFactor n d then
      findFactors (n/d) d (d::acc)
    elif n > d then
      findFactors n (nextFactor n d) acc
    else
      acc

  findFactors n 2L [] |> List.rev

To the uninitiated, that function might look pretty complex. In reality, it's extremely simple, but three other functions are nested inside of it. Let's look at each nested function in turn.

let inline isFactor n d = n % d = 0L

There's nothing tricky about isFactor. It simply abstracts the modulo operation that determines whether n is evenly divisible by d.

let rec nextFactor n d =
  let x = if d = 2L then 3L else d+2L
  if isFactor n x then x else nextFactor n x

nextFactor recursively determines the next value of d to be used in the algorithm. There is a small optimization here: nextFactor only produces odd numbers. Since 2 is the only even prime, why bother checking any other evens?

let rec findFactors n d acc =
  if isFactor n d then
    findFactors (n/d) d (d::acc)
  elif n > d then
    findFactors n (nextFactor n d) acc
  else
    acc

The meat of the algorithm is handled by findFactors. Any factors found are cons'd up with the accumulator variable, acc. Note that both findFactors and nextFactor are written tail-recursively, so they can be optimized by the compiler to conserve stack space.

The real body of primeFactors kicks off the recursion:

findFactors n 2L [] |> List.rev

The result of findFactors is passed to List.rev to return the prime factors in a more logical order (smallest to largest).

A simple test in the F# Interactive Environment shows that primeFactors works as expected.

> primeFactors 140L;;

val it : int64 list = [2L; 2L; 5L; 7L]

Almost done.

Project Euler Problem Three asks, "What is the largest prime factor of the number 600851475143?" That's just a matter of folding the list of prime factors with the max function (from the F# libraries) to get the answer.

primeFactors 600851475143L |> List.fold1_left max

We can generalize the folding logic above with a new function...

let toLargest l = List.fold1_left max l

...And now we can write the following solution.

primeFactors 600851475143L |> toLargest

That's just lovely.
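On a current F# compiler, List.fold1_left has become List.reduce. The sketch below carries primeFactors over with only that rename and minor layout changes, and checks it against the examples in this post, including 13195 from the problem statement.

```fsharp
let primeFactors n =
    // Does d divide n evenly?
    let inline isFactor n d = n % d = 0L

    // Next divisor of n after d, testing 3 and then odd numbers only.
    let rec nextFactor n d =
        let x = if d = 2L then 3L else d + 2L
        if isFactor n x then x else nextFactor n x

    // Divide out each factor, consing it onto the accumulator (largest ends up first).
    let rec findFactors n d acc =
        if isFactor n d then findFactors (n / d) d (d :: acc)
        elif n > d then findFactors n (nextFactor n d) acc
        else acc

    findFactors n 2L [] |> List.rev

// List.reduce is the current name for the CTP's List.fold1_left.
let toLargest l = List.reduce max l
```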

NeRd Note
Eagle-eyed readers might have noticed that the problem could have been solved several inches ago. If primeFactors didn't reorder its results from smallest to largest, the solution to the problem would be in the head of the result list!
primeFactors 600851475143L |> List.hd
However, that solution has some very real consequences. First of all, primeFactors won't return its results in the most logical order, which limits its reusability. Secondly, the intent of the code isn't stated as clearly. And finally, it's a leaky abstraction because the solution relies upon intimate knowledge of how primeFactors returns its results. If primeFactors were changed later, the solution would be broken!
posted on Thursday, May 01, 2008 9:55:56 AM (Eastern Standard Time, UTC-05:00)
