Last time, we took a brute force approach to solving Project Euler problem seven. Unfortunately, the resulting prime number generator turned out to be fairly ugly and not really efficient enough to handle the needs of future YAPES1 problems. This time, we’ll approach the problem using a tried-and-true algorithm that should give us the desired performance and produce a beautiful solution. I can already hear some of you rolling your eyes, because we’ll be looking at...
Amazingly, the Sieve of Eratosthenes is perhaps the oldest known algorithm for generating prime numbers, and yet it remains reasonably efficient by today’s standards. It’s true that there are faster methods available, but this should be fast enough. The basic algorithm is nothing earth-shattering—it’s first semester computer science stuff. However, we’ll add a twist to keep things interesting.
The Sieve of Eratosthenes works by marking composite numbers:
As described, this algorithm doesn’t quite work for our purposes. It is certainly easy to implement, but because of the starting requirement (a big list of numbers), it doesn’t lend itself well to generating a, potentially, infinite stream of primes. However, with a small modification, we can make it work.
The adaptation we’ll implement comes from from the excellent paper, The Genuine Sieve of Eratosthenes2, by Melissa E. O’Neill. The idea is to turn the algorithm inside out. Instead of keeping track of which numbers are prime, we’ll maintain a table of composite numbers along with their known factors. As we iterate the natural numbers, the table will be updated with new information about which numbers are known to be composite and what factors have been found.
Confused? Let’s walk through the algorithm, step by step.
We start with 2 as the first natural number and check to see if it exists in the composite table. Since the table is empty, 2 isn’t present, so we add a new list containing 2 to the table with the square of 2 (4) as the key. Finally, since 2 was not found in the composite table, we know it is prime.
We take the next natural number, 3. Like before, 3 is not found in the composite table, so we add it to the table using its square (9) as the key and return 3 as prime.
We take the next natural number, 4. This time is different—4 is found in the composite table, so we know it’s not prime. In this case, we take the list found in the table under 4 (the known factors), and for each factor, we add 4 and insert the result into the table. Thus, we insert 6 into the table along with its known factor, 2. (Interestingly, this algorithm doesn’t tell us that 3 is a factor of 6). Finally, we remove 4 from the table.
Let's skip ahead to some more interesting cases.
After 9 has been processed, the table looks like so:
The next natural number, 10, is found in the table, so we know that it’s composite and not prime. Again, we take the list of known factors, and for each factor, we add 10 and insert the result into the table. This time, the value (12) already exists in the table, so we add it to the list of known factors for that value. Lastly, we remove 10 from the table.
The next natural number, 11, is not found in the table, so we insert its square and return it as prime.
We find the next natural number, 12, in the table. However, because there is more than one known factor in the table, we add two new composites, 12+2=14 and 12+3=15, before finally removing 12.
As you can see, this take on the classic algorithm allows us to determine which numbers are prime and composite by using a minimal amount of memory. Obviously, the table can grow large as we find larger primes, but the memory usage is quite reasonable relative to the traditional sieve where a large list of natural numbers would have to be generated before even the first prime is returned.
The F# code nearly writes itself.
The meat of the algorithm is in the sieve function. Here, we try to locate x in the table. If it is not found, we yield x (because it’s prime) and recursively call sieve passing x+1 and a table with the square of x added to it. (Note that Map is an immutable data structure, so “mutating” functions like Map.add actually return a brand new Map.) If x is found in the table, we recursively call sieve passing x+1 and a table from which x has been removed and each known factor has been reinserted. The reinsert function performs the trivial work of reinserting a known factor into the table.
A couple of other interesting details about this code are worth pointing out:
Now that we have a working prime number generator, let’s take a moment to optimize. Until now, we’ve been sticking with the immutable F# Map type, but we could employ a mutable data structure to improve performance. To replace the data structure in a way that preserves the beauty of the algorithm, we’ll define a set of APIs for Dictionary<TKey,TValue> that mimic the Map APIs:
Now, it’s a simple matter of replacing references to Map with Dict, and our algorithm immediately benefits from the performance boost of using a mutable table.
With our prime number generator in hand, solving Project Euler problem seven is trivial:
At last, we’ve produced a prime number generator with a good balance of performance and beauty. If we’ve learned anything from this article or its predecessor, I hope it’s that finding an optimal solution often requires finding an appropriate algorithm.
Take that Chris!
1Yet Another Project Euler Series2O'Neill's paper appears in The Journal of Functional Programming, Volume 19, Issue 01.
Page rendered at Monday, March 15, 2010 3:13:11 PM (Pacific Standard Time, UTC-08:00)
If you're interested in learning F#, this is the most comprehensive book available. The text is well written and the examples are instructive. And after all, the author is the inventor of F#.
Because this book provides source code in Standard ML, it's a fantastic resource for learning F#. One bit of warning: this book does not teach classic data structures. While structures such as binomial heaps and red-black trees are presented, it is assumed that the reader already knows and understands them.
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.