After a long break, it's time to return to my informal series of articles on functional programming concepts using only C# 2.0. While C# 3.0 might be syntactically more suitable to functional-style programming, 2.0 has the advantage of being more familiar, and in some cases, clearer. In addition, using C# 2.0 makes compiling sample code under Visual Studio 2005 possible, so an installation of the Orcas beta (i.e. Visual Studio 2008) is unnecessary.
Because this article builds on information that I have already been presented, a refresher might be in order. Here are links to help you get your bearings:
Let's get started...
The notion of higher-order functions is an amazingly simple (and extraordinarily powerful!) concept that has an over-complicated computer science term attached to it. The very mention of the phrase brings to mind a coven of programming warlocks whispering incantations over their code, but that's not really the case. Simply put, a higher-order function is a function that takes a function as an argument and/or returns a function as its result. (Try saying that ten times fast.) Put another way, higher-order functions are functions that are in the business of treating functions as data. They are truly the "bread and butter" of functional programming.
In order to support higher-order functions, a language must treat functions as first-class objects, but what does that mean? In truth, "first-class object" is a loaded term because it means different things to different people. For our purposes, consider Abelson and Sussman:
In the .NET Framework (and thus, C#), a function become a first-class object when wrapped in a delegate instance. If a C# function accepts a delegate as one of its arguments or returns a delegate as its result, it is a higher-order function. (By this definition, the "Memoize" function that I presented in a previous article is a higher-order function.)
Why is this useful? Let's start with a simple function that is not higher-order, which takes an array of integers and returns a new array containing only the even numbers:
Now, consider another, slightly more complicated, function that takes an array of integers and returns a new array containing only the prime numbers:
Many of us have written functions like GetEvens and GetPrimes many times—regardless of the fact that both functions perform precisely the same algorithm. Each function loops through all elements in an array and tests a predicate to see whether the element should be added to a result array. Essentially, the algorithm that the functions perform filters the original array based on some condition. Here is the essence of this filtering algorithm:
It's a small jump to realize that "some-condition" can be represented by another function passed into the Filter function as a delegate, making Filter a higher-order function:
Now, we can use the Filter function, instead of the GetEvens and GetPrimes functions, by passing delegates:
The IsEven and IsPrime functions serve as the predicates that we pass into the Filter function. Note that I could have defined them as anonymous delegates instead of as separate methods. Often, I will use anonymous delegates with higher-order functions, but for this sample I think they just confuse the code. Regardless, we have successfully used a higher-order function to abstract the essence of the Filter algorithm. Now, we don't have to reinvent the wheel every time we want to use it.
Filter is one of three well-known higher-order functions that are useful for manipulating lists or sequences: Filter, Map and Reduce. They should be immediately familiar to any Lisp or Python programmer (although Guido is dropping Reduce in Python 3000). In combination, these functions can be used to solve many computation problems. As a testimony of their power, consider the Google MapReduce programming model used for scalable processing of large datasets. Let's take a look at each of the big three functions in turn.
Filter is perhaps the easiest to understand; it simply takes one sequence and returns a new sequence that is filtered according to a predicate. This simplicity is why it served as our initial example. However, our example is pretty limited. A more robust definition of Filter should be able to consume any type and any sort of sequence, not just integers and arrays. We can achieve this by defining Filter as a generic method that processes IEnumerable<T>. Choosing IEnumerable<T> allows Filter to work with arrays and almost any other sort of data structure in the .NET Framework.
After making our Filter function generic and modifying it to use IEnumerable<T>, the initial example code still runs fine. However, we now have a definition that we can use in almost any situation that calls for filtering. (Note that we also could have implemented Filter using an iterator to make it lazy, but that is a subject for another article.)
Map is similar to Filter because it takes a sequence and returns another. However, instead of filtering the items in the sequence with a predicate, Map transforms each item with a conversion function. Consider this scenario: we want to take a sequence of important dates (can you tell me what they are?) and convert them all to text.
Like we did with Filter, we can extract the essence of the algorithm into a new function, Map. We do this by passing the piece that changes (the conversion) into the function as a delegate. The resulting definition of Map is:
This definition is slightly more complicated than Filter because it requires two generic type parameters: one for the input type and one for the output type. However, the additional generic type parameter adds tremendous power to the function because it enables Map to convert a sequence of one type to a sequence of another type.
Here is our example code modified to use Map:
The Reduce (or Fold) function is the most complicated to define in code, but it is perhaps the simplest of the big three to understand. Reduce is all about accumulation. Its purpose is to iterate through a sequence and build up a result value using each item. A simple example might be a function that takes a sequence of integers and returns the sum:
Distilling the essence of the Reduce algorithm is a bit more difficult than Filter and Map. It might not be obvious at first, but there are actually two pieces that need to be extracted.
Making the "start-value" and "some-accumulation" pieces parameters into Reduce gives us a working definition.
Now we can rewrite the original summing code to use Reduce like this:
Let's combine the big three to compute the sum of the squares of the even numbers in a sequence of integers. We can do this by using Filter to find the even numbers, Map to square them and Reduce to produce the sum.
For those of you who are clamoring for some C# 3.0, we can code the same calculation using lambda expressions (after making Map, Filter and Reduce into extension methods) like this:
Of the big three, Reduce is the most powerful. It turns out that Filter and Map are really just special cases of Reduce. In fact, we can implement Filter and Map using only Reduce.
Thinking a bit more abstractly reveals that both Filter and Map simply accumulate a new sequence from another sequence. With that in mind, we can use Reduce to perform the accumulation by passing a new list as the start value and an accumulator delegate responsible for adding items to that list.
Using anonymous delegates in this case allows us to take advantage of the closures produced when we access "predicate" and "converter" parameters from the parenting method.
You might be wondering, if the big three functions are so useful, why doesn't the .NET Framework provide them? Well, the truth is, it does. Sort of. The .NET Framework 2.0 already defines methods that are somewhat equivalent to Filter and Map:
Unfortunately, those methods are a bit limited because they only work with arrays and instances of List<T>. The general versions of Filter and Map that we have defined can be used with any IEnumerable<T>—including iterators. In addition, as Jay Wren points out, there really isn't an equivalent to Reduce in the .NET Framework. That makes these definitions of the big three a wise addition to your developer toolbox. In the next article, we'll build on the idea of higher-order functions and show how the big three might be used in even more powerful ways.
Page rendered at Thursday, July 02, 2009 7:44:06 PM (Eastern Standard Time, UTC-05:00)
If you're interested in learning F#, this is the most comprehensive book available. The text is well written and the examples are instructive. And after all, the author is the inventor of F#.
Because this book provides source code in Standard ML, it's a fantastic resource for learning F#. One bit of warning: this book does not teach classic data structures. While structures such as binomial heaps and red-black trees are presented, it is assumed that the reader already knows and understands them.
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.