After a long break, it's time to return to my informal series of articles on
functional programming concepts using only C# 2.0. While C# 3.0 might be syntactically
more suitable to functional-style programming, 2.0 has the advantage of being more
familiar, and in some cases, clearer. In addition, using C# 2.0 makes compiling
sample code under Visual Studio 2005 possible, so an installation of the Orcas beta
(i.e. Visual Studio 2008) is unnecessary.
Because this article builds on information that I have already been presented,
a refresher might be in order. Here are links to help you get your bearings:
-
Fibonacci Numbers, Caching and Closures
-
What's in a Closure?
-
Using Automatic Memoization
Let's get started...
Higher-Order Functions
The notion of higher-order
functions is an amazingly simple (and extraordinarily powerful!) concept that
has an over-complicated computer science term attached to it. The very mention of
the phrase brings to mind a coven of programming warlocks whispering incantations
over their code, but that's not really the case. Simply put, a higher-order function
is a function that takes a function as an argument and/or returns a function as
its result. (Try saying that ten times fast.) Put another way, higher-order functions
are functions that are in the business of treating functions as data. They
are truly the "bread and butter" of functional programming.
In order to support higher-order functions, a language must treat functions as
first-class objects,
but what does that mean? In truth, "first-class object" is a loaded term because
it means different things to different people. For our purposes, consider
Abelson and Sussman:
In general, programming languages impose restrictions on the ways in which computational
elements can be manipulated. Elements with the fewest restrictions are said
to have
first-class status. Some of the 'rights and privileges' of
first-class elements are:
- They may be named by variables.
- They may be passed as arguments to prodedures.
- They may be returned as results from procedures.
- They may be included in data structures. (pg. 76)
In the .NET Framework (and thus, C#), a function become a first-class object
when wrapped in a delegate instance. If a C# function accepts a delegate as one
of its arguments or returns a delegate as its result, it is a higher-order function.
(By this definition, the "Memoize" function that I presented in a
previous article is a higher-order function.)
Why is this useful? Let's start with a simple function that is not
higher-order, which takes an array of
integers and returns a new array containing only the even numbers:
static int[] GetEvens(int[] numbers)
{
List<int> result = new List<int>();
if (numbers != null)
{
foreach (int n in numbers)
{
if ((n % 2) == 0)
result.Add(n);
}
}
return result.ToArray();
}
Now, consider another, slightly more complicated, function that takes an array of integers
and returns a new array containing only the prime numbers:
static int[] GetPrimes(int[] numbers)
{
List<int> result = new List<int>();
if (numbers != null)
{
foreach (int n in numbers)
{
bool isPrime = true;
int max = (n / 2) + 1;
for (int i = 2; i < max; i++)
{
if ((n % i) == 0)
{
isPrime = false;
break;
}
}
if (isPrime)
result.Add(n);
}
}
return result.ToArray();
}
Many of us have written functions like GetEvens and GetPrimes many times—regardless of the fact
that both functions perform precisely the same algorithm. Each function loops through
all elements in an array and tests a predicate to see whether the element
should be added to a result array. Essentially, the algorithm that the
functions perform filters the original array based on some condition. Here is the essence of this filtering algorithm:
static int[] Filter(int[] numbers)
{
List<int> result = new List<int>();
if (numbers != null)
{
foreach (int n in numbers)
{
if (some-condition)
result.Add(n);
}
}
return result.ToArray();
}
It's a small jump to realize that "some-condition" can be represented by another
function passed into the Filter function as a delegate, making Filter a
higher-order function:
delegate bool Predicate(int n);
static int[] Filter(int[] numbers, Predicate predicate)
{
List<int> result = new List<int>();
if (numbers != null)
{
foreach (int n in numbers)
{
if (predicate(n))
result.Add(n);
}
}
return result.ToArray();
}
Now, we can use the Filter function, instead of the GetEvens and GetPrimes
functions, by passing delegates:
static bool IsEven(int value)
{
return (value % 2) == 0;
}
static bool IsPrime(int value)
{
int max = (value / 2) + 1;
for (int i = 2; i < max; i++)
if ((value % i) == 0)
return false;
return true;
}
static void Main()
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Console.Write("All:");
foreach (int n in numbers)
Console.Write(" {0}", n);
Console.WriteLine();
Console.Write("Evens:");
foreach (int n in Filter(numbers, IsEven))
Console.Write(" {0}", n);
Console.WriteLine();
Console.Write("Primes:");
foreach (int n in Filter(numbers, IsPrime))
Console.Write(" {0}", n);
Console.WriteLine();
}
The IsEven and IsPrime functions serve as the predicates that we pass into
the Filter function. Note that I could have defined them as anonymous
delegates instead of as separate methods. Often, I will use anonymous delegates
with higher-order functions, but for this sample I think they just confuse the
code. Regardless, we have successfully used a higher-order function to abstract
the essence of the Filter algorithm. Now, we don't have to reinvent the wheel
every time we want to use it.
The Big Three
Filter is one of three well-known higher-order functions that are useful for
manipulating lists or sequences: Filter, Map and Reduce. They should be
immediately familiar to any Lisp or Python programmer (although Guido is
dropping Reduce
in Python 3000). In combination, these functions can be used to solve many computation
problems. As a testimony of their power, consider the Google
MapReduce programming
model used for scalable processing of large datasets. Let's take a look at each
of the big three functions in turn.
Filter
Filter is perhaps the easiest to understand; it
simply takes one sequence and returns
a new sequence that is filtered according to a predicate. This simplicity is why
it served as our initial example. However, our example is pretty limited. A more
robust definition of Filter should be able to consume any type and any sort of
sequence, not just integers and arrays. We can achieve this by defining Filter
as a generic method that processes IEnumerable<T>. Choosing IEnumerable<T>
allows Filter to work with arrays and almost any other sort of data structure in
the .NET Framework.
delegate bool Predicate<TSource>(TSource item);
static IEnumerable<TSource> Filter<TSource>(IEnumerable<TSource> source, Predicate<TSource> predicate)
{
List<TSource> result = new List<TSource>();
if (source != null)
{
foreach (TSource item in source)
{
if (predicate(item))
result.Add(item);
}
}
return result;
}
After making our Filter function generic and modifying it to use
IEnumerable<T>, the initial example code still runs fine. However, we now have a
definition that we can use in almost any situation that calls for filtering.
(Note that we also could have implemented Filter using an iterator to make it
lazy, but that
is a subject for another article.)
Map
Map is
similar to Filter because it takes a sequence and returns another. However, instead of filtering
the items in the sequence with a predicate, Map transforms each item with a conversion
function. Consider this scenario: we want to take a sequence of important dates
(can you tell me what they are?) and
convert them all to text.
static IEnumerable<string> GetPrettyDates(IEnumerable<DateTime> dates)
{
List<string> result = new List<string>();
if (dates != null)
{
foreach (DateTime date in dates)
result.Add(date.ToLongDateString());
}
return result;
}
static void Main()
{
DateTime[] importantDates = {
new DateTime(1995, 8, 24),
new DateTime(1977, 5, 25),
new DateTime(1952, 3, 11)
};
foreach (string text in GetPrettyDates(importantDates))
Console.WriteLine(text);
}
Like we did with Filter, we can extract the essence of the algorithm into a
new function, Map. We do this by passing the piece that changes (the conversion)
into the function as a delegate. The resulting definition of Map is:
delegate TResult Converter<TSource, TResult>(TSource item);
static IEnumerable<TResult> Map<TSource, TResult>(IEnumerable<TSource> source,
Converter<TSource, TResult> converter)
{
List<TResult> result = new List<TResult>();
if (source != null)
{
foreach (TSource item in source)
result.Add(converter(item));
}
return result;
}
This definition is slightly more complicated than Filter because it requires
two generic type parameters: one for the input type and one for the output type.
However, the additional generic type parameter adds tremendous power to the function
because it enables Map to convert a sequence of one type to a sequence of another
type.
Here is our example code modified to use Map:
static string GetPrettyDate(DateTime date)
{
return date.ToLongDateString();
}
static void Main()
{
DateTime[] importantDates = {
new DateTime(1995, 8, 24),
new DateTime(1977, 5, 25),
new DateTime(1952, 3, 11)
};
foreach (string text in Map<DateTime, string>(importantDates, GetPrettyDate))
Console.WriteLine(text);
}
Reduce
The Reduce (or
Fold) function
is the most complicated to define in code, but it is perhaps the simplest of the
big three to understand.
Reduce is all about accumulation. Its purpose is to iterate through a sequence
and build up a result value using each item. A simple example might be a function
that takes a sequence of integers and returns the sum:
static int Sum(IEnumerable<int> numbers)
{
int result = 0;
if (numbers != null)
{
foreach (int n in numbers)
result += n;
}
return result;
}
static void Main()
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Console.WriteLine("Sum: {0}", Sum(numbers));
}
Distilling the essence of the Reduce algorithm is a bit more difficult than
Filter and Map. It might not be obvious at first, but there are actually two
pieces that need to be extracted.
static TResult Reduce<TSource, TResult>(IEnumerable<TSource> source)
{
TResult result = start-value;
if (source != null)
{
foreach (TSource item in source)
result = some-accumulation;
}
return result;
}
Making the "start-value" and "some-accumulation" pieces parameters into
Reduce gives
us a working definition.
delegate TResult Accumulator<TSource, TResult>(TSource x, TResult y);
static TResult Reduce<TSource, TResult>(IEnumerable<TSource> source, TResult startValue,
Accumulator<TSource, TResult> accumulator)
{
TResult result = startValue;
if (source != null)
{
foreach (TSource item in source)
result = accumulator(item, result);
}
return result;
}
Now we can rewrite the original summing code to use Reduce like this:
static int Add(int x, int y)
{
return x + y;
}
static void Main()
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Console.WriteLine("Sum: {0}", Reduce(numbers, 0, Add));
}
Putting Them Together
Let's combine the big three to compute the sum of the
squares of the even numbers in a sequence of integers. We can do this by using
Filter to find the even numbers, Map to square them and Reduce to produce the
sum.
static bool IsEven(int value)
{
return (value % 2) == 0;
}
static int Square(int value)
{
return value * value;
}
static int Add(int x, int y)
{
return x + y;
}
static void Main()
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
IEnumerable<int> evens = Filter(numbers, IsEven);
IEnumerable<int> squares = Map<int, int>(evens, Square);
int sum = Reduce(squares, 0, Add);
Console.WriteLine("Sum: {0}", sum);
}
For those of you who are clamoring for some C# 3.0, we can code the same
calculation using lambda expressions (after making Map, Filter and Reduce into extension
methods) like this:
static void Main()
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
int sum = numbers.Filter(x => (x % 2) == 0).Map(x => x * x).Reduce(0, (x, y) => x + y);
Console.WriteLine("Sum: {0}", sum);
}
Reduce is King
Of the big three, Reduce is the most powerful. It
turns out that Filter and Map are really just special cases of Reduce. In fact,
we can implement Filter and Map using only Reduce.
Thinking a bit more abstractly reveals that both
Filter and Map simply accumulate a new sequence from another sequence. With that
in mind, we can use Reduce to perform the accumulation by passing a new list as the start value and an accumulator
delegate responsible for adding items to that list.
static IEnumerable<TSource> Filter<TSource>(IEnumerable<TSource> source, Predicate<TSource> predicate)
{
return Reduce(source, new List<TSource>(),
delegate(TSource item, List<TSource> result)
{
if (predicate(item))
result.Add(item);
return result;
});
}
static IEnumerable<TResult> Map<TSource, TResult>(IEnumerable<TSource> source,
Converter<TSource, TResult> converter)
{
return Reduce(source, new List<TResult>(),
delegate(TSource item, List<TResult> result)
{
result.Add(converter(item));
return result;
});
}
Using anonymous delegates in this case allows us to take advantage
of the
closures produced when
we access "predicate" and "converter" parameters from the
parenting method.
Wrapping It Up
You might be wondering, if the big three functions are so useful, why doesn't
the .NET Framework provide them? Well, the truth is, it does. Sort of. The .NET Framework 2.0 already defines methods that are
somewhat equivalent to Filter
and Map:
- Filter ≈ Array.FindAll<T>() and List<T>.FindAll().
- Map ≈ Array.ConvertAll<TInput, TOutput>() and List<T>.ConvertAll<TOutput>().
Unfortunately, those methods are a bit limited because they only work with
arrays and instances of List<T>. The general versions of Filter and Map that we
have defined can be used with any IEnumerable<T>—including iterators.
In addition, as Jay Wren
points out, there really isn't an equivalent to Reduce in the .NET
Framework. That makes these definitions of the big three a wise addition to your
developer toolbox. In the next article, we'll build on the idea of higher-order
functions and show how the big three might be used in even more powerful ways.