Thursday, June 21, 2007

After a long break, it's time to return to my informal series of articles on functional programming concepts using only C# 2.0. While C# 3.0 may be syntactically better suited to functional-style programming, 2.0 has the advantage of being more familiar and, in some cases, clearer. In addition, using C# 2.0 means the sample code compiles under Visual Studio 2005, so an installation of the Orcas beta (i.e. Visual Studio 2008) is unnecessary.

Because this article builds on information that I have already presented, a refresher might be in order. Here are links to help you get your bearings:

  1. Fibonacci Numbers, Caching and Closures
  2. What's in a Closure?
  3. Using Automatic Memoization

Let's get started...

Higher-Order Functions

The notion of higher-order functions is an amazingly simple (and extraordinarily powerful!) concept that has an over-complicated computer science term attached to it. The very mention of the phrase brings to mind a coven of programming warlocks whispering incantations over their code, but that's not really the case. Simply put, a higher-order function is a function that takes a function as an argument and/or returns a function as its result. (Try saying that ten times fast.) Put another way, higher-order functions are functions that are in the business of treating functions as data. They are truly the "bread and butter" of functional programming.

In order to support higher-order functions, a language must treat functions as first-class objects, but what does that mean? In truth, "first-class object" is a loaded term because it means different things to different people. For our purposes, consider Abelson and Sussman:

In general, programming languages impose restrictions on the ways in which computational elements can be manipulated. Elements with the fewest restrictions are said to have first-class status. Some of the 'rights and privileges' of first-class elements are:
  1. They may be named by variables.
  2. They may be passed as arguments to procedures.
  3. They may be returned as results from procedures.
  4. They may be included in data structures. (pg. 76)

In the .NET Framework (and thus, C#), a function becomes a first-class object when wrapped in a delegate instance. If a C# function accepts a delegate as one of its arguments or returns a delegate as its result, it is a higher-order function. (By this definition, the "Memoize" function that I presented in a previous article is a higher-order function.)
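
For example, here is a minimal sketch of a function gaining first-class status by being wrapped in a delegate (the IntFunction, Square and Twice names are purely illustrative):

delegate int IntFunction(int x);

static int Square(int x)
{
  return x * x;
}

// A higher-order function: it takes a function as an argument
// and returns a function as its result.
static IntFunction Twice(IntFunction f)
{
  return delegate(int x) { return f(f(x)); };
}

static void Main()
{
  IntFunction square = Square;              // named by a variable
  IntFunction fourthPower = Twice(square);  // passed as an argument and returned as a result

  Console.WriteLine(fourthPower(3));        // prints 81
}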

Why is this useful? Let's start with a simple function that is not higher-order, which takes an array of integers and returns a new array containing only the even numbers:

static int[] GetEvens(int[] numbers)
{
  List<int> result = new List<int>();
  if (numbers != null)
  {
    foreach (int n in numbers)
    {
      if ((n % 2) == 0)
        result.Add(n);
    }
  }

  return result.ToArray();
}

Now, consider another, slightly more complicated, function that takes an array of integers and returns a new array containing only the prime numbers:

static int[] GetPrimes(int[] numbers)
{
  List<int> result = new List<int>();
  if (numbers != null)
  {
    foreach (int n in numbers)
    {
      bool isPrime = (n >= 2); // 0, 1 and negative numbers are not prime
      int max = (n / 2) + 1;
      for (int i = 2; i < max; i++)
      {
        if ((n % i) == 0)
        {
          isPrime = false;
          break;
        }
      }

      if (isPrime)
        result.Add(n);
    }
  }

  return result.ToArray();
}

Many of us have written functions like GetEvens and GetPrimes time and again, even though both perform precisely the same algorithm. Each function loops through all of the elements in an array and tests a predicate to decide whether each element should be added to a result array. Essentially, the algorithm filters the original array based on some condition. Here is the essence of that filtering algorithm:

static int[] Filter(int[] numbers)
{
  List<int> result = new List<int>();
  if (numbers != null)
  {
    foreach (int n in numbers)
    {
      if (some-condition)
        result.Add(n);
    }
  }

  return result.ToArray();
}

It's a small jump to realize that "some-condition" can be represented by another function passed into the Filter function as a delegate, making Filter a higher-order function:

delegate bool Predicate(int n);

static int[] Filter(int[] numbers, Predicate predicate)
{
  List<int> result = new List<int>();
  if (numbers != null)
  {
    foreach (int n in numbers)
    {
      if (predicate(n))
        result.Add(n);
    }
  }

  return result.ToArray();
}

Now, we can use the Filter function, instead of the GetEvens and GetPrimes functions, by passing delegates:

static bool IsEven(int value)
{
  return (value % 2) == 0;
}

static bool IsPrime(int value)
{
  if (value < 2) // 0, 1 and negative numbers are not prime
    return false;

  int max = (value / 2) + 1;
  for (int i = 2; i < max; i++)
    if ((value % i) == 0)
      return false;

  return true;
}

static void Main()
{
  int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

  Console.Write("All:");
  foreach (int n in numbers)
    Console.Write(" {0}", n);
  Console.WriteLine();

  Console.Write("Evens:");
  foreach (int n in Filter(numbers, IsEven))
    Console.Write(" {0}", n);
  Console.WriteLine();

  Console.Write("Primes:");
  foreach (int n in Filter(numbers, IsPrime))
    Console.Write(" {0}", n);
  Console.WriteLine();
}

The IsEven and IsPrime functions serve as the predicates that we pass into the Filter function. Note that I could have defined them as anonymous delegates instead of as separate methods. Often, I will use anonymous delegates with higher-order functions, but for this sample I think they just confuse the code. Regardless, we have successfully used a higher-order function to abstract the essence of the Filter algorithm. Now, we don't have to reinvent the wheel every time we want to use it.
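
For comparison, here is a sketch of how the call for the even numbers might look with an anonymous delegate passed directly to Filter instead of the named IsEven method:

  // Anonymous delegate in place of the separate IsEven method.
  foreach (int n in Filter(numbers, delegate(int value) { return (value % 2) == 0; }))
    Console.Write(" {0}", n);
  Console.WriteLine();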

The Big Three

Filter is one of three well-known higher-order functions that are useful for manipulating lists or sequences: Filter, Map and Reduce. They should be immediately familiar to any Lisp or Python programmer (although Guido is dropping Reduce in Python 3000). In combination, these functions can be used to solve many computational problems. As a testament to their power, consider the Google MapReduce programming model used for scalable processing of large datasets. Let's take a look at each of the big three functions in turn.

Filter

Filter is perhaps the easiest to understand; it simply takes one sequence and returns a new sequence that is filtered according to a predicate. This simplicity is why it served as our initial example. However, our example is pretty limited. A more robust definition of Filter should be able to consume any type and any sort of sequence, not just integers and arrays. We can achieve this by defining Filter as a generic method that processes IEnumerable<T>. Choosing IEnumerable<T> allows Filter to work with arrays and almost any other sort of data structure in the .NET Framework.

delegate bool Predicate<TSource>(TSource item);

static IEnumerable<TSource> Filter<TSource>(IEnumerable<TSource> source, Predicate<TSource> predicate)
{
  List<TSource> result = new List<TSource>();
  if (source != null)
  {
    foreach (TSource item in source)
    {
      if (predicate(item))
        result.Add(item);
    }
  }

  return result;
}

After making our Filter function generic and modifying it to use IEnumerable<T>, the initial example code still runs fine. However, we now have a definition that we can use in almost any situation that calls for filtering. (Note that we also could have implemented Filter using an iterator to make it lazy, but that is a subject for another article.)
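
For the curious, here is a minimal sketch of what such an iterator-based version might look like (named LazyFilter here only to keep it distinct from the eager version above):

static IEnumerable<TSource> LazyFilter<TSource>(IEnumerable<TSource> source, Predicate<TSource> predicate)
{
  if (source == null)
    yield break;

  // Each item is tested only when the caller actually enumerates the result.
  foreach (TSource item in source)
  {
    if (predicate(item))
      yield return item;
  }
}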

Map

Map is similar to Filter because it takes a sequence and returns another. However, instead of filtering the items in the sequence with a predicate, Map transforms each item with a conversion function. Consider this scenario: we want to take a sequence of important dates (can you tell me what they are?) and convert them all to text.

static IEnumerable<string> GetPrettyDates(IEnumerable<DateTime> dates)
{
  List<string> result = new List<string>();
  if (dates != null)
  {
    foreach (DateTime date in dates)
      result.Add(date.ToLongDateString());
  }

  return result;
}

static void Main()
{
  DateTime[] importantDates = {
    new DateTime(1995, 8, 24),
    new DateTime(1977, 5, 25),
    new DateTime(1952, 3, 11)
  };

  foreach (string text in GetPrettyDates(importantDates))
    Console.WriteLine(text);
}

As we did with Filter, we can extract the essence of the algorithm into a new function, Map, by passing the piece that changes (the conversion) into the function as a delegate. The resulting definition of Map is:

delegate TResult Converter<TSource, TResult>(TSource item);

static IEnumerable<TResult> Map<TSource, TResult>(IEnumerable<TSource> source,
  Converter<TSource, TResult> converter)
{
  List<TResult> result = new List<TResult>();
  if (source != null)
  {
    foreach (TSource item in source)
      result.Add(converter(item));
  }

  return result;
}

This definition is slightly more complicated than Filter because it requires two generic type parameters: one for the input type and one for the output type. However, the additional generic type parameter adds tremendous power to the function because it enables Map to convert a sequence of one type to a sequence of another type.

Here is our example code modified to use Map:

static string GetPrettyDate(DateTime date)
{
  return date.ToLongDateString();
}

static void Main()
{
  DateTime[] importantDates = {
    new DateTime(1995, 8, 24),
    new DateTime(1977, 5, 25),
    new DateTime(1952, 3, 11)
  };

  foreach (string text in Map<DateTime, string>(importantDates, GetPrettyDate))
    Console.WriteLine(text);
}

Reduce

The Reduce (or Fold) function is the most complicated to define in code, but it is perhaps the simplest of the big three to understand. Reduce is all about accumulation. Its purpose is to iterate through a sequence and build up a result value using each item. A simple example might be a function that takes a sequence of integers and returns the sum:

static int Sum(IEnumerable<int> numbers)
{
  int result = 0;
  if (numbers != null)
  {
    foreach (int n in numbers)
      result += n;
  }

  return result;
}

static void Main()
{
  int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

  Console.WriteLine("Sum: {0}", Sum(numbers));
}

Distilling the essence of the Reduce algorithm is a bit more difficult than Filter and Map. It might not be obvious at first, but there are actually two pieces that need to be extracted.

static TResult Reduce<TSource, TResult>(IEnumerable<TSource> source)
{
  TResult result = start-value;
  if (source != null)
  {
    foreach (TSource item in source)
      result = some-accumulation;
  }

  return result;
}

Turning the "start-value" and "some-accumulation" pieces into parameters gives us a working definition of Reduce.

delegate TResult Accumulator<TSource, TResult>(TSource x, TResult y);

static TResult Reduce<TSource, TResult>(IEnumerable<TSource> source, TResult startValue,
  Accumulator<TSource, TResult> accumulator)
{
  TResult result = startValue;
  if (source != null)
  {
    foreach (TSource item in source)
      result = accumulator(item, result);
  }

  return result;
}

Now we can rewrite the original summing code to use Reduce like this:

static int Add(int x, int y)
{
  return x + y;
}

static void Main()
{
  int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

  Console.WriteLine("Sum: {0}", Reduce(numbers, 0, Add));
}

Putting Them Together

Let's combine the big three to compute the sum of the squares of the even numbers in a sequence of integers. We can do this by using Filter to find the even numbers, Map to square them and Reduce to produce the sum.

static bool IsEven(int value)
{
  return (value % 2) == 0;
}

static int Square(int value)
{
  return value * value;
}

static int Add(int x, int y)
{
  return x + y;
}

static void Main()
{
  int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
  IEnumerable<int> evens = Filter(numbers, IsEven);
  IEnumerable<int> squares = Map<int, int>(evens, Square);
  int sum = Reduce(squares, 0, Add);

  Console.WriteLine("Sum: {0}", sum);
}

For those of you who are clamoring for some C# 3.0, we can code the same calculation using lambda expressions (after making Map, Filter and Reduce into extension methods) like this:

static void Main()
{
  int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

  int sum = numbers.Filter(x => (x % 2) == 0).Map(x => x * x).Reduce(0, (x, y) => x + y);

  Console.WriteLine("Sum: {0}", sum);
}
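
For reference, here is a sketch of what those extension-method declarations might look like; the bodies are just the earlier C# 2.0 definitions with a this modifier on the first parameter (SequenceExtensions is an illustrative name):

static class SequenceExtensions
{
  // Same eager, list-building implementations as before, now callable
  // with instance-method syntax on any IEnumerable<T>.
  public static IEnumerable<TSource> Filter<TSource>(this IEnumerable<TSource> source,
    Predicate<TSource> predicate)
  {
    List<TSource> result = new List<TSource>();
    if (source != null)
      foreach (TSource item in source)
        if (predicate(item))
          result.Add(item);

    return result;
  }

  public static IEnumerable<TResult> Map<TSource, TResult>(this IEnumerable<TSource> source,
    Converter<TSource, TResult> converter)
  {
    List<TResult> result = new List<TResult>();
    if (source != null)
      foreach (TSource item in source)
        result.Add(converter(item));

    return result;
  }

  public static TResult Reduce<TSource, TResult>(this IEnumerable<TSource> source,
    TResult startValue, Accumulator<TSource, TResult> accumulator)
  {
    TResult result = startValue;
    if (source != null)
      foreach (TSource item in source)
        result = accumulator(item, result);

    return result;
  }
}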

Reduce is King

Of the big three, Reduce is the most powerful. It turns out that Filter and Map are really just special cases of Reduce. In fact, we can implement Filter and Map using only Reduce.

Thinking a bit more abstractly reveals that both Filter and Map simply accumulate a new sequence from another sequence. With that in mind, we can use Reduce to perform the accumulation by passing a new list as the start value and an accumulator delegate responsible for adding items to that list.

static IEnumerable<TSource> Filter<TSource>(IEnumerable<TSource> source, Predicate<TSource> predicate)
{
  return Reduce(source, new List<TSource>(),
    delegate(TSource item, List<TSource> result)
    {
      if (predicate(item))
        result.Add(item);

      return result;
    });
}

static IEnumerable<TResult> Map<TSource, TResult>(IEnumerable<TSource> source,
  Converter<TSource, TResult> converter)
{
  return Reduce(source, new List<TResult>(),
    delegate(TSource item, List<TResult> result)
    {
      result.Add(converter(item));

      return result;
    });
}

Using anonymous delegates here lets us take advantage of the closures created when the "predicate" and "converter" parameters are accessed from the enclosing method.

Wrapping It Up

You might be wondering, if the big three functions are so useful, why doesn't the .NET Framework provide them? Well, the truth is, it does. Sort of. The .NET Framework 2.0 already defines methods that are somewhat equivalent to Filter and Map:

  • Filter ≈ Array.FindAll<T>() and List<T>.FindAll().
  • Map ≈ Array.ConvertAll<TInput, TOutput>() and List<T>.ConvertAll<TOutput>().
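
For illustration, here is a quick sketch of those built-in methods in action with List<T>:

  List<int> numbers = new List<int>(new int[] { 1, 2, 3, 4, 5 });

  // Filter ≈ List<T>.FindAll()
  List<int> evens = numbers.FindAll(delegate(int n) { return (n % 2) == 0; });

  // Map ≈ List<T>.ConvertAll<TOutput>()
  List<string> text = numbers.ConvertAll<string>(delegate(int n) { return n.ToString(); });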

Unfortunately, those methods are a bit limited because they only work with arrays and instances of List<T>. The general versions of Filter and Map that we have defined can be used with any IEnumerable<T>—including iterators. In addition, as Jay Wren points out, there really isn't an equivalent to Reduce in the .NET Framework. That makes these definitions of the big three a wise addition to your developer toolbox. In the next article, we'll build on the idea of higher-order functions and show how the big three might be used in even more powerful ways.
