Monday, March 10, 2008

Inside Functional Programming

Inside Functional Programming

Take advantage of functional programming techniques like Filter, Map, and Reduce in your day-to-day business apps.

TECHNOLOGY TOOLBOX: C#, SQL Server 2005 Compact Edition Runtime, Visual Studio 2005 Standard Edition SP1 or Higher, SQL Server Management Studio [Express] SP2

As you start using C# 3.0, you'll find yourself diving deeper into the concepts created for functional programming. This is the academic research done for languages like LISP, ML, Haskell, and others that have only a small installed base in professional programming environments. In a way, that's too bad, because many of these concepts provide elegant solutions to many of the


The incorporation of these functional programming techniques into .NET is one of the reasons why I'm excited about the release of C# 3.0 and Visual Studio (VS) 2008. You can use these same concepts in your favorite language. That's important for a couple of reasons. First, you're more familiar with the syntax of your preferred language and that makes it much easier to continue being productive. Second, you can mix these functional programming concepts alongside more traditional imperative algorithms.

Of course, you also give something up by staying in your familiar environment. Doing things the way you're accustomed to doing them often means that you're slow to try and adapt new techniques. Other times, you might not get the full advantage of a given technique because you're using it in your familiar context, and that context doesn't take full advantage of the technique.

This article will familiarize you with three of the most common functional programming elements: the higher order functions Filter, Map, and Reduce. You're probably already familiar with the general concepts--if not the specific terms--so much of the research associated with functional programming will be fairly accessible.

Define a Value's Removal
You've already used the concept of Filter, even in C# 2.0. List contains a RemoveAll() method. RemoveAll takes a delegate and that delegate determines which values should be removed from your collection. For example, this code removes all integers from the someNumbers collections that are divisible by 3:

­

List someNumbers = new List
{ 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15 };

someNumbers.RemoveAll(
delegate(int num)
{ return num % 3 == 0; });

C# 3.0 provides a more concise way to express that same concept:

someNumbers.RemoveAll(num => num % 3 == 0);

That's a filter. The filter defines when to remove a value. Let's take a small detour into vocabulary land. A Higher-Order function is simply a function that takes a function as a parameter, or returns a function, or both. Both of these samples fit that description. In both cases, the parameter to RemoveAll() is the function that describes what members should be removed from the set. Internally, the RemoveAll() method calls your function once on every item in the sequence. When there's a match, that item gets removed.

In C# 3.0 and Language Integrated Query (LINQ) syntax, the Where clause defines the filter. In the case of Where, the filter expression might not be evaluated as a delegate. LINQ to SQL processes the expression tree representation of your query. By examining the expression tree, LINQ to SQL can create a T-SQL representation of your query and execute the query using the database engine, rather than invoking the delegate. Any provider that implements IQueryable or IQueryable will parse the expression tree and translate it into the best format for the provider.

A filter is the simplest form of a higher order function. Its input is a sequence, and its output is a proper subset of the input sequence. The concept is already familiar to you, and it shows the fundamental concept of passing a function as a parameter to another function.

C# 2.0 and the corresponding methods in the .NET framework did not fully embrace the concepts of functional programming. You can see that in the way RemoveAll is implemented. It's a member of the List class, and it modifies that object. A true Filter takes its input sequence as a parameter and returns the output sequence; it doesn't modify the state of its input sequence.

Return a New Sequence
Map is the second major building block you'll see in functional programming. Map returns a new sequence computed from an input sequence. Similar to Filter, Map takes a function as one of its parameters. That function transforms a single input element into the corresponding output element.

As with Filter, there's similar functionality in the .NET base library. List.ConvertAll produces a new list of elements using a delegate you define that transforms a single input element into the corresponding output element. Here, the conversion computes the square of every number:

List someNumbers = new List
{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
List squares = someNumbers.ConvertAll(
delegate(int num)
{
return num * num;
});
squares.ForEach(delegate(int num)
{ Console.WriteLine(num); });

In C# 3.0, lambda syntax makes this more concise:

List squares =
someNumbers.ConvertAll(num => num * num);

Filter is built in to the query syntax added in C# 3.0:

IEnumerable squares = from num in someNumbers
select num * num;

Of course, you probably noticed that quick change in the last code snippet. The last version returned an IEnumerable rather than a List. The C# 3.0 versions of these methods operate on sequences, and aren't members of any particular type.

There's nothing that says the output sequence has to be of the same type as the input sequence. This method returns a list of formatted strings computed from a set of input numbers:

List someNumbers =
new List
{ 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15 };
List formattedNumbers =
someNumbers.ConvertAll(
delegate(int num)
{
return string.Format("{0:D6}", num);
});
formattedNumbers.ForEach(
delegate(string num)
{ Console.WriteLine(num); });

Of course, the same method gets simplified using C# 3.0:

List formattedNumbers =
someNumbers.ConvertAll
(num => string.Format("{0:D6}", num));
And can be further simplified using the query syntax:

IEnumerable formattedNumbers =
from num in someNumbers
select string.Format("{0:D6}", num);

As with Filter, you've used functionality like Map before. You might not have known what it was called, or where its computer science roots lie. Filter is nothing more than a convention where you write a method that converts one sequence type into another, and the specifics of that conversion are coded into a second method. That second method is then passed on as a parameter to the Map function.

One Function to Rule Them All: Reduce
The most powerful of the three concepts I'm covering this month is Reduce. (You'll also find some references that use the term "Fold.") Reduce returns a single scalar value that's computed by visiting all the members of a sequence. Reduce is one of those concepts that is much simpler once you see some examples.

This simple code snippet computes the sum of all values in the sequence:

List someNumbers = new List
{ 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15 };
int sum = 0;
foreach (int num in someNumbers)
sum += num;
Console.WriteLine(sum);

This is simple stuff that you've written many times. The problem is that you can't reuse any of it anywhere. Also, many other examples will likely contain more complicated code inside the loop. So smart computer science wizards decided to take on this problem and create a way to pass along that inner code as a parameter to a generic method. In C# 3.0, the answer is the Aggregate extension method. Aggregate has a few overloads. This example uses the simplest form:

List someNumbers = new List
{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15 };
int sum = someNumbers.Aggregate(
delegate(int currentSum, int num)
{
return currentSum + num;
});
Console.WriteLine(sum);

The delegate continues to produce a running sum from the current value in the sequence, as well as the total accumulated so far. There are two other overloads of Aggregate. One takes a seed value:

int sum = someNumbers.Aggregate(0,
delegate(int currentSum, int num)
{
return currentSum + num;
});

The final overload allows you to specify a different return type. Suppose you wanted to build a comma-separated string of all the values. You'd use the third version of Aggregate:

List someNumbers = new List
{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15 };
string formattedValues =
someNumbers.Aggregate(null,
delegate(string currentOutput, int num)
{
if (currentOutput != null)
return string.Format("{0}, {1}",
currentOutput, num);
else
return num.ToString();
});
Console.WriteLine(formattedValues);

Of course, all of these can be rewritten using lambda syntax:

int sum = someNumbers.Aggregate(
0, (currentSum, num)
=> currentSum + num);
// Or:
string formattedValues = someNumbers.Aggregate("",
(currentOutput, num)
=> (currentOutput != null) ?
string.Format("{0}, {1}", currentOutput, num) :
num.ToString()
);

The second example converts the numbers to a list of strings, and it's a bit more complicated code. But it's all stuff you've seen before. The second example uses only the ternary operator to do the test. If this makes you uncomfortable, you can use the imperative syntax with lambda expressions:

string formattedValues =
someNumbers.Aggregate(
"", (currentOutput, num)
=>
{
if (currentOutput != null)
return string.Format("{0}, {1}",
currentOutput, num);
else
return num.ToString();
});

Earlier, I paraphrased Tolkien, and called Reduce the one function to rule them all. From a computer science perspective, Filter and Map are nothing more than special cases of Reduce. If you define a Reduce method where the return value type is a sequence, you can implement Map and Filter using Reduce.

However, most libraries don't work that way because Map and Filter can perform much better if they don't share code with Reduce. And the Filter and Map prototypes are quite a bit simpler to understand.

This column contained some low-level concepts that will help you understand the computer science upon which C# 3.0, LINQ, and much of the .NET 3.5 Framework were built. It's all stuff you've seen before, and it's not that difficult. It's just that they come with new twists and more concise code around what you've already been doing.

No comments: