The Fortress programming language has some very powerful features. Guy Steele's slides here (slide 52 onwards) illustrate some of its features including how to split a sentence into words in parallel.

Here we look at some ways to use Groovy and GPars to solve that problem.

First we can write an imperative sequential version:

And test it as follows:

Now, we can ask ourselves a question. Is this solution in its current form easy to turn into a parallel solution? The answer is NO. Each step in our loop relies on the result from the previous step. So, if I split the sentence into say two pieces, and I am processing the start of the second piece, I don't know whether a character or space finished the previous piece. I.e. the order in which I process information is important in the current algorithm and to make it work easily in parallel, we need to refactor the algorithm into some equivalent one where the order is no longer important. Technically we require that the algorithm is associative. The figure below shows what we mean.

(If we have three things to calculate, it doesn't matter if we do the first two first or the last two first.)

Another useful property that would be good for our algorithm to have is left and right identity ("zero") elements.

This property allows us to run our data up into pieces and our algorithm will play nicely at the boundary points. Basically we can turn this:

into this:

Technically, combining associativity with left and right identity gives us a monoid. The details aren't important here but basically algorithms with such properties can be easily made parallel. This is exactly what we want when using techniques such as map/reduce or divide and conquer as part of our solution.

So getting back to word splitting we need to change the algorithm to remember the boundary conditions. To do that we will create a little domain for remembering "chunks" and "segments" of partial results. Here is one way to code that solution (and credit goes to Guy Steele's article for devising this domain):

(We'll have more to say about these classes later but basically we have plus ("+") methods for adding `Segment`

s and `Chunk`

s, a `ZERO`

identity element and a `flatten`

method which takes our `Chunk`

or `Segment`

and converts it into a list of words.)

Guy's slides show using this domain in more detail. Here is just one excerpt from those slides to illustrate the main idea.

Now we are ready to write a slightly more functional flavored sequential solution:

And test it as follows:

More importantly, because of the properties of the classes in our domain, we are ready to write some parallel solutions. Before doing that, let's take a slight diversion to illustrate how we might further check some of our algorithm's properties. We'll use the Java port of quickcheck:

This produces the output:

200 tests

and gives me confidence that my algorithm is a monoid. The above showed that `Segment`

's have left and right identity elements for "+" (in fact both `Check.ZERO`

and `Segment.ZERO`

work). If we wanted to, we could have additional checks for the associativity piece or for properties of `Chunk`

.

What have we achieved? Well, now we are in a position to write some parallel versions of our algorithm. Many have a common strategy and that is to divide the input into sections. Typically the division stops once the sections reach a certain granularity of size. As a general rule, if we divide past a certain level of granularity, then the overheads associated with setting up the parallelism out weigh the parallelism gains.

Here's one version which uses an old school concurrent hash map. This version divides the input sentence into 4 pieces and solves each piece in a separate thread, storing the results into a concurrent hash map.

And test it using this code:

Alternatively, I can use GPars and not have to deal with as much explicit synchronization.

Firstly, we'll look at a map reduce version. It is comprised of a partition part which splits the data up - in our case into 4 pieces. Then the map part runs our serial solution for each piece - notice that the mapping is independent and can be run completely in parallel. For the reduce part we will use our "+" monoid. Here is the code:

The previous example used an explicit reduce to make that part of our solution obvious but we can also simplify a bit more using the built-in sum reduction:

As an alternative, we can use the fork/join capabilities which are available in Java 7 (or the JSR166y library - this library is typically bundled with GPars):

Alternatively, we can use GPars' dataflow capabilities. With dataflow we write expressions which declaratively express the relationships in our data. In our case the pieces must be summed (using our "+" monoid) once they have been calculated:

Finally, we can create a parallel array version. This version (as currently written) doesn't limit the amount of parallelism to some granularity level, instead it follows the same approach as our functional sequential version but just replaces inject with collectParallel which will automatically perform its steps in parallel. Here is the code: