The Fortress programming language has some very powerful features. Guy Steele's slides here (slides 52 onwards) illustrate some of its features including how to split a sentence into words in parallel.
Here we look at some ways to use Groovy and GPars to solve that problem.
First we can write an imperative sequential version:
And test it as follows:
Now, we can ask ourselves a question. Is this solution in its current form easy to turn into a parallel solution? The answer is NO. Each step in our loop relies on the result from the previous step. So, if I split the sentence into say two pieces, and I am processing the start of the second piece, I don't know whether a character or space finished the previous piece. I.e. the order in which I process information is important in the current algorithm and to make it work easily in parallel, we need to refactor the algorithm into some equivalent where the order is no longer important. Technically we require that the algorithm is associative. The figure below shows what we mean.
(If we have three things to calculate, it doesn't matter if we do the first two first or the last two first.)
Another useful property that would be good for our algorithm to have is left and right identity ("zero") elements.
This property allows us to run our algorithm on boundary points. Basically we can turn this:
Technically, I have created a monoid. But that's not important.
So to do that (credit to Guy's article) we need to change the algorithm to remember the boundary conditions. To do that we will create a little domain for remembering "chunks" and "segments" of partial results. Here is one way to code that solution:
Now we can write a slightly more functional flavored sequential solution:
And if we want we can check our associative properties:
This produces the output:
and gives me confidence that my algorithm is a monoid. The above showed that
Segment's have left and right identity elements (in fact both
Segment.ZERO work). We could have additional checks for the associativity piece or for properties of
Now we can write a parallel version. This version divides the input sentence into 4 pieces and solves each piece in a separate thread, storing the results into a concurrent hash map.
And test it using this code:
Alternatively, I can use GPars.
Here is a map reduce version:
And a fork/join version:
And a dataflow version:
And a parallel array version: