Skip to content
Skip to breadcrumbs
Skip to header menu
Skip to action menu
Skip to quick search
Quick Search
Browse
Pages
Blog
Labels
Attachments
Mail
Advanced
What’s New
Space Directory
Feed Builder
Keyboard Shortcuts
Confluence Gadgets
Log In
Dashboard
Groovy
Copy Page
You are not logged in. Any changes you make will be marked as
anonymous
. You may want to
Log In
if you already have an account. You can also
Sign Up
for a new account.
This page is being edited by
.
Paragraph
Paragraph
Heading 1
Heading 2
Heading 3
Heading 4
Heading 5
Heading 6
Preformatted
Quote
Bold
Italic
Underline
More colours
Strikethrough
Subscript
Superscript
Monospace
Clear Formatting
Bullet list
Numbered list
Outdent
Indent
Align left
Align center
Align right
Link
Table
Insert
Insert Content
Image
Link
Attachment
Symbol
Emoticon
Wiki Markup
Horizontal rule
tinymce.confluence.insert_menu.macro_desc
Info
JIRA Issue
Status
Gallery
Tasklist
Table of Contents
Other Macros
Page Layout
No Layout
Two column (simple)
Two column (simple, left sidebar)
Two column (simple, right sidebar)
Three column (simple)
Two column
Two column (left sidebar)
Two column (right sidebar)
Three column
Three column (left and right sidebars)
Undo
Redo
Find/Replace
Keyboard Shortcuts Help
<p>The Fortress programming language has some very powerful features. Guy Steele's slides <a href="http://strangeloop2010.com/talk/presentation_file/14299/GuySteele-parallel.pdf">here</a> (slide 52 onwards) illustrate some of its features including how to split a sentence into words in parallel.</p> <p>Here we look at some ways to use Groovy and GPars to solve that problem.</p> <p>First we can write an imperative sequential version:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> def swords = { s -> def result = [] def word = '' s.each{ ch -> if (ch == ' ') { if (word) result += word word = '' } else word += ch } if (word) result += word result } </pre></td></tr></table> <p>And test it as follows:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> assert swords("Here is a sesquipedalian string of words") == ['Here', 'is', 'a', 'sesquipedalian', 'string', 'of', 'words'] </pre></td></tr></table> <p>Now, we can ask ourselves a question. Is this solution in its current form easy to turn into a parallel solution? The answer is NO. Each step in our loop relies on the result from the previous step. So, if I split the sentence into say two pieces, and I am processing the start of the second piece, I don't know whether a character or space finished the previous piece. I.e. the order in which I process information is important in the current algorithm and to make it work easily in parallel, we need to refactor the algorithm into some equivalent one where the order is no longer important. Technically we require that the algorithm is associative. The figure below shows what we mean.</p> <p><img class="confluence-embedded-image" width="600" src="/download/attachments/230400675/props_associativity.gif?version=1&modificationDate=1368929300166" data-image-src="/download/attachments/230400675/props_associativity.gif?version=1&modificationDate=1368929300166" data-linked-resource-id="230565053" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_associativity.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_associativity.gif"></p> <p>If we have three things to calculate, it doesn't matter if we do the first two first or the last two first.</p> <p>Another useful property that would be good for our algorithm to have is left and right identity ("zero") elements.</p> <p><img class="confluence-embedded-image" width="600" src="/download/attachments/230400675/props_identity.gif?version=1&modificationDate=1368929300151" data-image-src="/download/attachments/230400675/props_identity.gif?version=1&modificationDate=1368929300151" data-linked-resource-id="230565052" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_identity.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_identity.gif"></p> <p>This property allows us to break our data up into pieces and (if needed) substitute a ZERO element at one or the other end, i.e. our algorithm will play nicely at the boundary points. Basically we can turn this:</p> <p><img class="confluence-embedded-image" width="450" src="/download/attachments/230400675/props_serial.gif?version=1&modificationDate=1368929300201" data-image-src="/download/attachments/230400675/props_serial.gif?version=1&modificationDate=1368929300201" data-linked-resource-id="230565055" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_serial.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_serial.gif"></p> <p>into this:</p> <p><img class="confluence-embedded-image" width="700" src="/download/attachments/230400675/props_parallel.gif?version=1&modificationDate=1368929300184" data-image-src="/download/attachments/230400675/props_parallel.gif?version=1&modificationDate=1368929300184" data-linked-resource-id="230565054" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_parallel.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_parallel.gif"></p> <p>Technically, combining associativity with left and right identity gives us a monoid. The details aren't important here but basically algorithms with such properties can be easily made parallel. This is exactly what we want when using techniques such as map/reduce or divide and conquer as part of our solution.</p> <p><img class="confluence-embedded-image" width="600" src="/download/attachments/230400675/props_monoid.gif?version=1&modificationDate=1368929300107" data-image-src="/download/attachments/230400675/props_monoid.gif?version=1&modificationDate=1368929300107" data-linked-resource-id="230565050" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_monoid.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_monoid.gif"></p> <p>So getting back to word splitting we need to change the algorithm to remember the boundary conditions. To do that we will create a little domain for remembering "chunks" and "segments" of partial results. Here is one way to code that solution (and credit goes to Guy Steele's article for devising this domain):</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> class Util { static maybeWord(s) { s ? [s] : [] } } import static Util.* @Immutable class Chunk { String s public static final ZERO = new Chunk('') def plus(Chunk other) { new Chunk(s + other.s) } def plus(Segment other) { new Segment(s + other.l, other.m, other.r) } def flatten() { maybeWord(s) } } @Immutable class Segment { String l; List m; String r public static final ZERO = new Segment('', [], '') def plus(Chunk other) { new Segment(l, m, r + other.s) } def plus(Segment other) { new Segment(l, m + maybeWord(r + other.l) + other.m, other.r) } def flatten() { maybeWord(l) + m + maybeWord(r) } } </pre></td></tr></table> <p>We'll have more to say about these classes later but basically we have plus ("+") methods for adding <span style=""><code>Segment</code></span>s and <span style=""><code>Chunk</code></span>s, a <span style=""><code>ZERO</code></span> identity element and a <span style=""><code>flatten</code></span> method which takes our <span style=""><code>Chunk</code></span> or <span style=""><code>Segment</code></span> and converts it into a list of words.</p> <p>Guy Steele's slides show using this domain in more detail. Here is a summary snapshot to illustrate the main idea.</p> <p><img class="confluence-embedded-image" width="700" src="/download/attachments/230400675/props_domain.gif?version=1&modificationDate=1368929300135" data-image-src="/download/attachments/230400675/props_domain.gif?version=1&modificationDate=1368929300135" data-linked-resource-id="230565051" data-linked-resource-type="attachment" data-linked-resource-default-alias="props_domain.gif" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400675" title="null > props_domain.gif"></p> <p>Now we are ready to write a slightly more functional flavored sequential solution:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> def processChar(ch) { ch == ' ' ? new Segment('', [], '') : new Chunk(ch) } def swords(s) { s.inject(Chunk.ZERO) { result, ch -> result + processChar(ch) } } </pre></td></tr></table> <p>And test it as follows:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> assert swords("Here is a sesquipedalian string of words").flatten() == ['Here', 'is', 'a', 'sesquipedalian', 'string', 'of', 'words'] </pre></td></tr></table> <p>More importantly, because of the properties of the classes in our domain, we are ready to write some parallel solutions. Before doing that, let's take a slight diversion to illustrate how we might further check some of our algorithm's properties. We'll use the Java port of quickcheck:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> @GrabResolver('http://download.java.net/maven/2') @Grab('net.java:quickcheck:0.5') import static net.java.quickcheck.generator.CombinedGeneratorsIterables.someNonEmptyLists import static net.java.quickcheck.generator.PrimitiveGenerators.nonEmptyStrings def total = 0 someNonEmptyLists(nonEmptyStrings()).each { words -> def someSegment = words.size() > 2 ? new Segment(words[0], words[1..-2], words[-1]) : new Segment('', words, '') def expected = someSegment.flatten() [Chunk.ZERO, Segment.ZERO].each { zero -> assert expected == (someSegment + zero).flatten() assert expected == (zero + someSegment).flatten() } total++ } println "$total tests" </pre></td></tr></table> <p>This produces the output:</p> <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> 200 tests </pre></td></tr></table> <p>and gives me confidence that my algorithm is a monoid. The above showed that <span style=""><code>Segment</code></span>s have left and right identity elements for "+" (in fact both <span style=""><code>Check.ZERO</code></span> and <span style=""><code>Segment.ZERO</code></span> work). If we wanted to, we could have additional checks for the associativity piece or for properties of <span style=""><code>Chunk</code></span>.</p> <p>What have we achieved? Well, now we are in a position to write some parallel versions of our algorithm. Many have a common strategy and that is to divide the input into sections. Typically the division stops once the sections reach a certain granularity of size. As a general rule, if we divide past a certain level of granularity, then the overheads associated with setting up the parallelism out weigh the parallelism gains.</p> <p>Here's one version which uses an old school concurrent hash map. This version divides the input sentence into 4 pieces and solves each piece in a separate thread, storing the results into a concurrent hash map.</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> THREADS = 4 def pwords(s) { int n = (s.size() + THREADS - 1) / THREADS def map = new ConcurrentHashMap() (0..<THREADS).collect { i -> Thread.start { def (min, max) = [[s.size(), i * n].min(), [s.size(), (i + 1) * n].min()] map[i] = swords(s[min..<max]) } }*.join() (0..<THREADS).collect { i -> map[i] }.sum().flatten() } </pre></td></tr></table> <p>And test it using this code:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> assert pwords("Here is a sesquipedalian string of words") == ['Here', 'is', 'a', 'sesquipedalian', 'string', 'of', 'words'] </pre></td></tr></table> <p>Alternatively, I can use GPars and not have to deal with as much explicit synchronization.</p> <p>Firstly, we'll look at a map reduce version. It is comprised of a partition part which splits the data up - in our case into 4 pieces. Then the map part runs our serial solution for each piece - notice that the mapping is independent and can be run completely in parallel. For the reduce part we will use our "+" monoid. Here is the code:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> @Grab('org.codehaus.gpars:gpars:0.11') import static groovyx.gpars.GParsPool.withPool THRESHHOLD = 10 def partition(piece) { piece.size() <= THRESHHOLD ? piece : [piece[0..<THRESHHOLD]] + partition(piece.substring(THRESHHOLD)) } def pwords = { input -> withPool(THREADS) { partition(input).parallel.map(swords).reduce{ a, b -> a + b }.flatten() } } </pre></td></tr></table> <p>The previous example used an explicit reduce to make that part of our solution obvious but we can also simplify a bit more using the built-in sum reduction:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> partition(input).parallel.map(swords).sum().flatten() </pre></td></tr></table> <p>As an alternative, we can use the fork/join capabilities which are available in Java 7 (or the JSR166y library - this library is typically bundled with GPars):</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> def pwords(input) { withPool(THREADS) { runForkJoin(0, input.size(), input) { first, last, s -> def size = last - first if (size <= THRESHHOLD) { swords(s[first..<last]) } else { // divide and conquer int pivot = first + size / 2 forkOffChild(first, pivot, s) forkOffChild(pivot, last, s) childrenResults.sum() } }.flatten() } } </pre></td></tr></table> <p>Alternatively, we can use GPars' dataflow capabilities. With dataflow we write expressions which declaratively express the relationships in our data. In our case the pieces must be summed (using our "+" monoid) once they have been calculated:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> import groovyx.gpars.dataflow.DataFlows import static groovyx.gpars.dataflow.DataFlow.task def partition(s, n, i) { s[[s.size(), i * n].min()..<[s.size(), (i + 1) * n].min()] } def pwords(s) { int n = (s.size() + THREADS - 1) / THREADS new DataFlows().with { task { a = swords(partition(s, n, 0)) } task { b = swords(partition(s, n, 1)) } task { c = swords(partition(s, n, 2)) } task { d = swords(partition(s, n, 3)) } task { sum1 = a + b } task { sum2 = c + d } task { sum = sum1 + sum2 } // task { sum = a + b + c + d } // alternative sum }.flatten() } </pre></td></tr></table> <p>Finally, we can create a parallel array version. This version (as currently written) doesn't limit the amount of parallelism to some granularity level, instead it follows the same approach as our functional sequential version but just replaces inject with collectParallel which will automatically perform its steps in parallel. Here is the code:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> def pwords(input) { withPool(THREADS) { input.collectParallel{ processChar(it) }.sumParallel().flatten() } } </pre></td></tr></table> <p>Just as one final example, here is a variation of our original ConcurrentHashMap version but making use of GPars agents to guard a non-concurrent HashMap.</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> def pwords(s) { int n = (s.size() + THREADS - 1) / THREADS def agent = new Agent<Map>([:], {it?.clone()}) (0..<THREADS).collect { i -> Thread.start { def (min, max) = [[s.size(), i * n].min(), [s.size(), (i + 1) * n].min()] def result = swords(s[min..<max]) agent << { it[i] = result; updateValue it } } }*.join() assert agent.val instanceof LinkedHashMap (0..<THREADS).collect { i -> agent.val[i] }.sum().flatten() } </pre></td></tr></table> <p>Here, the agent acts as a synchronization layer between our code and the standard LinkedHashMap (the default map for Groovy). This offers no particular advantage over ConcurrentHashMap but illustrates how agents work in general. We supply the agent with code to run and it runs that code after performing the necessary synchronization.</p>
Please type the word appearing in the picture.
Attachments
Labels
Location
Watch this page
< Edit
Preview >
Loading…
Save
Cancel
Next hint
search
attachments
weblink
advanced