<h2>The characteristics of concurrent multi-module builds</h2>
<p>This article discusses the effects of various concurrency scheduling options, seen in the context of an actual build. For the sake of discussion, the available concurrency always matches or exceeds the amount of schedulable work; if you're saturating your CPU, there's little more to be gained performance-wise. All measurements in this article are also based on ramdisk (tmpfs) based models, meaning that IO in general is taken out of the equation.</p>
<h2>Legend</h2>
<p>The following abbreviations are used:</p>
<table class="confluenceTable"><tbody>
<tr> <th class="confluenceTh"><p>Abbreviation</p></th> <th class="confluenceTh"><p>Phase</p></th> </tr>
<tr> <td class="confluenceTd"><p>C</p></td> <td class="confluenceTd"><p>compile</p></td> </tr>
<tr> <td class="confluenceTd"><p>TC</p></td> <td class="confluenceTd"><p>test-compile</p></td> </tr>
<tr> <td class="confluenceTd"><p>S</p></td> <td class="confluenceTd"><p>surefire</p></td> </tr>
<tr> <td class="confluenceTd"><p>J</p></td> <td class="confluenceTd"><p>jar/install</p></td> </tr>
<tr> <td class="confluenceTd"><p>X</p></td> <td class="confluenceTd"><p>blocked/unrunnable</p></td> </tr>
</tbody></table>
<p>This analysis tries to explain the concurrency options in the context of one particular build.
While this is definitely non-exhaustive, it suffices to illustrate the challenges and restrictions encountered in making this <em>one</em> build run optimally.</p>
<p>The module dependency graph is as follows:</p>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/dependencygraph.jpg?version=1&modificationDate=1368939520479" data-image-src="/download/attachments/230400682/dependencygraph.jpg?version=1&modificationDate=1368939520479" data-linked-resource-id="230565083" data-linked-resource-type="attachment" data-linked-resource-default-alias="dependencygraph.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > dependencygraph.jpg"><br /> <strong>Figure 1: Dependency graph of project</strong></p>
<h3>My average project</h3>
<p>I timed the actual phases of some other builds (my 2 test projects). These are fairly standard Maven projects with lots of code and decent test coverage.</p>
<p>The interesting thing (not shown) is that the average time for the different lifecycle phases in a multi-module build did not vary much. Without losing too much accuracy, I could define an "average" module in my multi-module build. For mvn -o clean install, the "average" module in my project spends:</p>
<ul> <li>30% compiling</li> <li>17% test-compiling</li> <li>49% in surefire</li> <li>4% in jar/install</li> </ul>
<p>That means less than the rounding error (&lt;1%) is left for all the other stuff.</p>
<h1>The run-time view</h1>
<p>To make things more interesting, I've transposed the (real) numbers from "my average project" onto the "imaginary" dependency graph seen in figure 1, to better understand what is happening.
The figure below has "time" along the X axis and shows the different modules along the Y axis.</p>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/figure2.jpg?version=1&modificationDate=1368939520465" data-image-src="/download/attachments/230400682/figure2.jpg?version=1&modificationDate=1368939520465" data-linked-resource-id="230565082" data-linked-resource-type="attachment" data-linked-resource-default-alias="figure2.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > figure2.jpg"><br /> <strong>Figure 2: Weave-mode run-time scheduling of modules in the average build, time along X axis</strong></p>
<p>The interesting bit about this is that minor variations in the individual modules have little impact on the end result. The figures are to scale, so if you can keep them visible at the same time you'll see the (lack of) difference.</p>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/figure3.jpg?version=1&modificationDate=1368939520456" data-image-src="/download/attachments/230400682/figure3.jpg?version=1&modificationDate=1368939520456" data-linked-resource-id="230565081" data-linked-resource-type="attachment" data-linked-resource-default-alias="figure3.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > figure3.jpg"><br /> <strong>Figure 3: Module E changes characteristics (becomes shorter than average), module Z follows scheduling</strong></p>
<p>It's possible to draw a large number of graphs that have significant changes in individual modules but no change in the end outcome.</p>
<h1>The runtime-leaf-module issue</h1>
<p>There's a given set of modules that are <em>reactor-leaf-modules</em> in the reactor dependency tree (Y and Z in this case). There is an additional set of <em>runtime-leaf-modules</em> that constitute the "last modules to reach package/install" in a concurrent build.
If we assume that jar/install is mostly a very small phase at the end, we see that the race is all about reaching the packaging phase (between S and J in the figures <img class="emoticon emoticon-wink" data-emoticon-name="wink" border="0" src="/s/en_GB/3278/15/_/images/icons/emoticons/wink.png" alt="(wink)" title="(wink)" />).</p>
<p>Notable special cases:</p>
<ul> <li>Forked executions can extend the reactor-leaf modules, although this is probably not relevant</li> <li>The war plugin is often quite heavy</li> <li>Integration tests are not part of this equation. It is my impression that a lot of projects keep all integration tests in a single module, which basically means we're not going to be able to do anything for them</li> </ul>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/figure4.jpg?version=1&modificationDate=1368939520441" data-image-src="/download/attachments/230400682/figure4.jpg?version=1&modificationDate=1368939520441" data-linked-resource-id="230565080" data-linked-resource-type="attachment" data-linked-resource-default-alias="figure4.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > figure4.jpg"><br /> <strong>Figure 4: Same graph as figure 3, but with critical path runtime-leaf-module shown with red line</strong></p>
<p>The graph shows the "critical path" in this build. Although it cannot be known up front, it will in effect always limit the total time spent building this project.</p>
<ul> <li>The red circle marks the "hard floor" of the concurrency potential. The dependency-ordered compile outputs are the single strongest force controlling the timing of the build.</li> <li>It is arguable that this "hard floor" should be moved to after test-compile too.</li> <li>The test-compile phase has an inherent dependency on the "compile" phase of the same module.
It is possible to envision a "TestCompile" dependency graph that expresses the test-compile dependencies (allowing test-compile to run constrained only by test-jar dependencies). But a performance improvement will only be gained if this gives improved scheduling along the critical path, and the overall improvement will equal only the gains along the critical path. In other words, this scheduling would probably only save a few hundred ms for most projects.</li> </ul>
<p>Given this understanding, one could be tempted to look at a few other scenarios:</p>
<ul> <li>Information could be profiled from previous runs and used to affect priorities in subsequent builds</li> <li>Before reaching the hard floor, it's all about prioritizing the resources to get there. For all but the first module there are usually a lot of other runnable tasks all along this path</li> <li>After reaching the "hard floor", the issue is mostly about prioritizing available resources to focus on reaching the packaging phase with all runtime-leaf-modules as quickly as possible (all for one, one for all).</li> </ul>
<p>So it'd be possible to consider cross-module prioritization of threads/scheduling of tasks.</p>
<h1>Number of schedulable tasks</h1>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/figure5.jpg?version=1&modificationDate=1368939520426" data-image-src="/download/attachments/230400682/figure5.jpg?version=1&modificationDate=1368939520426" data-linked-resource-id="230565079" data-linked-resource-type="attachment" data-linked-resource-default-alias="figure5.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > figure5.jpg"><br /> <strong>Figure 5: Number of schedulable tasks</strong></p>
<h1>Variations</h1>
<p><img class="confluence-embedded-image" src="/download/attachments/230400682/figure6.jpg?version=1&modificationDate=1368939520402"
data-image-src="/download/attachments/230400682/figure6.jpg?version=1&modificationDate=1368939520402" data-linked-resource-id="230565078" data-linked-resource-type="attachment" data-linked-resource-default-alias="figure6.jpg" data-base-url="http://docs.codehaus.org" data-linked-resource-container-id="230400682" title="null > figure6.jpg"><br /> <strong>Figure 6: First module in reactor dependency is critical path of execution</strong></p>
<p>In this scenario, the unit tests in the first module take a long time to complete.</p>
<h1>What does it mean?</h1>
<ul> <li>Overall concurrency is governed by the reactor dependency graph</li> <li>There's little or no point in trying to schedule "things" out of order. We can just let everything from package onwards respect the reactor dependency graph totally</li> <li>There are a number of crazy strategies I tried out, some of which I communicated to the dev list:<br /> Most crazy optimizations only have a real use case if they're along the critical path, and then the effect is quite limited, unless the optimization can affect <em>all</em> of the potential critical paths that may arise, and even then it will be limited by the number of runtime-leaf-modules.</li> <li>Profile information from previous runs could be used to influence priorities, but it looks mostly like it'd be Maven telling different surefire modules how many resources they can consume to reach the overall goal</li> </ul>
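<p>The critical-path argument above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the scheduler Maven uses: the dependency graph <em>DEPS</em> is hypothetical (it is not the graph in figure 1), the helper names <em>schedule</em> and <em>build_time</em> are made up, concurrency is unlimited, a module's compile is assumed to start once all upstream compiles have finished, and jar/install is assumed to wait for upstream jar/install. The phase durations reuse the percentages of the "average" module.</p>

```python
# Phase shares of the "average" module, from the article (percent of one module's build).
PHASES = {"C": 30, "TC": 17, "S": 49, "J": 4}

# Hypothetical dependency graph (NOT figure 1): module -> modules it depends on.
DEPS = {
    "A": [],
    "B": ["A"],
    "E": ["A"],
    "Y": ["B"],
    "Z": ["B", "E"],
}


def schedule(deps, overrides=None):
    """Return {module: {phase: end_time}} assuming unlimited concurrency.

    `overrides` maps a module to per-phase durations that replace the average.
    """
    overrides = overrides or {}
    ends = {}

    def visit(module):
        if module in ends:
            return ends[module]
        d = {**PHASES, **overrides.get(module, {})}
        upstream = [visit(dep) for dep in deps[module]]
        # Assumption: compile starts once every upstream compile has finished.
        c_end = max((u["C"] for u in upstream), default=0) + d["C"]
        tc_end = c_end + d["TC"]
        s_end = tc_end + d["S"]
        # Assumption: jar/install respects the reactor order completely.
        j_start = max([s_end] + [u["J"] for u in upstream])
        ends[module] = {"C": c_end, "TC": tc_end, "S": s_end, "J": j_start + d["J"]}
        return ends[module]

    for module in deps:
        visit(module)
    return ends


def build_time(ends):
    """Total build time: the moment the last module finishes jar/install."""
    return max(e["J"] for e in ends.values())


print(build_time(schedule(DEPS)))                      # 160: all modules average
print(build_time(schedule(DEPS, {"E": {"S": 20}})))    # 160: E is shorter, no change
print(build_time(schedule(DEPS, {"A": {"S": 200}})))   # 259: slow tests in first module
```

<p>In this toy model, shortening surefire in module E does not change the total, because E is not on the critical path, while slow unit tests in the first module push every downstream jar/install back. That matches the behaviour described for figures 3 and 6, but only within the assumptions listed above.</p>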