The following abbreviations are used:
This analysis tries to explain the concurrency options seen in the context of one particular build. While this is definitely
non-exhaustive, it suffices to illustrate the challenges and restrictions encountered in making this <em>one</em> build run optimally.
The module dependency graph is as follows:<br/>
<div>Figure 1: Dependency graph of project</div>
My average project
I timed the actual phases of some other builds (my 2 test projects). These are fairly standard Maven projects
with lots of code and decent test coverage.<br/>
The interesting thing (not shown) is that the average time for the different lifecycle phases in a multi-module build
did not vary much. Without losing too much accuracy I could define an "average" module in my multi-module build. For mvn -o clean install my "average" module in my project spends
That means there's less than the rounding error (<1%) left for all the other stuff.<br/>
The run-time view
To make things more interesting, I've transposed the (real) numbers from "my average project" onto the "imaginary" dependency graph seen in figure 1, to
better understand what is happening. The figure has time along the X-axis and shows the different modules along the Y-axis.
<div>Figure 2: Weave-mode run-time scheduling of modules in the average build, time along X axis</div>
The interesting bit about this is that minor variations in the individual modules have little impact on the end result:
the figures are to scale, so if you can keep them visible at the same time you'll see the (lack of) difference.<br/> <div class="image">
<div>Figure 3: Module E changes characteristics (becomes shorter than average), module Z follows scheduling</div>
It's possible to draw a large number of graphs that have significant changes in individual modules but no
change in the end result.
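The effect can be sketched with a small calculation. The following is a minimal illustration, not the model used for the figures: the four-module graph and the durations are made up, and it assumes unlimited threads and that a module starts once all of its upstream modules have finished.

```python
# Hypothetical module graph: module -> modules it depends on.
# This is NOT the graph from figure 1, just a small stand-in.
DEPS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}


def finish_times(deps, duration):
    """Earliest finish time per module, assuming unlimited threads."""
    done = {}

    def finish(module):
        if module not in done:
            # A module can start when its slowest upstream module is done.
            start = max((finish(d) for d in deps[module]), default=0)
            done[module] = start + duration[module]
        return done[module]

    for module in deps:
        finish(module)
    return done


# Every module takes the same (made-up) "average" time.
average = {"A": 10, "B": 10, "C": 10, "D": 10}
# Module C becomes much shorter than average.
shorter_c = dict(average, C=4)

total_average = max(finish_times(DEPS, average).values())
total_shorter = max(finish_times(DEPS, shorter_c).values())
print(total_average, total_shorter)
```

In this made-up graph D still has to wait for B, so shortening C does not move the end of the build.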
There's a given set of modules that are <em>reactor-leaf-modules</em> in the reactor dependency
tree (Y and Z in this case). There is an additional set of <em>runtime-leaf-modules</em> that constitute the
"last modules to reach package/install" in a concurrent build. If we assume that jar/install is mostly a very small
phase at the end, we see that the race is all about reaching the packaging phase (between S and J in the figures).<br/>
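The two sets can be told apart with a rough sketch. Everything here is an assumption made for illustration: the graph, the (compile, test) timings, unlimited threads, and the rule that a module may start compiling as soon as the compile output of its upstream modules exists.

```python
# Hypothetical reactor: module -> upstream modules (not the graph from figure 1).
DEPS = {"A": [], "B": ["A"], "Y": ["B"], "Z": ["A"]}

# Hypothetical (compile, test) durations per module, in seconds.
PHASES = {"A": (5, 10), "B": (5, 10), "Y": (5, 10), "Z": (5, 30)}


def reactor_leaf_modules(deps):
    """Modules that no other module in the reactor depends on."""
    upstream = {d for ds in deps.values() for d in ds}
    return sorted(m for m in deps if m not in upstream)


def packaging_times(deps, phases):
    """Time at which each module reaches its packaging phase.

    Assumes unlimited threads, and that compile may start as soon as the
    compile output of every upstream module is available.
    """
    compiled = {}

    def compile_end(module):
        if module not in compiled:
            start = max((compile_end(d) for d in deps[module]), default=0)
            compiled[module] = start + phases[module][0]
        return compiled[module]

    # Tests of a module run right after its own compile.
    return {m: compile_end(m) + phases[m][1] for m in deps}


times = packaging_times(DEPS, PHASES)
last = max(times.values())
# The module(s) that reach packaging last in this particular run.
runtime_leaf_modules = sorted(m for m, t in times.items() if t == last)

print(reactor_leaf_modules(DEPS), runtime_leaf_modules)
```

The reactor-leaf set follows from the graph alone, while the runtime-leaf set depends on the timings of a given run.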
Notable special cases:
- Forked executions can extend the reactor-leaf modules, although this is probably not relevant.
- The war plugin is often quite heavy.
- Integration tests are not part of this equation. It is my impression that a lot of projects keep all integration tests in a single module,
which basically means we're not going to be able to do anything for them.
<div>Figure 4: Same graph as figure 3, but with critical path runtime-leaf-module shown with red line</div>
The graph shows the "critical path" in this build. Although it cannot be known up front, it will in effect always limit the total time
spent building this project.<li>
- The red circle marks the "hard floor" of the concurrency potential. The dependency-ordered compile output is
  the single strongest force controlling the timing of the build.
- It is arguable that this "hard floor" should be moved to after test-compile too.
- The test-compile phase has an inherent dependency on the "compile" phase of the same module. It is possible to imagine a
  "TestCompile" dependency graph that expresses the test-compile dependencies (allowing test-compile to run constrained
  only by test-jar dependencies). But any performance improvement will only be gained if this can give improved
  scheduling along the critical path, and as such it will only provide an overall performance improvement equalling
  the gains along the critical path. In other words, this scheduling would probably only save a few hundred milliseconds for most projects.
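A critical path of this kind can be traced after the fact by remembering, for each module, which upstream module it had to wait for longest. This is only a sketch under stated assumptions: the graph and durations are hypothetical, modules are treated as single tasks, and threads are unlimited.

```python
# Hypothetical module graph and durations (not taken from the figures).
DEPS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
DURATION = {"A": 10, "B": 12, "C": 7, "D": 9}


def critical_path(deps, duration):
    """Return (path, total_time) for the longest dependency chain."""
    finish = {}
    blocker = {}  # module -> the upstream module it waited for last

    def visit(module):
        if module not in finish:
            # The upstream module that finishes last decides the start time.
            last = max(deps[module], key=visit, default=None)
            blocker[module] = last
            start = finish[last] if last is not None else 0
            finish[module] = start + duration[module]
        return finish[module]

    for module in deps:
        visit(module)

    # Walk back from the module that finishes last.
    current = max(finish, key=finish.get)
    total = finish[current]
    path = []
    while current is not None:
        path.append(current)
        current = blocker[current]
    return path[::-1], total


path, total = critical_path(DEPS, DURATION)
print(path, total)
```

Only changes to modules on the returned path can shorten the total; here speeding up C would gain nothing, which matches the point that improvements count only when they land on the critical path.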
</li>Given this understanding, one could be tempted to look at a few other scenarios: <li>
- Information could be profiled from previous runs that could be used to affect priorities in subsequent
- Before reaching the "hard floor", it's all about prioritizing the resources to get there. For all but the first
  module there are usually a lot of other runnable tasks all along this
- After reaching the "hard floor", the issue is mostly about prioritizing available resources to focus on
  reaching the packaging phase with all runtime-leaf-modules as quickly as possible (all for one, one for all).
So it would be possible to consider cross-module prioritization of threads and scheduling of tasks.</li>
Number of schedulable tasks
<div>Figure 5: Number of schedulable tasks</div>
Figure 6: first module in reactor dependency is critical path of execution</div>
In this scenario, the unit tests in the first module take a long time to complete.