...

The compiler-replay methodology is deterministic: it eliminates the variations in memory allocation and mutator behaviour that arise from non-deterministic application of the adaptive compiler. We need this methodology because the non-determinism of the adaptive compilation system makes it a difficult platform for detailed performance studies; for example, we cannot tell whether a variation is due to the system change being studied or simply to a different application of the adaptive compiler. The information we record and replay consists of the hot methods and hot basic blocks, together with the dynamic call graph (with a calling frequency on each edge) used to guide inlining decisions.

Note that compiler replay was significantly improved in December 2011. The notes below apply to the post-December-2011 version of replay.

Here is how to use it:

Generate advice.

There are three kinds of advice used by the replay system, each of which is workload-specific (i.e. you should generate advice files for each benchmark):

  1. Compilation advice (.ca file).  This advice records, for every compiled method, which compiler (base or opt) it should be compiled with and, if opt, at which optimization level.  Replay compilation will not work without a compilation advice file.
  2. Edge counts (.ec file).  This advice captures the edge counts generated by the execution of baseline-compiled code.  Edge counts are used by the compiler to understand which edges in the control flow graph are hot.  At the time of writing, edge counts were measured to contribute about 2% to bottom-line performance (averaged over DaCapo, jvm98 and jbb).
  3. Dynamic call graph (.dc file).  This advice captures the dynamic call graph, which allows the compiler to understand the frequency with which particular call chains occur.  This is particularly useful in guiding inlining decisions.  At the time of writing, the call graph contributed about 8% to bottom-line performance.

One way to gather advice is to execute the benchmark multiple times under controlled settings, producing profiles at each execution.  Then establish the fastest execution among the set of runs and choose the profiles associated with that execution as the advice files.  A common methodology is to invoke each benchmark 20 times (i.e. take the best invocation from a set of 20 trials) and, in each invocation, run 10 iterations of the benchmark (so that the advice captures the warmed-up, steady state of the benchmark).

When generating the advice, you will need to use the following command line arguments (typically use all six arguments, so that all three advice files are generated at each invocation):

For adaptive compilation profile

-X:aos:enable_advice_generation=true
-X:aos:cafo=my_compiler_advice_file.ca

For edge count profile

-X:base:profile_edge_counters=true
-X:base:profile_edge_counter_file=my_edge_counter_file.ec

For dynamic call graph profile

-X:aos:dcfo=my_dynamic_call_graph_file.dc
-X:aos:final_report_level=2
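
For example, a single advice-generating invocation would pass all six arguments together (the output file names below are illustrative, chosen to match the input names used in the replay examples later in this section):

-X:aos:enable_advice_generation=true -X:aos:cafo=benchmark.ca -X:base:profile_edge_counters=true -X:base:profile_edge_counter_file=benchmark.ec -X:aos:dcfo=benchmark.dc -X:aos:final_report_level=2

The advice files produced by the fastest of the (for example) 20 invocations are then kept for replay.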

...

 

Executing with advice.

The basic model is simple.  At a nominated time in the execution of a program, all methods specified in the .ca advice file will be (re)compiled with the compiler and optimization level nominated in the advice file.  Broadly, there are two ways of initiating bulk compilation: a) by calling the method org.jikesrvm.adaptive.recompilation.BulkCompile.compileAllMethods() during execution, and b) by using the -X:aos:enable_precompile=true flag at the command line to trigger bulk compilation at boot time.  A standard methodology is to use a benchmark-harness callback mechanism to call compileAllMethods() at the end of the first iteration of the benchmark; at the time of writing this gave performance roughly 2% faster than the 10th iteration of regular adaptive compilation.  Because precompilation occurs early, the compiler has less information about the classes, and in consequence the performance of precompilation is about 9% slower than the 10th iteration of adaptive compilation.
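
For illustration, a minimal warmup-replay driver might look like the following sketch.  The iteration loop and the Runnable are hypothetical stand-ins for a real benchmark harness; the only Jikes RVM call used is the compileAllMethods() method named above.

import org.jikesrvm.adaptive.recompilation.BulkCompile;

// Minimal sketch of a warmup-replay driver (not part of Jikes RVM).
public class WarmupReplayDriver {
  public static void runIterations(Runnable benchmarkIteration, int iterations) {
    for (int i = 1; i <= iterations; i++) {
      benchmarkIteration.run();          // one complete benchmark iteration
      if (i == 1) {
        // (Re)compile every method in the .ca advice file, with the compiler
        // and optimization level it nominates, at the end of iteration 1.
        BulkCompile.compileAllMethods();
      }
    }
  }
}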

For 'warmup' replay (where org.jikesrvm.adaptive.recompilation.BulkCompile.compileAllMethods() is called at the end of the first iteration):


-X:aos:initial_compiler=base -X:aos:enable_

...

...

...

benchmark.ec -X:aos:dcfi=benchmark.dc

For precompile replay (where bulk compilation occurs at boot time):


-X:aos:

...

...

...

Measuring GC performance

MMTk includes a statistics subsystem and a harness mechanism for measuring its performance.  If you are using the DaCapo benchmarks, the MMTk harness can be invoked using the '-c MMTkCallback' command line option; for other benchmarks you will need to invoke the harness by calling the static methods

...
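As an illustration only (the exact static methods are named in the elided text above; org.mmtk.plan.Plan.harnessBegin() and org.mmtk.plan.Plan.harnessEnd() are assumed here), a hand-rolled driver would bracket the measured region like this:

import org.mmtk.plan.Plan;

// Sketch: bracket the measured region with the MMTk harness calls.
public class GCHarnessExample {
  public static void main(String[] args) {
    warmup();             // e.g. run the first iteration(s) unmeasured
    Plan.harnessBegin();  // assumed entry point: start MMTk statistics
    timedIteration();     // the iteration whose GC behaviour you want to measure
    Plan.harnessEnd();    // assumed exit point: stop and report statistics
  }

  private static void warmup()         { /* hypothetical benchmark warmup */ }
  private static void timedIteration() { /* hypothetical measured iteration */ }
}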

Option                           Description
-X:gc:printPhaseStats=true       Print statistics for each mutator/GC phase during the run
-X:gc:xmlStats=true              Print statistics in an XML format (as opposed to the human-readable format)
-X:gc:verbose                    This is incompatible with MMTk's statistics system
-X:gc:variableSizeHeap=false     Disable dynamic resizing of the heap

Unless you are specifically researching flexible heap sizes, it is best to run benchmarks in a fixed size heap, using a range of heap sizes to produce a curve that reflects the space-time tradeoff.  Using replay compilation and measuring the second iteration of a benchmark is a good way to produce results with low noise.
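
For example, one point on such a space-time curve might be measured with the heap fixed at a particular size (the 60M value below is purely illustrative) and phase statistics enabled:

-X:gc:variableSizeHeap=false -Xms60M -Xmx60M -X:gc:printPhaseStats=true

Repeating this at a range of heap sizes then yields the curve described above.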

...