The need for performance metrics and comparison
Since we released AspectWerkz 1.0, and more generally with every release of any AOP/interceptor framework (AspectWerkz, AspectJ, JBoss AOP, Spring AOP, cglib, dynaop, etc.), the same questions are always raised: "What is the performance cost of such an approach?" and "How much do I lose per method invocation when an advice/interceptor is applied?"
This is indeed an issue that needs to be carefully addressed, and it has in fact affected the design of every sufficiently mature framework.
We are probably all wary of the cost of java.lang.reflect, despite its power, and usually, even before evaluating semantics, robustness and general ease of use, we start with some Hello World bench.
We started AWbench for that purpose: to offer a single place to measure the relative performance of AOP/interceptor frameworks, and to let you measure it yourself.
Beyond performance comparison, AWbench is a good place to see the semantic differences and ease of use of each framework, since each one is used for the same rather simple purpose. A "line count" metric will be provided in a future report.
Current performance results
The following table gives the bench figures in nanoseconds per advised method invocation. A single non-advised method invocation takes roughly 5 ns on the hardware/software used for the bench. Note that an advised application has more behavior than a non-advised method, so the advised figures should not be compared with the non-advised one. AWbench does not yet provide metrics for a hand-written implementation of the AOP concepts.
The results were obtained with 2 million iterations.
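To illustrate how a "ns per iteration" figure of this kind can be derived, here is a minimal Java sketch. It is not the AWbench harness; the class and method names (NsPerIteration, measure) are hypothetical, and it assumes Java 8 or later (lambdas, and System.nanoTime, which Java 1.4.2 lacks). A warm-up pass runs first so that HotSpot has compiled the body before it is timed.

```java
// Hypothetical micro-benchmark harness: times `iterations` calls of a body
// and reports the average cost per call in nanoseconds.
final class NsPerIteration {

    // Volatile sink so the JIT cannot discard the benchmarked work entirely.
    static volatile int sink;

    static long measure(Runnable body, int iterations) {
        // Warm-up pass so the JIT compiles the body before it is timed.
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / iterations; // integer ns per iteration
    }

    public static void main(String[] args) {
        int iterations = 2000000; // same iteration count as the published results
        long ns = measure(() -> sink++, iterations);
        System.out.println(ns + " ns/iteration (" + iterations + " iterations)");
    }
}
```

In a real comparison the body would be a call to an advised method, and the whole measurement would be repeated several times.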
In this table, the first two lines, in bold, are the most important ones. In a real-world application, the before or around advice is likely to interact with the code it is advising, and to do so it needs access to runtime information (contextual information) such as method parameter values and the target instance. It is also likely that the join point is advised by more than one advice.
On the other hand, it is very unlikely to have just a before advice that does nothing, but this case gives a good estimate of the minimal overhead we can expect.
Note: comparing such results when the difference is small (e.g. 15 ns vs 10 ns) might not be relevant. Before doing so, you should run the bench several times and compute an average after removing the smallest and highest measurements.
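A minimal Java sketch of that trimmed average follows. The class name and the sample figures are made up for illustration and are not AWbench results; it drops exactly one smallest and one largest value, as the note describes.

```java
import java.util.Arrays;

// Averages repeated bench runs after discarding the single smallest and
// the single largest measurement.
final class TrimmedAverage {

    static double average(long[] runsNs) {
        if (runsNs.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        long[] sorted = runsNs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        long sum = 0;
        // Skip index 0 (smallest) and the last index (largest).
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Hypothetical ns/invocation values from five runs of the same row.
        long[] runs = {15, 10, 12, 30, 11};
        System.out.println(average(runs));
    }
}
```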
| AWBench (ns/invocation) | aspectwerkz | awproxy | aspectwerkz_1_0 | aspectj | jboss | spring | dynaop | cglib | ext:aopalliance | ext:spring | ext:aspectj |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **before, args() target()** | 10 | 25 | 606 | 10 | 220 | 355 | 390 | 145 | - | 220 | - |
| **around x 2, args() target()** | 80 | 85 | 651 | 50 | 290 | 436 | 455 | 155 | 465 | 476 | - |
| before | 15 | 20 | 520 | 15 | 145 | 275 | 320 | 70 | - | 40 | 10 |
| before, static info access | 30 | 30 | 501 | 25 | 175 | 275 | 330 | 70 | - | 35 | - |
| before, rtti info access | 50 | 55 | 535 | 50 | 175 | 275 | 335 | 75 | - | 35 | - |
| after returning | 10 | 20 | 541 | 10 | 135 | 285 | 315 | 85 | - | 45 | 15 |
| after throwing | 3540 | 3870 | 6103 | 3009 | 5032 | - | 6709 | 8127 | - | - | 3460 |
| before + after | 20 | 30 | 511 | 20 | 160 | 445 | 345 | 80 | - | 35 | 20 |
| before, args() primitives | 10 | 20 | 555 | 10 | 195 | 350 | 375 | 145 | - | 210 | - |
| before, args() objects | 5 | 25 | 546 | 10 | 185 | 325 | 345 | 115 | - | 200 | - |
| around | 60 | 95 | 470 | 10 | - | 225 | 315 | 75 | - | - | 90 |
| around, rtti info access | 70 | 70 | 520 | 50 | 140 | 250 | 340 | 80 | 70 | 70 | - |
| around, static info access | 80 | 90 | 486 | 25 | 135 | 245 | 330 | 75 | 80 | 80 | - |
The following table shows the same results relative to AspectWerkz 2.0.RC2-snapshot, which is the reference (1 x) in each row: 2 x means twice the AspectWerkz time for that construct.
For example, the "before" row shows that for the simplest before advice, AspectWerkz is about 9.6 times faster than JBoss AOP 1.0 (15 ns vs 145 ns).
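The relative figures can be recomputed from the ns/invocation table. The published values are consistent with dividing each framework's time by the AspectWerkz time of the same row and truncating (not rounding) to one decimal, e.g. 145 / 15 = 9.67 is shown as 9.6 x. The Java sketch below assumes that truncation rule; the class name RelativeFactor is hypothetical. Integer arithmetic is used so that floating-point error cannot shift a value across a tenth.

```java
// Recomputes a "relative" figure: framework ns/invocation divided by the
// reference (AspectWerkz) ns/invocation for the same row, truncated to 0.1.
final class RelativeFactor {

    static String relative(long frameworkNs, long referenceNs) {
        long tenths = frameworkNs * 10 / referenceNs; // integer division truncates
        long whole = tenths / 10;
        long fraction = tenths % 10;
        return fraction == 0 ? whole + " x" : whole + "." + fraction + " x";
    }

    public static void main(String[] args) {
        // "before" row, where AspectWerkz (15 ns) is the reference.
        System.out.println(relative(145, 15)); // jboss: prints "9.6 x"
        System.out.println(relative(10, 15));  // ext:aspectj: prints "0.6 x"
    }
}
```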
| AWBench (relative %) | aspectwerkz | awproxy | aspectwerkz_1_0 | aspectj | jboss | spring | dynaop | cglib | ext:aopalliance | ext:spring | ext:aspectj |
|---|---|---|---|---|---|---|---|---|---|---|---|
| before, args() target() | 1 x | 2.5 x | 60.6 x | 1 x | 22 x | 35.5 x | 39 x | 14.5 x | - | 22 x | - |
| around x 2, args() target() | 1 x | 1 x | 8.1 x | 0.6 x | 3.6 x | 5.4 x | 5.6 x | 1.9 x | 5.8 x | 5.9 x | - |
| before | 1 x | 1.3 x | 34.6 x | 1 x | 9.6 x | 18.3 x | 21.3 x | 4.6 x | - | 2.6 x | 0.6 x |
| before, static info access | 1 x | 1 x | 16.7 x | 0.8 x | 5.8 x | 9.1 x | 11 x | 2.3 x | - | 1.1 x | - |
| before, rtti info access | 1 x | 1.1 x | 10.7 x | 1 x | 3.5 x | 5.5 x | 6.7 x | 1.5 x | - | 0.7 x | - |
| after returning | 1 x | 2 x | 54.1 x | 1 x | 13.5 x | 28.5 x | 31.5 x | 8.5 x | - | 4.5 x | 1.5 x |
| after throwing | 1 x | 1 x | 1.7 x | 0.8 x | 1.4 x | - | 1.8 x | 2.2 x | - | - | 0.9 x |
| before + after | 1 x | 1.5 x | 25.5 x | 1 x | 8 x | 22.2 x | 17.2 x | 4 x | - | 1.7 x | 1 x |
| before, args() primitives | 1 x | 2 x | 55.5 x | 1 x | 19.5 x | 35 x | 37.5 x | 14.5 x | - | 21 x | - |
| before, args() objects | 1 x | 5 x | 109.2 x | 2 x | 37 x | 65 x | 69 x | 23 x | - | 40 x | - |
| around | 1 x | 1.5 x | 7.8 x | 0.1 x | - | 3.7 x | 5.2 x | 1.2 x | - | - | 1.5 x |
| around, rtti info access | 1 x | 1 x | 7.4 x | 0.7 x | 2 x | 3.5 x | 4.8 x | 1.1 x | 1 x | 1 x | - |
| around, static info access | 1 x | 1.1 x | 6 x | 0.3 x | 1.6 x | 3 x | 4.1 x | 0.9 x | 1 x | 1 x | - |
The benchmarks were run on Java HotSpot 1.4.2, Windows 2000 SP4, Pentium M 1.6 GHz, 1 GB RAM.
Notes:
- Some figures are missing (-) when the underlying framework does not support the feature. For the ext: columns, a missing figure may instead be due to pending work (for example, AOP Alliance interfaces could emulate a before advice, as is done in JBoss AOP).
- After throwing advice appears slow because its figure includes, first, the cost of throwing the exception (user code) and, second, the cost of catching the exception and doing an instanceof check on its type (advice code).
- Latest run: Dec 20, 2004, following feedback from the Spring Framework team.
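To make the after throwing note concrete, here is a hypothetical hand-written Java equivalent of what such an advice boils down to. Real frameworks generate or implement this differently, so treat it only as a sketch. Both costs mentioned above are visible: the throw in the advised method, and the catch plus instanceof check on the advice path.

```java
// Hypothetical hand-written equivalent of an "after throwing" advice: the
// advised call is wrapped in try/catch, and the advice body only runs when
// the thrown exception matches the declared type (the instanceof check).
final class AfterThrowingSketch {

    static int adviceRuns = 0;

    // The "user code": an advised method that always throws.
    static void advised() {
        throw new IllegalStateException("boom");
    }

    static void invokeWithAfterThrowing() {
        try {
            advised();
        } catch (RuntimeException e) {
            if (e instanceof IllegalStateException) {
                adviceRuns++; // advice body
            }
            throw e; // after throwing advice does not swallow the exception
        }
    }

    public static void main(String[] args) {
        try {
            invokeWithAfterThrowing();
        } catch (IllegalStateException expected) {
            System.out.println("advice ran " + adviceRuns + " time(s)");
        }
    }
}
```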
AWbench internals
Summary
AWbench is a micro-benchmark suite that aims to stay simple. The test application is minimal, and AWbench is mainly the glue around it that applies one or more very simple advices/interceptors from the framework of your choice.
AWbench comes with an Ant script that allows you to run it on your own machine, and to contribute improvements if you know of any for a particular framework.
What is the scope for the benchmark?
So far, AWbench only covers method execution pointcuts, since call side pointcuts are not supported by proxy-based frameworks (Spring AOP, cglib, dynaop, etc.).
The awbench.method.Execution class is the test application and contains one method per construct to bench. Note that bytecode-based AOP may provide much better performance for before and after advice, as well as for access to contextual information.
Indeed, proxy-based frameworks are very likely to use reflection to give the advice access to the intercepted method's parameters at runtime, while bytecode-based AOP may use more advanced constructs that provide that access at the speed of statically compiled code.
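The difference described above can be sketched in plain Java. The Service interface, ServiceImpl class and advice bodies below are hypothetical. The proxy half uses the standard java.lang.reflect.Proxy API the way a proxy-based framework typically would; the "woven" half only approximates what bytecode weaving can produce and is not the output of any particular weaver.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Proxy-based vs "statically compiled" access to an advised method's int parameter.
final class ProxyVsStatic {

    interface Service {
        int compute(int value);
    }

    static final class ServiceImpl implements Service {
        public int compute(int value) {
            return value * 2;
        }
    }

    static int lastSeen; // what the "before advice" observed

    // Proxy style: arguments arrive boxed in an Object[], the advice must cast
    // and unbox, and the target is called through Method.invoke (reflection).
    static Service proxied(final Service target) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                lastSeen = ((Integer) args[0]).intValue(); // before advice
                return method.invoke(target, args);
            }
        };
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(), new Class<?>[] {Service.class}, handler);
    }

    // Woven style: the advice receives the primitive directly and the target
    // is called with a plain, statically compiled invocation.
    static int woven(ServiceImpl target, int value) {
        lastSeen = value; // before advice, no boxing
        return target.compute(value);
    }

    public static void main(String[] args) {
        System.out.println(proxied(new ServiceImpl()).compute(21)); // prints 42
        System.out.println(woven(new ServiceImpl(), 21));           // prints 42
    }
}
```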
The current scope is thus:
For method execution pointcuts:

| Construct | Contextual information access | Notes |
|---|---|---|
| before advice | none | |
| before advice | static information (method signature, etc.) | |
| before advice | contextual information accessed reflectively | Likely to use casting and unboxing of primitives |
| before advice | contextual information accessed with explicit framework capabilities | Only supported by AspectJ and AspectWerkz 2.x |
| after advice | none | |
| after returning advice | return value | |
| after throwing advice | exception instance | |
| before + after advice | none | |
| around advice | optimized | AspectJ and AspectWerkz 2.x provide specific optimizations (thisJoinPointStaticPart vs thisJoinPoint) |
| around advice | non optimized | |
| 2 around advice | contextual information | |
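The "optimized" row rests on the difference between static and runtime join point information. The Java sketch below illustrates the idea with hypothetical StaticPart and JoinPoint classes, which are not AspectJ's or AspectWerkz's real types. Static information such as the method signature can be built once and kept in a static final field (which is what AspectJ does for thisJoinPointStaticPart), whereas a runtime join point carrying the target and arguments has to be allocated on each invocation.

```java
import java.util.Arrays;

final class JoinPointCost {

    // Static information: known at weave time, independent of any call.
    static final class StaticPart {
        final String signature;
        StaticPart(String signature) { this.signature = signature; }
    }

    // Runtime information: target and arguments differ for every call.
    static final class JoinPoint {
        final StaticPart staticPart;
        final Object target;
        final Object[] args;
        JoinPoint(StaticPart staticPart, Object target, Object[] args) {
            this.staticPart = staticPart;
            this.target = target;
            this.args = args;
        }
    }

    // Created once, when the class is initialized.
    static final StaticPart COMPUTE_SP = new StaticPart("int JoinPointCost.compute(int)");

    static int allocations = 0; // counts per-call JoinPoint allocations

    static String staticAdvice(StaticPart sp) {
        return sp.signature;
    }

    static String rttiAdvice(JoinPoint jp) {
        return jp.staticPart.signature + " " + Arrays.toString(jp.args);
    }

    // Two hypothetical woven versions of the same compute(int) method.
    int computeWovenStatic(int value) {
        staticAdvice(COMPUTE_SP); // reuses the shared static part, no allocation
        return value + 1;
    }

    int computeWovenRtti(int value) {
        allocations++; // a fresh JoinPoint and a boxed argument array per call
        rttiAdvice(new JoinPoint(COMPUTE_SP, this, new Object[] {value}));
        return value + 1;
    }

    public static void main(String[] args) {
        JoinPointCost c = new JoinPointCost();
        for (int i = 0; i < 3; i++) {
            c.computeWovenStatic(i);
            c.computeWovenRtti(i);
        }
        System.out.println(allocations); // prints 3
    }
}
```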
By accessing contextual information we mean:
- accessing a method parameter using its real type (i.e. boxing and unboxing might be needed)
- accessing the advised instance using its real type (i.e. casting might be needed)
A pseudo code block is thus likely to be:
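A minimal Java sketch of such an advice, assuming a hypothetical JoinPointContext interface (not any specific framework's API) that exposes the target and the arguments untyped, as reflective frameworks typically do. AdvisedTarget is likewise a hypothetical advised class.

```java
// JoinPointContext and AdvisedTarget are hypothetical stand-ins for what a
// framework passes to an advice and for the advised class.
final class ContextualAccess {

    interface JoinPointContext {
        Object getTarget();
        Object[] getArgs();
    }

    static final class AdvisedTarget {
        int counter;
    }

    // Before advice that needs the advised instance and an int parameter with
    // their real types: a cast for the target, unboxing for the primitive.
    static void before(JoinPointContext jp) {
        AdvisedTarget target = (AdvisedTarget) jp.getTarget();  // cast
        int increment = ((Integer) jp.getArgs()[0]).intValue(); // unbox
        target.counter += increment;
    }

    // Simple fixed context, standing in for what a framework would build.
    static JoinPointContext contextFor(final Object target, final Object... args) {
        return new JoinPointContext() {
            public Object getTarget() { return target; }
            public Object[] getArgs() { return args; }
        };
    }

    public static void main(String[] args) {
        AdvisedTarget t = new AdvisedTarget();
        before(contextFor(t, 3)); // 3 is autoboxed to Integer
        System.out.println(t.counter); // prints 3
    }
}
```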
Which AOP and Proxy frameworks are benched?
The following are included in AWbench:
Bytecode based frameworks
| Framework | URL |
|---|---|
| AspectWerkz 1.0 | http://aspectwerkz.codehaus.org |
| AspectWerkz 2.x | http://aspectwerkz.codehaus.org |
| AspectJ (1.2) | http://eclipse.org/aspectj/ |
| JBoss AOP (1.0) | http://www.jboss.org/developers/projects/jboss/aop |
Proxy based frameworks
| Framework | URL |
|---|---|
| Spring AOP (1.1.1) | http://www.springframework.org/ |
| cglib proxy (2.0.2) | http://cglib.sourceforge.net/ |
| dynaop (1.0 beta) | https://dynaop.dev.java.net/ |
AWbench also includes the AspectWerkz Extensible Aspect Container, which allows running any aspect/interceptor framework within the AspectWerkz 2.x runtime:
| AspectWerkz Extensible Aspect Container running | Notes |
|---|---|
| AspectJ | |
| AOP Alliance | http://aopalliance.sourceforge.net/ |
| Spring AOP | |
AWbench is extensible. Refer to the How to contribute? section (below) for more info on how to add your framework to the bench.
What's next?
Running AWbench on your own
AWBench is released under the LGPL.
There will be no distribution of it, but the source can be checked out:
Once it is checked out, you can run the bench using several Ant targets, for example run:all to bench every framework, or a single framework target such as run:ext:spring.
How to contribute?
If you find an optimization for one of the implementations that respects the requirements, we will add the fix to AWbench and update the results accordingly.
If you are willing to write a non-AOP, non-proxy-based version of this bench, so that the AOP approach can be compared with regular OO design patterns, send us an email.
Limitations
The current implementation does not cover fine-grained deployment models like perInstance/perTarget, whose underlying implementations are unlikely to be performance-neutral.
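To see why such deployment models are unlikely to be performance-neutral, here is a hypothetical sketch of the aspect-instance lookup each model implies; it is not how AspectWerkz or JBoss AOP actually implement them. With perJVM, the advice reaches a single instance through a static field, while perInstance needs a per-target lookup (here a synchronized WeakHashMap, so that advised objects stay garbage-collectable) on every advised call.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical aspect-instance lookup for two deployment models.
final class DeploymentModels {

    static final class CountingAspect {
        int count;
        void before() { count++; }
    }

    // perJVM: one aspect instance, reachable through a static final field.
    static final CountingAspect PER_JVM = new CountingAspect();

    // perInstance: one aspect per advised object, found through a map lookup.
    static final Map<Object, CountingAspect> PER_INSTANCE =
            Collections.synchronizedMap(new WeakHashMap<Object, CountingAspect>());

    static CountingAspect perJvmAspect() {
        return PER_JVM; // a field read, no lookup
    }

    static CountingAspect perInstanceAspect(Object target) {
        // Compound get-or-create must hold the synchronized map's own lock.
        synchronized (PER_INSTANCE) {
            CountingAspect aspect = PER_INSTANCE.get(target);
            if (aspect == null) {
                aspect = new CountingAspect();
                PER_INSTANCE.put(target, aspect);
            }
            return aspect;
        }
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        perJvmAspect().before();
        perInstanceAspect(a).before();
        perInstanceAspect(a).before();
        perInstanceAspect(b).before();
        System.out.println(perInstanceAspect(a).count + " " + perInstanceAspect(b).count); // prints "2 1"
    }
}
```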

11 Comments
Nov 29, 2004
Alef Arendsen
Jonas, interesting stuff. I compiled my own benchmarks a while ago and didn't get exactly the same results, but I'll look into that later.
Although I don't think the small invocation overhead in Spring applications is relevant (oftentimes, Spring AOP is used for transaction management, security, etcetera: all resource-intensive stuff, generating much more overhead than the Spring AOP part), you can make it about 10 to 15% faster by including the following properties on the PFB (ProxyFactoryBean):
<property name="optimize"><value>true</value></property>
<property name="opaque"><value>true</value></property>
Optimization differs per proxy type with Spring. When using CGLIB, for example, you won't be able to change the advice configuration of an already created proxy. In 1.3, the option to freeze the configuration will also be added, giving an even bigger performance increase. Setting the proxy to opaque will disable the feature that allows you to cast the proxy to Advised (inspecting the proxy's advisors, etcetera). The latter won't increase performance a lot; the former, however, will!
Also, what about the different deployment models available with JBoss, for example? I assume you've been precompiling the aspects for JBoss (performance degrades by a factor of 2 with online weaving, at least in my benchmark).
Nov 29, 2004
Alef Arendsen
Where I said: 1.3, you have to read: 1.1.3...
Nov 29, 2004
Rob Harrop
Just to add to what Alef has written, the default configuration in Spring is to allow advice to be added/removed on the fly and to use JDK proxies rather than CGLIB. In situations where you don't want to change the advice chain, you can 'freeze' the proxy once it is created. When using this in conjunction with the CGLIB proxies in Spring, you get quite a performance enhancement. Once I have downloaded the code, I will submit any optimizations for Spring if I can make them.
Nov 29, 2004
Anonymous
What about cflow and cflowbelow? For my money, they are the most interesting pointcuts that I deal with.
Nov 29, 2004
Anonymous
It would also be interesting to see how startup is affected for the frameworks that support call-based pointcuts and dynamic loading.
Nov 29, 2004
Anonymous
When checked out and run, the benchmark doesn't appear to generate the first two benchmark lines. The first one reported is:
run:ext:spring:
java |-------------------------------------------------------------------------------
java | Nanosecond (E-9) / iteration Label
java |--------------------------------------------------------------------------------
java | 157 2000000 method execution, before advice (measured in 2000000 iterations)
For each benchmark.
Nov 30, 2004
Alexandre Vasseur
The table here does not follow the output order of "ant run:all". We found it more useful to show the more realistic schemes, like "before with context exposure" and "2 around advice with context exposure", first and in bold.
The "157" you obtain for "ext:spring" (Spring aspects within the AspectWerkz runtime) with the label "before advice" corresponds to the "before" row of the table for "ext:spring", which shows 40 ns/iteration; the difference is due to hardware differences, VM differences, etc.
I have attached to this page the full log of the bench run (output of "ant run:all") that led to this table.
Nov 30, 2004
Anonymous
There exists another benchmark suite of AOP (AspectJ) programs:
http://www.sable.mcgill.ca/benchmarks/aspectj
These benchmarks are described in detail in this year's OOPSLA paper:
http://aspectbench.org/papers#oopsla2004
Those experiments prompted the construction of an extensible, optimising compiler for AspectJ:
http://aspectbench.org
cflow and cflowbelow are indeed important, and need special optimisations, see
http://aspectbench.org/techreports#abc-2004-3
some of these optimisations have been adopted in ajc 1.2.1.
Dec 06, 2004
Michael Furman
Jonas,
1) What test application do you use in your benchmark?
2) Where can I find AWbench download?
Thanks and best regards,
Michael
Dec 07, 2004
jboner
The test application is only the micro sample app we wrote for the bench.
You check it out from CVS and build it yourself; there is no dist to download.
See the bottom of the article for how to check out the sources.
There you can see yourself how the bench works etc.
/jonas
Jan 05, 2005
Anonymous
We have been using AspectWerkz and JBoss AOP and found very slow performance in both implementations. JBoss was executing 2,000/sec and AspectWerkz up to 19,000/sec. We finally got to the bottom of it and found that it's the scope definition slowing everything down. When the scope is set to "perJVM", performance is around 2,000,000/sec, and with "perInstance" we get 19,000/sec. This is a fairly significant difference, and our experience with CGLIB or proxy-based implementations has shown better performance than those implementations. Do you have any suggestions on how to improve performance?
Regards neil.