
The need for performance metrics and comparison

Since we released AspectWerkz 1.0, and more generally with every release of any AOP / interceptor framework (AspectWerkz, AspectJ, JBoss AOP, Spring AOP, cglib, dynaop, etc.), the same questions are raised: "what is the performance cost of such an approach?", "how much do I lose per method invocation when an advice / interceptor is applied?".

This is indeed an issue that needs to be carefully addressed, and one that has in fact affected the design of every sufficiently mature framework.

We are probably all wary of the cost of java.lang.reflect despite its power, and usually, even before starting to evaluate semantics, robustness, and general ease of use, we begin with some Hello World benchmarks.

We started AWbench for that purpose: to offer a single place to measure the relative performance of AOP / interceptor frameworks, and to let you measure it on your own.

Beyond the performance comparison, AWbench is a good place to discover the semantic differences and ease of use of each framework, since all of them are applied to the same rather simple task. A "lines of code" metric will be provided in a future report.

Current performance results

This table provides the figures from a bench in nanoseconds per advised method invocation. A plain method invocation is roughly 5 ns/iteration on the hardware/software used for the bench. Note that an advised application does more work than a non-advised one, so you should not compare the non-advised version to the advised version. AWbench does not yet provide metrics for a hand-written implementation of the AOP concepts.

The results were obtained with 2 million iterations.

In this table, the first two lines are the most important ones. In a real-world application, it is likely that a before or around advice will interact with the code it advises, and to do so it needs access to runtime (contextual) information such as method parameter values and the target instance. It is also likely that a join point is advised by more than one advice.

Conversely, it is very unlikely to have just a before advice that does nothing, but it gives us a good estimate of the minimal overhead we can expect.

Note: comparing such results when the difference is small (e.g. 15 ns vs 10 ns) might not be relevant. Before doing so you should run the bench several times and compute an average after removing the smallest and largest measurements.
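That averaging scheme can be sketched as follows (a hypothetical helper for post-processing run results, not part of AWbench itself):

```java
import java.util.Arrays;

public class TrimmedMean {
    // Averages benchmark runs after dropping the single lowest and the
    // single highest measurement, as suggested above.
    public static double trimmedMean(long[] runsNs) {
        if (runsNs.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        long[] sorted = runsNs.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) { // skip min and max
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        long[] runs = {15, 10, 12, 55, 11}; // ns/invocation from 5 runs
        System.out.println(trimmedMean(runs)); // the outliers 10 and 55 are dropped
    }
}
```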

| AWBench (ns/invocation)      | aspectwerkz | awproxy | aspectwerkz_1_0 | aspectj | jboss | spring | dynaop | cglib | ext:aopalliance | ext:spring | ext:aspectj |
|------------------------------|-------------|---------|-----------------|---------|-------|--------|--------|-------|-----------------|------------|-------------|
| before, args() target()      | 10          | 25      | 606             | 10      | 220   | 355    | 390    | 145   | -               | 220        | -           |
| around x 2, args() target()  | 80          | 85      | 651             | 50      | 290   | 436    | 455    | 155   | 465             | 476        | -           |
| before                       | 15          | 20      | 520             | 15      | 145   | 275    | 320    | 70    | -               | 40         | 10          |
| before, static info access   | 30          | 30      | 501             | 25      | 175   | 275    | 330    | 70    | -               | 35         | -           |
| before, rtti info access     | 50          | 55      | 535             | 50      | 175   | 275    | 335    | 75    | -               | 35         | -           |
| after returning              | 10          | 20      | 541             | 10      | 135   | 285    | 315    | 85    | -               | 45         | 15          |
| after throwing               | 3540        | 3870    | 6103            | 3009    | 5032  | -      | 6709   | 8127  | -               | -          | 3460        |
| before + after               | 20          | 30      | 511             | 20      | 160   | 445    | 345    | 80    | -               | 35         | 20          |
| before, args() primitives    | 10          | 20      | 555             | 10      | 195   | 350    | 375    | 145   | -               | 210        | -           |
| before, args() objects       | 5           | 25      | 546             | 10      | 185   | 325    | 345    | 115   | -               | 200        | -           |
| around                       | 60          | 95      | 470             | 10      | -     | 225    | 315    | 75    | -               | -          | 90          |
| around, rtti info access     | 70          | 70      | 520             | 50      | 140   | 250    | 340    | 80    | 70              | 70         | -           |
| around, static info access   | 80          | 90      | 486             | 25      | 135   | 245    | 330    | 75    | 80              | 80         | -           |


This table provides figures from the same bench, where AspectWerkz 2.0.RC2-snapshot is the reference for each category.
The first line illustrates, for example, that for a before advice with args() and target() access, AspectWerkz is 22 times faster than JBoss AOP 1.0.

| AWBench (relative %)         | aspectwerkz | awproxy | aspectwerkz_1_0 | aspectj | jboss  | spring | dynaop | cglib  | ext:aopalliance | ext:spring | ext:aspectj |
|------------------------------|-------------|---------|-----------------|---------|--------|--------|--------|--------|-----------------|------------|-------------|
| before, args() target()      | 1 x         | 2.5 x   | 60.6 x          | 1 x     | 22 x   | 35.5 x | 39 x   | 14.5 x | -               | 22 x       | -           |
| around x 2, args() target()  | 1 x         | 1 x     | 8.1 x           | 0.6 x   | 3.6 x  | 5.4 x  | 5.6 x  | 1.9 x  | 5.8 x           | 5.9 x      | -           |
| before                       | 1 x         | 1.3 x   | 34.6 x          | 1 x     | 9.6 x  | 18.3 x | 21.3 x | 4.6 x  | -               | 2.6 x      | 0.6 x       |
| before, static info access   | 1 x         | 1 x     | 16.7 x          | 0.8 x   | 5.8 x  | 9.1 x  | 11 x   | 2.3 x  | -               | 1.1 x      | -           |
| before, rtti info access     | 1 x         | 1.1 x   | 10.7 x          | 1 x     | 3.5 x  | 5.5 x  | 6.7 x  | 1.5 x  | -               | 0.7 x      | -           |
| after returning              | 1 x         | 2 x     | 54.1 x          | 1 x     | 13.5 x | 28.5 x | 31.5 x | 8.5 x  | -               | 4.5 x      | 1.5 x       |
| after throwing               | 1 x         | 1 x     | 1.7 x           | 0.8 x   | 1.4 x  | -      | 1.8 x  | 2.2 x  | -               | -          | 0.9 x       |
| before + after               | 1 x         | 1.5 x   | 25.5 x          | 1 x     | 8 x    | 22.2 x | 17.2 x | 4 x    | -               | 1.7 x      | 1 x         |
| before, args() primitives    | 1 x         | 2 x     | 55.5 x          | 1 x     | 19.5 x | 35 x   | 37.5 x | 14.5 x | -               | 21 x       | -           |
| before, args() objects       | 1 x         | 5 x     | 109.2 x         | 2 x     | 37 x   | 65 x   | 69 x   | 23 x   | -               | 40 x       | -           |
| around                       | 1 x         | 1.5 x   | 7.8 x           | 0.1 x   | -      | 3.7 x  | 5.2 x  | 1.2 x  | -               | -          | 1.5 x       |
| around, rtti info access     | 1 x         | 1 x     | 7.4 x           | 0.7 x   | 2 x    | 3.5 x  | 4.8 x  | 1.1 x  | 1 x             | 1 x        | -           |
| around, static info access   | 1 x         | 1.1 x   | 6 x             | 0.3 x   | 1.6 x  | 3 x    | 4.1 x  | 0.9 x  | 1 x             | 1 x        | -           |

Benchmarks were run on Java HotSpot 1.4.2, Windows 2000 SP4, Pentium M 1.6 GHz, 1 GB RAM.

Notes:

  • Some figures are not available when the underlying framework does not support the feature. For the ext: ones, this can also be due to pending work (AOP Alliance interfaces can emulate a before advice, just as is the case in JBoss AOP).
  • after throwing advice appears slow because the measurement includes both the overhead of throwing the exception (user code) and that of catching it and performing an instanceof check on the exception type (advice code).
  • latest run: Dec 20, 2004, as per Spring Framework team feedback.
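The second note can be illustrated with a minimal Java sketch (all names hypothetical, not the generated code of any benched framework): the measured path both throws the exception in user code and catches plus type-checks it in advice code before rethrowing.

```java
public class AfterThrowingSketch {
    static int adviceRuns = 0;

    // Hypothetical after-throwing advice: only run when the intercepted
    // method throws an exception of the expected type.
    static void afterThrowing(IllegalStateException e) {
        adviceRuns++;
    }

    // What a weaver conceptually generates around the advised method:
    // the exception must first be thrown (user code), then caught and
    // type-checked with instanceof (advice code) before being rethrown.
    static void advisedCall(Runnable target) {
        try {
            target.run(); // user code: constructing + throwing is itself costly
        } catch (RuntimeException e) {
            if (e instanceof IllegalStateException) { // advice code: type check
                afterThrowing((IllegalStateException) e);
            }
            throw e; // rethrow to the caller
        }
    }
}
```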

AWbench internals

Summary

AWbench is a micro-benchmark suite that aims to stay simple. The test application is very simple, and AWbench is mainly the glue around it that applies one or more very simple advice / interceptors from the framework of your choice.

AWbench comes with an Ant script that lets you run it on your own box, and you can contribute improvements if you know of any for a particular framework.

What is the scope for the benchmark?

So far, AWbench includes only method execution pointcuts, since call-side pointcuts are not supported by proxy-based frameworks (Spring AOP, cglib, dynaop, etc.).

The awbench.method.Execution class is the test application; it contains one method per construct to bench. An important fact is that bytecode-based AOP may provide much better performance for before and after advice, as well as much better performance when it comes to accessing contextual information.
Indeed, proxy-based frameworks are very likely to use reflection to give the user access to intercepted method parameters at runtime from within an advice, while bytecode-based AOP may use more advanced constructs that provide access at the speed of statically compiled code.
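The difference between the two dispatch styles can be sketched as follows (a hypothetical example; `increment`, `reflectiveCall`, and `directCall` are illustrative names, not AWbench code):

```java
import java.lang.reflect.Method;

public class DispatchSketch {
    public static int increment(int i) {
        return i + 1;
    }

    // Proxy-style dispatch: the call goes through java.lang.reflect,
    // with the int argument boxed and the Integer result unboxed.
    public static int reflectiveCall(int arg) {
        try {
            Method m = DispatchSketch.class.getMethod("increment", int.class);
            return (Integer) m.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Weaver-style dispatch: a statically compiled call with real types,
    // no boxing and no reflective lookup.
    public static int directCall(int arg) {
        return increment(arg);
    }
}
```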

The current scope is thus:

For method execution pointcut

| Construct             | Contextual information access                                        | Notes                                                                                                     |
|-----------------------|----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| before advice         | none                                                                 |                                                                                                           |
| before advice         | static information (method signature etc.)                           |                                                                                                           |
| before advice         | contextual information accessed reflectively                         | Likely to use casting and unboxing of primitives                                                          |
| before advice         | contextual information accessed with explicit framework capabilities | Only supported by AspectJ and AspectWerkz 2.x                                                             |
| after advice          | none                                                                 |                                                                                                           |
| after returning advice| return value                                                         |                                                                                                           |
| after throwing advice | exception instance                                                   |                                                                                                           |
| before + after advice | none                                                                 |                                                                                                           |
| around advice         | optimized                                                            | AspectJ and AspectWerkz 2.x provide specific optimizations (thisJoinPointStaticPart vs thisJoinPoint)     |
| around advice         | non optimized                                                        |                                                                                                           |
| 2 around advice       | contextual information                                               |                                                                                                           |

By accessing contextual information we mean:

  • accessing a method parameter using its real type (i.e. boxing and unboxing might be needed)
  • accessing the advised instance using its real type (i.e. casting might be needed)

A pseudo code block is thus likely to be:
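As a framework-neutral Java sketch (all names here are hypothetical, not the API of any benched framework), a before advice performing both accesses might look like:

```java
public class ContextAccessSketch {
    // Hypothetical join-point view handed to the advice by a framework:
    // boxed arguments and an untyped target instance.
    static class JoinPoint {
        final Object[] args;
        final Object target;
        JoinPoint(Object[] args, Object target) {
            this.args = args;
            this.target = target;
        }
    }

    static class Account {
        int balance;
    }

    static int observedAmount;

    // Before advice accessing contextual information: the int parameter
    // must be unboxed and the target cast to its real type.
    static void before(JoinPoint jp) {
        int amount = (Integer) jp.args[0];     // unboxing might be needed
        Account account = (Account) jp.target; // casting might be needed
        observedAmount = amount + account.balance;
    }
}
```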

Which AOP and Proxy frameworks are benched?

The following are included in AWbench:

Bytecode based frameworks

Proxy based frameworks

| Framework          | URL                           |
|--------------------|-------------------------------|
| Spring AOP (1.1.1) | http://www.springframework.org/ |
| cglib proxy (2.0.2)| http://cglib.sourceforge.net/ |
| dynaop (1.0 beta)  | https://dynaop.dev.java.net/  |

Moreover, AWbench includes the AspectWerkz Extensible Aspect Container, which allows running any aspect / interceptor framework within the AspectWerkz 2.x runtime:

| AspectWerkz Extensible Aspect Container running | Notes                                |
|-------------------------------------------------|--------------------------------------|
| AspectJ                                         |                                      |
| AOP Alliance                                    | http://aopalliance.sourceforge.net/  |
| Spring AOP                                      |                                      |

AWbench is extensible. Refer to the How to contribute? section (below) for more info on how to add your framework to the bench.

What's next?

Running AWbench on your own

AWBench is released under LGPL.
There will never be a binary distribution of it, but the source can be checked out:

Once checked out, you can run the bench using several different Ant targets.

How to contribute?

If you notice optimizations for one of the implementations that respect the requirements, we will add the fix to AWbench and update the results accordingly.

If you are willing to write a non-AOP, non-proxy-based version of this bench, so that a comparison between the AOP approach and regular OO design patterns becomes possible, send us an email.

Limitations

The current implementation does not cover fine-grained deployment models such as perInstance / perTarget, whose underlying implementations are unlikely to be neutral with respect to performance.


11 Comments

  1. Jonas, interesting stuff. I compiled my own benchmarks a while ago and didn't get exactly the same results, but I'll look into that later.

    Although I don't think the small invocation overhead in Spring applications is relevant (oftentimes, Spring AOP is used to do transaction management, security, etcetera--all resource-intensive stuff, generating much more overhead than the Spring AOP stuff), you can make it about 10 to 15% faster by including the following as properties for the PFB:

    <property name="optimize"><value>true</value></property>
    <property name="opaque"><value>true</value></property>

    Optimization differs per proxy type with Spring. When using CGLib for example, you won't be able to change the advice configuration of an already created proxy. In 1.3, the option to freeze the configuration will also be added, giving an even bigger performance increase. Setting the proxy to opaque will disable the feature that allows you to cast the proxy to Advised (inspecting the proxy's advisors, etcetera). The latter won't increase performance a lot, the former will however!

    Also, what about the different deployment models available with JBoss, for example? I assume you've been precompiling the aspects for JBoss (performance will degrade by a factor of 2 if you do online weaving, at least using my benchmark (smile) ).

  2. Where I said: 1.3, you have to read: 1.1.3...

  3. Just to add to what Alef has written, the default configuration in Spring is to allow for advice to be added/removed on the fly and to use JDK proxies rather than CGLIB. In situations where you don't want to change the advice chain, you can 'freeze' the proxy once it is created. When using this in conjunction with the CGLIB proxies in Spring you get quite a bit of a performance enhancement. Once I have downloaded the code, I will submit any optimizations for Spring if I can make them.

  4. Anonymous

    What about cflow and cflowbelow? For my money, they are the most interesting pointcuts that I deal with.

  5. Anonymous

    It would also be interesting to see, for those frameworks that support call-based pointcuts and dynamic loading, how startup time is affected.

  6. Anonymous

    It appears that the benchmark, when checked out and run, does not generate the first two benchmark lines. The first reported is:

    run:ext:spring:
    java |-------------------------------------------------------------------------------
    java | Nanosecond (E-9) / iteration Label
    java |--------------------------------------------------------------------------------
    java | 157 2000000 method execution, before advice (measured in 2000000 iterations)

    For each benchmark.

  7. The table here is not in the output order of "ant run:all". It was more interesting to have the more realistic schemes like "before with context exposure" and "2 around advice with context exposure" appear first and in bold.
    The "157" you obtain for "ext:spring" (Spring aspects within the AspectWerkz runtime) with the label "before advice" matches the "before" line in the table for "ext:spring", which is "40" (ns/iteration) in the table; the difference is due to hardware / VM differences etc.
    I attach to the page the full log of the bench that led to this table (output of "ant run:all")

  8. Anonymous

    There exists another benchmark suite of AOP (AspectJ) programs:

    http://www.sable.mcgill.ca/benchmarks/aspectj

    These benchmarks are described in detail in this year's OOPSLA paper:

    http://aspectbench.org/papers#oopsla2004

    Those experiments prompted the construction of an extensible, optimising compiler for AspectJ:

    http://aspectbench.org

    cflow and cflowbelow are indeed important, and need special optimisations, see

    http://aspectbench.org/techreports#abc-2004-3

    some of these optimisations have been adopted in ajc 1.2.1.

  9. Jonas,
    1) What test application do you use in your benchmark?
    2) Where can I find AWbench download?

    Thanks and best regards,
    Michael

  10. The test application is only the micro sample app we wrote for the bench.
    You check it out from CVS and build it yourself; there is no dist to download.

    See the bottom of the article for how to check out the sources.

    There you can see yourself how the bench works etc.

    /jonas

  11. Anonymous

    We have been using AspectWerkz and JBoss AOP and found very slow performance in both implementations. JBoss was executing 2000/sec and AspectWerkz up to 19,000/sec. We have finally gotten to the bottom of it and found that it's the scope definition slowing everything down. When the scope is set to "perJVM", performance is around 2,000,000/sec, and when "perInstance" we get 19,000/sec. This is a fairly significant difference, and our experience with CGLib or proxy-based implementations has shown better performance than those implementations. Do you have any suggestions on how to improve performance?
    Regards neil.