Stress Testing Cometd (Jetty implementation)
These instructions show you how to stress test cometd from Jetty 6.1.9 running on Unix. The same basic steps apply to running on Windows or Mac, and I'd be happy to add detailed instructions if somebody wants to contribute them.
The basic steps are:
- configure/tune operating system of test client and server machines
- install, configure and run jetty server
- run jetty bayeux test client
- interpret the results.
Configure/Tune operating system.
The main change needed is that the operating system must support the number of connections (== file descriptors) required by the test, on both the server machine and the test client machines.
For a Linux system, the file descriptor limit is changed in the /etc/security/limits.conf file.
Add the following two lines (or change any existing
There are many other things that can be tuned in the server stack, and the zeus ZXTM documentation gives a good overview.
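Before starting a run, it can be useful to confirm that the limit seen by a process is actually high enough for the planned number of connections. The following is a minimal sketch using Python's standard `resource` module (Unix only). The function names and the `reserve` margin for non-test descriptors are illustrative assumptions, not part of Jetty or the test client.

```python
import resource


def current_nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_enough_descriptors(soft_limit, planned_connections, reserve=100):
    """Check whether the soft limit covers the planned connections.

    `reserve` is a hypothetical margin for log files, jars, and other
    descriptors the process needs besides test connections.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= planned_connections + reserve


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("enough for 20000 connections:", has_enough_descriptors(soft, 20000))
```

Run this on both the server and the test client machines, from a shell started after the limit was changed, since limits are usually applied at login.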
Install, configure and run Jetty
For the purposes of cometd testing, the standard configuration of jetty (
etc/jetty.xml needs to be edited to change the connector configuration to:
- increase the max idle time
- increase the low resources connections.
The relevant updated section is:
Jetty comes with cometd installed in
To run the server with additional memory needed for the test, use:
You should now be able to point a browser at the server at either:
Specifically, try out the cometd chat room with your browser to confirm that it is working.
Run jetty bayeux test client
The jetty cometd bayeux test client generates load simulating users in a chat room. To run the client:
The client has a basic text UI that operates in two phases: 1) global configuration 2) test runs.
An example global configuration phase looks like:
Press Enter to accept the default value, or type a new value and then press Enter. The parameters and their meanings are:
- server - The host name or IP address of the server running Jetty with cometd
- port - The port of the server (8080 unless you have changed it in jetty.xml)
- context - The context of the web application running cometd (cometd in the test server).
- base - The base bayeux channel name used for the chat rooms. Normally you would not change this.
- rooms - The number of chat rooms to create. This will combine with the number of users to determine the users per room. If you have 100 rooms and 1000 users, then you will have 10 users per room and every message sent will be delivered 10 times. For runs with >10k users, 1000 rooms is a reasonable value.
- rooms per client - This allows a simulated user to subscribe to multiple rooms. However, as these are randomly selected, values greater than 1 will mean that the client will not be able to accurately predict the number of messages that will be delivered. Leave this at 1 unless you are testing something specific.
- max Latency - If the latency for delivering a message is greater than this value (in ms), abort the test.
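The relationship between rooms and users described above is simple arithmetic, sketched below. The function name is illustrative. With more than one room per client, rooms are picked at random, so the result is only an average under that assumption.

```python
def users_per_room(clients, rooms, rooms_per_client=1):
    """Average number of simulated users subscribed to each chat room.

    With rooms_per_client == 1 this is the figure from the text;
    for larger values it is only an average, since rooms are
    selected at random.
    """
    return clients * rooms_per_client / rooms


if __name__ == "__main__":
    # The example from the text: 100 rooms and 1000 users.
    print(users_per_room(clients=1000, rooms=100))
```

For the 100 rooms and 1000 users case this gives 10 users per room, so each published message is delivered 10 times.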
After the global configuration, the test client loops through individual test cycles. Again, Enter may be used to accept the default value. Two iterations of the test cycle are below:
The parameters that may be set are:
- clients - The number of clients to simulate. The clients are kept from one test iteration to the next, so if the number of clients changes, only the incremental number of clients is created or destroyed. (NB. currently reducing clients produces a noisy exception as the connection is retried. This can be ignored).
- publish - The number of chat messages to be published in this test. The number of messages received will be this number multiplied by the users per chat room (which is the number of clients divided by the global number of rooms).
- publish size - The size in bytes of the chat message to publish.
- pause - A period in ms to pause between batches of published messages.
- batch - The size of the batch of publish messages to send in a burst.
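As a rough planning aid, the parameters above can be combined to estimate how many messages a test cycle should receive and how many bursts it will send. This is a sketch of the arithmetic described in the list, assuming 1 room per client; the function names are illustrative and not part of the test client.

```python
import math


def expected_received(publish, clients, rooms):
    """Messages expected at the clients: publish count times users per room."""
    return publish * clients / rooms


def batch_count(publish, batch):
    """Number of bursts needed to send `publish` messages, `batch` at a time."""
    return math.ceil(publish / batch)


if __name__ == "__main__":
    # Hypothetical settings: 1000 clients in 100 rooms, 1000 messages, bursts of 10.
    print(expected_received(publish=1000, clients=1000, rooms=100))
    print(batch_count(publish=1000, batch=10))
```

Comparing the first figure with the total reported in the end-of-cycle summary is a quick way to spot lost messages.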
While the test is executing, a series of digits is output to show progress. The digits represent the current average latency in units of 100ms. So a 0 represents <100ms latency from the time the message was published by the client to when it has been received on the client. 1 represents a latency >=100ms and <200ms, etc.
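The digit encoding can be illustrated with a small sketch. This mirrors the description above rather than the client's actual source; how the client displays average latencies of 1000ms or more is not stated here, so this sketch simply continues the integer division.

```python
def progress_digit(avg_latency_ms):
    """Map an average latency in ms to its progress value in units of 100ms."""
    return int(avg_latency_ms // 100)


if __name__ == "__main__":
    # Hypothetical average latencies sampled during a run.
    samples = [40, 120, 230, 90]
    print("".join(str(progress_digit(ms)) for ms in samples))
```

A run that warms up and then settles would show higher digits early on, followed by a long run of zeros.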
At the end of the test cycle the summary is printed showing the total messages received, the message rate and the min/ave/max latency.
Interpreting the results.
Before producing numbers for interpretation, it is important to run a number of trials and to allow the system to "warm up". During the initial runs, the Java JIT compiler will optimize the code and object pools will be populated with reusable objects. Thus the first runs at a given number of clients are often slower, and this can be seen in the test cycle shown above, where the average latency initially blew out to over 200ms before it was reduced back to <100ms. The average and max latency for the second run were far superior to the first run.
It is also important to use long runs for producing results, so that:
- Any statistical effect of the ramp-up and ramp-down periods in each test is reduced.
- Any resources (queues, memory, file descriptors, etc) that are being used in a non-sustainable way have a chance to max out and cause errors, garbage collections or other adverse effects.
- Any occasional system hiccups caused by other system events are included in the results.
Typically it is best to start with short low volume test cycles and to gradually reduce the pause or increase the batch to determine approximate maximum message rates. Then the test duration can be extended by increasing the number of messages published or the number of clients (which also increases the message rate as there will be more users per room).
A normal run should report no exceptions or timeouts. For a single server and single test client with 1 room per simulated client, the expected number of messages should always be received. If the server is running clustered, then as this demo has no cluster support, the messages will be reduced by a factor equal to the number of servers. Similarly, if multiple test clients are used, each test client will see messages published from the other test clients, so the number of messages received will be in excess of the expected number.
If you are testing a load balancer, then it is very important that there is affinity, as the bayeux client ID must be known on the worker node used and both connections from the same simulated node must arrive at the same worker node. However, the test does not use HTTP sessions, so any cookies used for affinity will need to be set by the balancer (the test client will handle set cookies).
If you are testing a load balancer, then you should start with a cluster of 1, so that you can verify that no messages are being lost. Then increase the cluster size and be content that you will not have exact message counts and must adjust by the number of nodes.
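The adjustment by the number of nodes can be sketched as follows. This assumes, as described above, that the demo has no cluster support, so received messages drop by a factor equal to the number of servers. The function names and the `tolerance` value are hypothetical, chosen only to illustrate an inexact comparison.

```python
def adjusted_expected(single_server_expected, servers):
    """Expected message count once traffic is spread over `servers` nodes."""
    return single_server_expected / servers


def roughly_matches(received, expected, tolerance=0.05):
    """True if `received` is within a relative `tolerance` of `expected`."""
    return abs(received - expected) <= tolerance * expected


if __name__ == "__main__":
    # Hypothetical run: 10000 messages expected on one server, cluster of 4.
    expected = adjusted_expected(10000, servers=4)
    print(expected, roughly_matches(2450, expected))
```

With a cluster of 1 the adjusted figure equals the single-server figure, and the comparison should be exact rather than approximate.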