Master and Slaves
We are starting with a standard master-slave system in which a slave runs on each of the test client boxes. Initially we will allow one master and as many slaves as you want. The first version will not have auto-discovery of the slaves, though in the future we could (and should) leverage Jabber to add it. Until then, the master will need a flatfile that describes where to find all the slaves.
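As a rough illustration of the flatfile idea, the master could read one slave address per line. The `host:port` line format, the `#` comment convention, and the names `SlaveList` and `SlaveAddress` are all assumptions made for this sketch; the design above does not fix any of them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for the master's flatfile of slave addresses. */
class SlaveList {

    /** One slave entry: where the master can reach it. */
    static final class SlaveAddress {
        final String host;
        final int port;

        SlaveAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Parses lines of the assumed form "host:port".
     * Blank lines and lines starting with '#' are skipped.
     */
    static List<SlaveAddress> parse(List<String> lines) {
        List<SlaveAddress> slaves = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.lastIndexOf(':');
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("Expected host:port, got: " + line);
            }
            String host = line.substring(0, colon);
            int port = Integer.parseInt(line.substring(colon + 1));
            slaves.add(new SlaveAddress(host, port));
        }
        return slaves;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, read that file; otherwise use a small in-memory sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("# test client boxes", "box1:9000", "box2:9000");
        for (SlaveAddress s : parse(lines)) {
            System.out.println(s.host + " on port " + s.port);
        }
    }
}
```

A plain text file keeps the first version simple; if auto-discovery arrives later, only the code that produces the list of addresses would need to change.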
Master
The master is the coordinator. Its role is to:
- Contact all the slaves and tell them it's time to roll.
- Send down the test configuration info, such as how many threads to run and which threads should be running which tests.
- Send down the necessary jars to each of the slaves so they will be able to run the tests.
- Optionally, send down test-specific XML so the individual tests know how to configure themselves.
- At the end, collect all the performance results and collate them into a single file. (In the future, we would also like this info to be accessible via JMX.)
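To make the test configuration info more concrete, here is a minimal sketch of how a thread plan could be read from XML. The element and attribute names (`run`, `test`, `class`, `threads`) and the class `TestConfigSketch` are invented for illustration; the actual payload format is not decided in this document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the test configuration sent by the master. */
class TestConfigSketch {

    /**
     * Parses the assumed XML layout and returns a map of
     * test class name to the number of threads that should run it.
     */
    static Map<String, Integer> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList tests = doc.getElementsByTagName("test");
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (int i = 0; i < tests.getLength(); i++) {
            Element test = (Element) tests.item(i);
            plan.put(test.getAttribute("class"),
                     Integer.parseInt(test.getAttribute("threads")));
        }
        return plan;
    }

    public static void main(String[] args) throws Exception {
        // Made-up test class names, used only to show the assumed layout.
        String xml = "<run>"
                + "<test class=\"example.SendTest\" threads=\"4\"/>"
                + "<test class=\"example.ReceiveTest\" threads=\"2\"/>"
                + "</run>";
        for (Map.Entry<String, Integer> e : parse(xml).entrySet()) {
            System.out.println(e.getKey() + " runs on " + e.getValue() + " thread(s)");
        }
    }
}
```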
For simplicity, I plan to start with one directory on the master machine that contains all the jars needed to run any of the tests. On each run, all these jars will be sent to each of the slaves (with the simple optimization of not sending jars that are already current on the slaves). In the future, when we need to support more complicated scenarios and start configuring the system itself, we want to leverage Maven to solve the dependency issues and to provide the mechanisms needed for starting and stopping the server.
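One possible way to implement the "already current" check is to compare content digests. The sketch below assumes that each slave can report a map of jar name to SHA-256 digest; that reporting step, and the names `JarSync` and `jarsToSend`, are assumptions rather than part of the design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that decides which jars a slave still needs. */
class JarSync {

    /** Hex-encoded SHA-256 digest of a jar's bytes. */
    static String sha256(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the names of jars the slave lacks or holds with a different digest. */
    static List<String> jarsToSend(Map<String, String> masterDigests,
                                   Map<String, String> slaveDigests) {
        List<String> toSend = new ArrayList<>();
        for (Map.Entry<String, String> e : masterDigests.entrySet()) {
            if (!e.getValue().equals(slaveDigests.get(e.getKey()))) {
                toSend.add(e.getKey());
            }
        }
        return toSend;
    }

    public static void main(String[] args) {
        // Stand-in byte arrays play the role of jar contents here.
        Map<String, String> master = new LinkedHashMap<>();
        master.put("tests.jar", sha256("v2".getBytes(StandardCharsets.UTF_8)));
        master.put("junit.jar", sha256("junit".getBytes(StandardCharsets.UTF_8)));

        Map<String, String> slave = new LinkedHashMap<>();
        slave.put("tests.jar", sha256("v1".getBytes(StandardCharsets.UTF_8))); // stale copy
        slave.put("junit.jar", sha256("junit".getBytes(StandardCharsets.UTF_8)));

        System.out.println(jarsToSend(master, slave)); // prints [tests.jar]
    }
}
```

Comparing digests rather than timestamps avoids depending on clock agreement between the master and the slaves, at the cost of hashing every jar once per run.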
Slave
The slaves are the ones that do all the work. Currently, their role will only be to run distributed tests, but in the future they will move into the role of starting/stopping/re-configuring the tested system to allow easier automation among the different test configurations.
In its current role, the slave will:
- Listen for the master
- Take an XML payload and configure itself for running tests
- Download the jars needed to run that test
- Spin up the threads needed to run the tests, and create the necessary containers. To start off, there will be only one kind of container, and it will run JUnit tests. Since many JUnit tests are not written to be multi-threaded, each thread will probably have its own container to prevent accidentally sharing static info. In the future, I would like to keep open the possibility of creating different containers based on the tests, so people could add a container for TestNG, etc., and so tests that are multi-threading aware can share a container.
- Provide a performance event logging interface. Each container will be responsible for handing this interface to its tests. For JUnit, we will probably look for an implemented interface called IPerformanceTest; if the test supports this interface, we will poke it with the logging interface.
- Send the collected data back to the master for collating. In the beginning, all this data will probably be written to a flatfile and then sent back at the end of the run. In the future, you could imagine an option where some of this info is streamed to the master in blocks as the run goes on.
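The idea of poking a test with the logging interface could look roughly like the following. Only the name IPerformanceTest comes from the text above; its single setter, the `PerformanceLog` interface, and the helper `offerLogger` are guesses made for this sketch. To stay self-contained it uses plain objects rather than real JUnit test cases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how a container hands the logger to a test. */
class LoggerInjectionSketch {

    /** Assumed shape of the slave's performance event logging interface. */
    interface PerformanceLog {
        void event(String description, long timeMillis);
    }

    /** Named in the design; the setter method is an assumption. */
    interface IPerformanceTest {
        void setPerformanceLog(PerformanceLog log);
    }

    /** Hands the logger to a test only if it opted in. Returns true if it was injected. */
    static boolean offerLogger(Object test, PerformanceLog log) {
        if (test instanceof IPerformanceTest) {
            ((IPerformanceTest) test).setPerformanceLog(log);
            return true;
        }
        return false;
    }

    /** Example test that opts in by implementing the interface. */
    static class MessageTest implements IPerformanceTest {
        private PerformanceLog log;

        @Override
        public void setPerformanceLog(PerformanceLog log) {
            this.log = log;
        }

        void run() {
            log.event("received message with ID 12345", 678899L);
        }
    }

    public static void main(String[] args) {
        List<String> recorded = new ArrayList<>();
        PerformanceLog log = (description, time) -> recorded.add(time + " " + description);

        MessageTest aware = new MessageTest();
        offerLogger(aware, log);        // implements the interface, so it gets the logger
        offerLogger(new Object(), log); // a plain test is left untouched

        aware.run();
        System.out.println(recorded);
    }
}
```

An opt-in interface means existing JUnit tests keep running unchanged, and only tests that want to report performance events need to know about the logger.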
To do this, the slave will have a few different pieces:
- The Daemon Thread. This thread will be listening for commands from the master. It will translate those commands into Stop/Start/Whatever.
- The Test Manager. This will be the piece of code that looks at the configuration information to determine how many threads to create and what tests they should run. It will also be responsible for looking at the tests to determine which type of container they should run in, as well as whether each thread needs its own container or whether they can all live together.
- The Thread Manager. This will live in the Test Manager and will be responsible for creating, starting and stopping threads (and therefore tests).
- The Test Containers. Each will be a container specific to the type of test we are running, whether that be JUnit, SysUnit or TestNG. It will be responsible for initializing the tests, hooking into the proper test infrastructure, and providing potentially new services for tests running in this environment (such as a performance logger or a synchronization mechanism).
- The Performance Logger. This will be a logger that keeps track of certain performance events, such as "received message with ID 12345 at time 678899". At the end of the run, we will be able to go through all the data to come up with a coherent picture of what happened.
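The Thread Manager piece listed above might start out as something like this. It is a minimal sketch that assumes each test can be wrapped as a `Runnable` and that tests stop cooperatively when interrupted; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical thread manager: creates, starts and stops test threads. */
class ThreadManagerSketch {
    private final List<Thread> threads = new ArrayList<>();

    /** Creates (but does not start) the given number of threads running the test body. */
    void create(int count, Runnable testBody) {
        for (int i = 0; i < count; i++) {
            threads.add(new Thread(testBody, "test-thread-" + threads.size()));
        }
    }

    /** Starts every thread created so far; intended to be called once. */
    void startAll() {
        for (Thread t : threads) {
            t.start();
        }
    }

    /** Requests a stop by interrupting each thread, then waits for all of them to finish. */
    void stopAll() throws InterruptedException {
        for (Thread t : threads) {
            t.interrupt();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    int size() {
        return threads.size();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger iterations = new AtomicInteger();
        // Stand-in test body: loops until interrupted.
        Runnable body = () -> {
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet();
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return; // stop requested while sleeping
                }
            }
        };

        ThreadManagerSketch manager = new ThreadManagerSketch();
        manager.create(3, body);
        manager.startAll();
        Thread.sleep(50);
        manager.stopAll();
        System.out.println("iterations run: " + iterations.get());
    }
}
```

Interruption only works for tests that check for it or block in interruptible calls, so a real implementation would likely need a fallback for tests that ignore the stop request.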
Data Viewer
This will start as a rather small and simple piece. It will be able to collate the data from the many different slaves, as well as determine the average response time for a message. A large portion of it will be built as a library that makes it easy for other people to build better viewers on top of this data.
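As an illustration of the collation step, the sketch below averages response times across the event logs of several slaves. It assumes each log line has the form `messageId,eventType,timestampMillis` with event types `sent` and `received`, and that the slaves' clocks agree closely enough to subtract timestamps taken on different boxes. The format and the class name `ResponseTimes` are assumptions, not part of the design.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical collation helper for the data viewer library. */
class ResponseTimes {

    /**
     * Collates event lines from any number of slaves and returns the mean
     * received-minus-sent time in milliseconds, over messages that have both events.
     * Returns NaN when no message has both.
     */
    static double averageResponseMillis(List<List<String>> perSlaveLines) {
        Map<String, Long> sent = new HashMap<>();
        Map<String, Long> received = new HashMap<>();
        for (List<String> lines : perSlaveLines) {
            for (String line : lines) {
                String[] fields = line.trim().split(",");
                if (fields.length != 3) {
                    continue; // skip blank or malformed lines
                }
                String id = fields[0].trim();
                String type = fields[1].trim();
                long time = Long.parseLong(fields[2].trim());
                if (type.equals("sent")) {
                    sent.put(id, time);
                } else if (type.equals("received")) {
                    received.put(id, time);
                }
            }
        }
        long total = 0;
        int count = 0;
        for (Map.Entry<String, Long> e : sent.entrySet()) {
            Long receivedAt = received.get(e.getKey());
            if (receivedAt != null) {
                total += receivedAt - e.getValue();
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) total / count;
    }

    public static void main(String[] args) {
        List<String> slaveA = Arrays.asList("12345,sent,1000", "12346,sent,1010");
        List<String> slaveB = Arrays.asList("12345,received,1040", "12346,received,1070");
        // Response times are 40 ms and 60 ms.
        System.out.println(averageResponseMillis(Arrays.asList(slaveA, slaveB))); // prints 50.0
    }
}
```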
Future Directions
The pieces described above are what I'm breaking out as the V1 release of the infrastructure. This V1 release will focus on getting test clients going and on gathering the information needed to determine performance throughput for a system. In the V2 release, we will focus more on configuring and understanding the system itself. This means:
- Hook into Maven to allow users to start and stop servers in the system with different parameters.
- Also use Maven to keep all the servers up to date on the jars and configuration info they need in order to run properly. This will make running these tests in a development environment much easier.
- Allow "Monitoring Slaves" that, instead of (or in addition to) running tests, also know how to collect system information, whether this means hooking into OS-specific system monitors or using JMX interfaces to get at statistics. This info will then be reported in a standard format back to the master.
- Develop better tools for getting information out of the reported performance data.
- Develop extensions that allow Stress Testing and Failure Testing in addition to just performance testing. When moving on to Failure Testing, verifying the tests can become more difficult. I'm not sure how much we can help with that part, but we should be able to help add failure points into the system, so that one part of the test case is to "crash JMS server under load". The tests can then verify that everything was dealt with correctly.
