People love variety but often struggle to choose. Since GPars brings several concurrency paradigms to the table, we decided to build a simplified guide that helps users pick the right concept for their task.
A Short Guide
Data decomposition
---> Geometric decomposition (Parallel collections)
---> Recursive (Fork/Join)
Task decomposition
---> Independent tasks (Parallel collections, Dataflow tasks)
---> Recursively dependent tasks (Fork/Join)
---> Tasks with mutual dependencies (Composable asynchronous functions, Dataflow tasks, CSP)
---> Tasks cooperating on the same data (STM, Agents)
Streamed data decomposition
---> Pipeline (Dataflow channels and operators)
---> Event-based (Actors, Active objects)
Whenever you come across a collection that takes a while to process, consider using parallel collection methods. Although enabling collections for parallel processing imposes overhead, the gain over inefficient sequential processing frequently outweighs it. GPars gives you two options here:
- GParsPool, which leverages the efficient Fork/Join algorithm using a Fork/Join thread pool
- GParsExecutorsPool, which builds on plain old Java 5 executors
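To make the two flavors concrete, here is a minimal sketch that uses only plain JDK classes, not the GPars API. A parallel stream stands in for the Fork/Join-backed approach and a fixed thread pool stands in for the executor-backed approach. The class and method names (ParallelCollectionSketch, slowSquare and so on) are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical demo class; plain JDK only, not the GPars API.
class ParallelCollectionSketch {

    // Stand-in for a per-element operation that takes a while.
    static int slowSquare(int n) {
        return n * n;
    }

    // Fork/Join flavor: a parallel stream runs on the common Fork/Join pool.
    static List<Integer> squaresWithForkJoin(List<Integer> input) {
        return input.parallelStream()
                .map(ParallelCollectionSketch::slowSquare)
                .collect(Collectors.toList());
    }

    // Executor flavor: one task per element, submitted to a fixed thread pool.
    static List<Integer> squaresWithExecutor(List<Integer> input) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Integer n : input) {
                Callable<Integer> job = () -> slowSquare(n);
                pending.add(pool.submit(job));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<Integer> f : pending) {
                result.add(f.get()); // collecting in submission order keeps input order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the results in input order. For a cheap operation such as this one the parallel overhead dominates, so treat the sketch as a shape to follow rather than a benchmark.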
The parallel collection methods, such as eachParallel, findAllParallel, etc., discussed above provide an easy migration path from sequential to concurrent code. However, when chaining multiple collection-processing methods, it is more effective to use the map/reduce principle instead. Use GParsPool-based map/reduce operations to avoid the overhead of creating and destroying a parallel collection for each parallel method in the chain. The conversion is done only once, when the parallel property of a collection is retrieved. From then on, the parallel tree-like data structure is reused by all subsequent calls. The map/reduce approach should therefore be preferred for chained parallel method calls.
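The same principle can be illustrated with a plain JDK stream, used here as an analogy and not as the GPars map/reduce API. The collection is switched to parallel mode once and every chained step reuses that parallel view. MapReduceSketch and sumOfEvenSquares are hypothetical names.

```java
import java.util.List;

// Hypothetical demo class; a JDK stream analogy, not the GPars API.
class MapReduceSketch {

    // Sum of the squares of the even numbers, computed as one chained pipeline.
    static int sumOfEvenSquares(List<Integer> numbers) {
        return numbers.parallelStream()      // switch to parallel mode once
                .filter(n -> n % 2 == 0)     // filter step reuses the parallel view
                .map(n -> n * n)             // map step
                .reduce(0, Integer::sum);    // reduce step
    }
}
```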
Trees or hierarchies are naturally parallel data structures. Fork/Join algorithms process hierarchical data or problems concurrently. Use GParsPool and its Fork/Join convenience layer, which was designed to make Fork/Join calculations easy to create.
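As an illustration of the Fork/Join idea on hierarchical data, the following sketch sums the values in a tree with the JDK's RecursiveTask. It deliberately uses the raw JDK Fork/Join classes rather than the GPars convenience layer, and the names TreeSumSketch, Node and SumTask are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class; raw JDK Fork/Join, not the GPars convenience layer.
class TreeSumSketch {

    // A simple immutable tree node.
    static final class Node {
        final int value;
        final List<Node> children;

        Node(int value, Node... children) {
            this.value = value;
            this.children = Arrays.asList(children);
        }
    }

    // One task per node: fork a subtask for each child, then join the results.
    static final class SumTask extends RecursiveTask<Integer> {
        private final Node node;

        SumTask(Node node) {
            this.node = node;
        }

        @Override
        protected Integer compute() {
            List<SumTask> subtasks = new ArrayList<>();
            for (Node child : node.children) {
                SumTask task = new SumTask(child);
                task.fork();                 // schedule the child asynchronously
                subtasks.add(task);
            }
            int total = node.value;
            for (SumTask task : subtasks) {
                total += task.join();        // wait for each child's partial sum
            }
            return total;
        }
    }

    static int sum(Node root) {
        return ForkJoinPool.commonPool().invoke(new SumTask(root));
    }
}
```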
Long-lasting calculations can be run in the background with very little syntactic and performance overhead. Use asynchronous functions within either GParsPool or GParsExecutorsPool:
- callAsync() to invoke a closure asynchronously
- asyncFun() to create an asynchronous closure out of the original one
Asynchronous closures can then be combined just like the original sequential ones.
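The idea of turning a sequential function into an asynchronous one and then composing the results can be sketched with the JDK's CompletableFuture. This is an analogy only: toAsync is a hypothetical helper, loosely playing the role the text gives to asyncFun(), and it is not part of GPars.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical demo class; a CompletableFuture analogy, not the GPars API.
class AsyncFunctionSketch {

    // Wraps a sequential function so that each call runs in the background
    // and hands back a future instead of a plain value.
    static <A, R> Function<A, CompletableFuture<R>> toAsync(Function<A, R> f) {
        return a -> CompletableFuture.supplyAsync(() -> f.apply(a));
    }

    // Computes (3 * 3 + 1) + (4 * 4) by combining asynchronous functions.
    static int combinedResult() {
        Function<Integer, CompletableFuture<Integer>> square = toAsync(n -> n * n);
        Function<Integer, CompletableFuture<Integer>> increment = toAsync(n -> n + 1);

        CompletableFuture<Integer> left = square.apply(3).thenCompose(increment);
        CompletableFuture<Integer> right = square.apply(4);

        // Neither branch blocks the other; join() waits only for the final sum.
        return left.thenCombine(right, Integer::sum).join();
    }
}
```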
Use dataflow channels and operators to build networks of independent, asynchronous, event-driven calculation elements that process data. Dataflow networks are typically used for data or image processing, data mining or computer simulations.
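A two-stage network of this kind can be approximated with plain JDK queues and threads. In this sketch, which does not use GPars dataflow channels or operators, each queue plays the role of a channel and each thread plays the role of an operator. PipelineSketch and the END sentinel are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; JDK queues and threads, not GPars dataflow operators.
class PipelineSketch {

    private static final int END = Integer.MIN_VALUE; // sentinel marking end of stream

    // Feeds the input through two stages: double each value, then add one.
    static List<Integer> run(List<Integer> input) {
        BlockingQueue<Integer> raw = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> doubled = new LinkedBlockingQueue<>();
        List<Integer> output = new ArrayList<>();

        Thread doubler = new Thread(() -> {
            try {
                for (int v = raw.take(); v != END; v = raw.take()) {
                    doubled.put(v * 2);
                }
                doubled.put(END); // pass the end marker downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread collector = new Thread(() -> {
            try {
                for (int v = doubled.take(); v != END; v = doubled.take()) {
                    output.add(v + 1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        doubler.start();
        collector.start();
        try {
            for (int v : input) {
                raw.put(v);
            }
            raw.put(END);
            doubler.join();
            collector.join(); // after join, output is safely visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;
    }
}
```

Because each stage is a single thread reading from a FIFO queue, the values come out in the order they went in.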
Use DataflowVariables, which offer thread-safe single-write, multiple-read semantics. They ensure that a reader cannot continue before a value has been safely written to the variable by another thread. They also allow callbacks to be registered and invoked as soon as a value gets bound.
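The single-write, multiple-read behavior can be approximated with the JDK's CompletableFuture, shown below as an analogy and not as the GPars DataflowVariable API. One difference is worth noting: a second complete() call on a CompletableFuture is silently ignored and returns false, which may not match how a dataflow variable reacts to a second write. DataflowVariableSketch is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a CompletableFuture analogy, not the GPars API.
class DataflowVariableSketch {

    // Returns { value read, value seen by callback, value re-read, 1 if second write succeeded else 0 }.
    static int[] run() {
        CompletableFuture<Integer> x = new CompletableFuture<>();
        AtomicInteger seenByCallback = new AtomicInteger();

        // Register a callback before any value is bound.
        CompletableFuture<Void> callbackDone = x.thenAccept(v -> seenByCallback.set(v));

        // Another thread binds the value exactly once.
        Thread writer = new Thread(() -> x.complete(42));
        writer.start();

        int read = x.join();                  // blocks until the value is bound
        boolean secondWrite = x.complete(7);  // ignored: the value is already bound
        callbackDone.join();                  // make sure the callback has run

        return new int[] { read, seenByCallback.get(), x.join(), secondWrite ? 1 : 0 };
    }
}
```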
Splitting algorithms explicitly into independent asynchronous objects with direct addressing
Use Actors in one of their many flavors. This is exactly their domain: independent active objects exchanging messages asynchronously.
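To show the shape of the idea without relying on the GPars actor classes, here is a hand-rolled, minimal actor built from a JDK queue and a thread. The mailbox receives messages asynchronously, and the state is touched only by the actor's own thread. All names (CounterActorSketch, Add, Get, Stop) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical hand-rolled actor; plain JDK only, not a GPars actor.
class CounterActorSketch {

    interface Message {}

    static final class Add implements Message {
        final int amount;
        Add(int amount) { this.amount = amount; }
    }

    static final class Get implements Message {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }

    static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private int total; // only ever read or written by the actor's own thread
    private final Thread worker = new Thread(this::loop);

    CounterActorSketch() {
        worker.start();
    }

    // Asynchronous send: the caller never waits for the message to be processed.
    void send(Message message) {
        mailbox.add(message);
    }

    // Messages are handled one at a time, in arrival order.
    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take();
                if (m instanceof Add) {
                    total += ((Add) m).amount;
                } else if (m instanceof Get) {
                    ((Get) m).reply.complete(total);
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static int demo() {
        CounterActorSketch actor = new CounterActorSketch();
        actor.send(new Add(5));
        actor.send(new Add(7));
        Get query = new Get();
        actor.send(query);
        int result = query.reply.join(); // wait for the actor's reply
        actor.send(new Stop());
        return result;
    }
}
```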
Wrapping actors with a POJO facade
Active Objects offer an object-oriented interface to actors. You get POJOs whose methods are asynchronous and whose state is protected under the same guarantees as with actors.
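The facade idea can be sketched with a plain class whose methods hand their work to a single-threaded executor. This is a JDK-only approximation and not the GPars Active Object support. The state is confined to the one worker thread, and callers receive a future for each call. ActiveCounterSketch is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; a JDK-only approximation of an active object.
class ActiveCounterSketch {

    // A single worker thread serializes all access to the state below.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private int total; // confined to the worker thread

    // Looks like an ordinary method, but runs asynchronously on the worker.
    CompletableFuture<Integer> add(int amount) {
        return CompletableFuture.supplyAsync(() -> {
            total += amount;
            return total;
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    static int demo() {
        ActiveCounterSketch counter = new ActiveCounterSketch();
        try {
            counter.add(5);                // queued first
            return counter.add(7).join();  // queued second, so it sees the first update
        } finally {
            counter.shutdown();
        }
    }
}
```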
Splitting algorithms explicitly into independent concurrent processes with indirect addressing
Use Dataflow tasks/processes communicating through dataflow channels or Groovy CSP. Unlike with actors, you get deterministic behavior, which allows for re-use and composability. Additionally, you may combine asynchronous and synchronous communication channels to limit the number of unprocessed messages in the network. The ability to address parties indirectly through channels loosens the coupling between components of the algorithm and makes tasks such as load-balancing or broadcasting easier to implement.
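The effect of a synchronous channel can be sketched with the JDK's SynchronousQueue, used here as a stand-in for a synchronous dataflow or CSP channel and not as the GPars API. The producer cannot run ahead of the consumer, so no unprocessed messages pile up, and neither side holds a reference to the other; they share only the channel. RendezvousChannelSketch and the END sentinel are hypothetical names.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical demo class; a JDK SynchronousQueue stand-in for a synchronous channel.
class RendezvousChannelSketch {

    private static final int END = -1; // sentinel marking end of stream

    // Sends 1..count through a synchronous channel and sums them on the receiving side.
    static int sumThroughChannel(int count) {
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    channel.put(i); // blocks until the consumer takes the value
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            for (int v = channel.take(); v != END; v = channel.take()) {
                sum += v;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Swapping the SynchronousQueue for a bounded queue such as ArrayBlockingQueue would allow a limited number of messages in flight instead of none.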