16.2.2. MapReduceTask
In JBoss Data Grid, MapReduceTask is a distributed task that unifies the Mapper, Combiner, Reducer, and Collator components into a cohesive computation that can be parallelized and executed across a large-scale cluster.
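These components can be specified with a fluent API. For example:
new MapReduceTask(cache).mappedWith(new MyMapper()).combinedWith(new MyCombiner()).reducedWith(new MyReducer()).execute(new MyCollator());
However, because most of these components are serialized and executed on other nodes, using inner classes for them is not recommended. A non-static inner class holds an implicit reference to its enclosing instance, which would have to be serialized along with it.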
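To make the roles of the four components more concrete, the following is a minimal single-JVM sketch of a word count split into map, combine, reduce, and collate steps. It deliberately does not use the JBoss Data Grid classes: MapReducePhasesSketch and its method names are hypothetical stand-ins, and each element of the nodes list only models the entries one node might hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-JVM sketch of the four phases. These names are illustrative
// stand-ins, not the JBoss Data Grid Mapper/Reducer/Collator interfaces.
final class MapReducePhasesSketch {

    // Map phase: emit (word, 1) for every word in one cache value.
    static void map(String value, Map<String, List<Integer>> emitted) {
        for (String word : value.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                emitted.computeIfAbsent(word, w -> new ArrayList<>()).add(1);
            }
        }
    }

    // Reduce step: sum the counts collected for one word.
    // The same function doubles as the combiner in this sketch.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Collate phase: condense the reduced map into one result, here the
    // most frequent word (ties go to the alphabetically first word).
    static String collate(Map<String, Integer> reduced) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : new TreeMap<>(reduced).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Each element of "nodes" models the cache entries held by one node.
    static Map<String, Integer> countWords(List<Map<String, String>> nodes) {
        Map<String, List<Integer>> intermediate = new TreeMap<>();
        for (Map<String, String> nodeEntries : nodes) {
            // Map locally on the node that owns the entries.
            Map<String, List<Integer>> emitted = new TreeMap<>();
            for (String value : nodeEntries.values()) {
                map(value, emitted);
            }
            // Combine locally so only one partial count per word leaves the node.
            for (Map.Entry<String, List<Integer>> e : emitted.entrySet()) {
                intermediate.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                            .add(reduce(e.getValue()));
            }
        }
        // Reduce the partial counts into the final count per word.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : intermediate.entrySet()) {
            reduced.put(e.getKey(), reduce(e.getValue()));
        }
        return reduced;
    }
}
```

Calling MapReducePhasesSketch.countWords(...) and passing the result to collate(...) mirrors the order in which the fluent calls are chained. In a real cluster the map and combine steps would run on the nodes that own the data rather than in a local loop.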
MapReduceTask requires a cache containing data that will be used as input for the task. The JBoss Data Grid execution environment will instantiate and migrate instances of provided Mappers and Reducers seamlessly across the nodes.
By default, all available key/value pairs of a specified cache will be used as input data for the task. This can be modified by using the onKeys method as an input key filter.
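The effect of an input key filter can be pictured with a small plain-Java sketch. KeyFilterSketch and restrictToKeys are hypothetical names used only for illustration, and a java.util.Map stands in for the cache. How onKeys itself treats keys that are absent from the cache is not covered here; this sketch simply skips them.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for input selection; not the JBoss Data Grid API.
final class KeyFilterSketch {

    // Returns only the entries whose keys were requested, so that the
    // map phase would see a subset of the cache instead of every entry.
    static <K, V> Map<K, V> restrictToKeys(Map<K, V> cache, Collection<K> keys) {
        Map<K, V> input = new LinkedHashMap<>();
        for (K key : keys) {
            // Assumption of this sketch: requested keys that are absent are skipped.
            if (cache.containsKey(key)) {
                input.put(key, cache.get(key));
            }
        }
        return input;
    }
}
```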
There are two MapReduceTask constructor parameters that determine how the intermediate values are processed:
distributedReducePhase - When set to "false", the default setting, the reducers are only executed on the master node. If set to "true", the reducers are executed on every node in the cluster.
useIntermediateSharedCache - Only relevant if distributedReducePhase is set to "true". If "true", which is the default setting, this task will share its intermediate value cache with other MapReduceTasks executing on the grid. If set to "false", this task will use its own dedicated cache for intermediate values.
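The difference between the two reduce modes can be sketched as a question of where each intermediate key is reduced. The class below is a hypothetical illustration, not part of JBoss Data Grid: node 0 plays the master, and the hash-modulo assignment is an assumption made for this sketch, not a description of how the grid actually places keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of reduce placement; not the JBoss Data Grid API.
final class ReducePlacementSketch {

    // For each intermediate key, returns the index of the node that
    // would run the reducer for that key in this simplified model.
    static Map<String, Integer> reducerNodeFor(List<String> keys,
                                               int nodeCount,
                                               boolean distributedReducePhase) {
        Map<String, Integer> placement = new TreeMap<>();
        for (String key : keys) {
            // Assumption: hash modulo node count stands in for the real key placement.
            int node = distributedReducePhase
                    ? Math.floorMod(key.hashCode(), nodeCount)
                    : 0; // Master-only reduce: every key is reduced on node 0.
            placement.put(key, node);
        }
        return placement;
    }
}
```

With the flag off, every key maps to node 0, so the master does all of the reduce work. With the flag on, the keys spread across the nodes, which is why the intermediate values then need a cache that the nodes can reach.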