Dieser Inhalt ist in der von Ihnen ausgewählten Sprache nicht verfügbar.
17.2. MapReduceTask Distributed Execution
Distributed Execution of the MapReduceTask occurs in three phases:
- Mapping phase.
- Outgoing Key and Outgoing Value Migration.
- Reduce phase.
The Mapping phase occurs as follows:
Procedure 17.1. Mapping Phase
- The input keys are grouped according to their owner nodes.
- On each node the
Mapperfunction processes all key/value pairs local to that node. - The results of the mapping process are collected using a
Collector. - If a
Reduceris specified, it is applied to all intermediate values collected for each outgoing key (KOut,VOut).
The Outgoing Key and Outgoing Value migration phase occurs as follows:
Procedure 17.2. Outgoing Key and Outgoing Value Migration Phase
- Intermediate keys exposed by the
Mapperare grouped by the intermediate outgoing key (KOutvalues. This grouping preserves the keys, as the mapping phase, when applied to other nodes in the cluster, may generate identical intermediate keys. - Once the Reduce phase has been invoked, (as described in Step 4 of the Mapping Phase above), an underlying hashing mechanism, a temporary distributed cache, and a DeltaAware cache insertion mechanism are used to:
- hash the intermediate key
KOutusing its owner node. - migrate the hashed
KOutkey and its correspondingVOutvalue to the same owner node.
- The list of
KOutkeys are returned to the master node.VOutvalues are not returned as they are not required by the master node.
The Reduce phase occurs as follows:
Procedure 17.3. Reduce Phase
KOutkeys are grouped according to owner node.- The reduce operation applies to each node and its input (grouped
KOutkeys) as follows:- The reduce operation locates the temporary distributed cache created during the migration phase on the target node.
- For each
KOutkey, a list ofVOutvalues is taken from the temporary cache. - The
KOutandVOutvalues are wrapped in anIteratorand the reduce operation is applied to the result.
- The reduce operation generates a map where each key is
KOutand each value isVOut.Each node has its own map. - The maps from each node in the cluster are combined into a single map (
M). - The MapReduce task returns map (
M) to the node that initiated theMapReducetask.