Este contenido no está disponible en el idioma seleccionado.

16.3. MapReduceTask Distributed Execution


16.3.1. MapReduceTask Distributed Execution

Distributed Execution of the MapReduceTask occurs in three phases:
  • Mapping phase.
  • Outgoing Key and Outgoing Value Migration.
  • Reduce phase.
The Mapping phase occurs as follows:

Procedure 16.1. Mapping Phase

  1. The input keys are grouped according to their owner nodes.
  2. On each node the Mapper function processes all key/value pairs local to that node.
  3. The results of the mapping process are collected using a Collector.
  4. If a Reducer is specified, it is applied to all intermediate values collected for each outgoing key (KOut, VOut).
The Outgoing Key and Outgoing Value migration phase occurs as follows:

Procedure 16.2. Outgoing Key and Outgoing Value Migration Phase

  1. Intermediate keys exposed by the Mapper are grouped by the intermediate outgoing key (KOut values. This grouping preserves the keys, as the mapping phase, when applied to other nodes in the cluster, may generate identical intermediate keys.
  2. Once the Reduce phase has been invoked, (as described in Step 4 of the Mapping Phase above), an underlying hashing mechanism, a temporary distributed cache, and a DeltaAware cache insertion mechanism are used to:
    • hash the intermediate key KOut using its owner node.
    • migrate the hashed KOut key and its corresponding VOut value to the same owner node.
  3. The list of KOut keys are returned to the master node. VOut values are not returned as they are not required by the master node.
The Reduce phase occurs as follows:

Procedure 16.3. Reduce Phase

  1. KOut keys are grouped according to owner node.
  2. The reduce operation applies to each node and its input (grouped KOut keys) as follows:
    • The reduce operation locates the temporary distributed cache created during the migration phase on the target node.
    • For each KOut key, a list of VOut values is taken from the temporary cache.
    • The KOut and VOut values are wrapped in an Iterator and the reduce operation is applied to the result.
  3. The reduce operation generates a map where each key is KOut and each value is VOut.
    Each node has its own map.
  4. The maps from each node in the cluster are combined into a single map (M).
  5. The MapReduce task returns map (M) to the node that initiated the MapReduce task.
Volver arriba
Red Hat logoGithubredditYoutubeTwitter

Aprender

Pruebe, compre y venda

Comunidades

Acerca de la documentación de Red Hat

Ayudamos a los usuarios de Red Hat a innovar y alcanzar sus objetivos con nuestros productos y servicios con contenido en el que pueden confiar. Explore nuestras recientes actualizaciones.

Hacer que el código abierto sea más inclusivo

Red Hat se compromete a reemplazar el lenguaje problemático en nuestro código, documentación y propiedades web. Para más detalles, consulte el Blog de Red Hat.

Acerca de Red Hat

Ofrecemos soluciones reforzadas que facilitan a las empresas trabajar en plataformas y entornos, desde el centro de datos central hasta el perímetro de la red.

Theme

© 2025 Red Hat