此内容没有您所选择的语言版本。

Chapter 16. MapReduce


16.1. About MapReduce

The JBoss Data Grid MapReduce model is an adaptation of Google's MapReduce model.
MapReduce is a programming model used to process and generate large data sets. It is typically used in distributed computing environments where nodes are clustered. In JBoss Data Grid, MapReduce allows transparent distributed processing of very large amounts of data across the data grid by performing most computations as locally possible to where the data is stored.
MapReduce uses the two distinct computational phases of map and reduce to process information requests through the data grid. The process occurs as follows:
  1. The user initiates a task on a cache instance, which runs on a cluster node (the master node).
  2. The master node receives the task input, divides the task, and sends tasks for map phase execution on the grid.
  3. Each node executes a Mapper function on its input, and returns intermediate results back to the master node.
    • If the distributedReducePhase parameter is set to "true", the map results are inserted in an intermediary cache, rather than being returned to the master node.
    • If a Combiner has been specified with task.combinedWith(Reducer), the Combiner is called on the Mapper results and the combiner's results are retured to the master node or inserted in the intermediary cache.
  4. The master node collects all intermediate results from the map phase and merges all intermediate values associated with the same intermediate key.
    • If the distributedReducePhase parameter is set to "true", the merging of the intermediate values is done on each node, as the Mapper or Combiner results are inserted in the intermediary cache.The master node only receives the intermediate keys.
  5. The master node sends intermediate key/value pairs for reduction on the grid.
    • If the distributedReducePhase parameter is set to "false", the reduction phase is executed only on the master node.
  6. The final results of the reduction phase are returned.
    • If the distributedReducePhase parameter is set to "true", the master node running the task receives all results from the reduction phase and returns the final result to the MapReduce task initiator.
    • If a Collator has been specified with task.execute(Collator), the Collator is executed on the reduction phase results, and the Collator result is returned to the task initiator.
返回顶部
Red Hat logoGithubredditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。 了解我们当前的更新.

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

Theme

© 2025 Red Hat