
27.3. Using the Hadoop Connector


InfinispanInputFormat and InfinispanOutputFormat

In Hadoop, the InputFormat abstract class defines how a specific data source is partitioned and how data is read from each partition, while the OutputFormat abstract class specifies how output data is written.

The InputFormat class defines two important methods:
  • List<InputSplit> getSplits(JobContext context);
  • RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context);
The getSplits method acts as a data partitioner: it returns one or more InputSplit instances, each describing one section of the data. Each InputSplit is then passed to createRecordReader to obtain a RecordReader, which iterates over the records in that section. Because every split can be read independently, these two operations let Hadoop process data in parallel across multiple nodes, which is what gives it high throughput on large datasets.
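
The split-then-read pattern can be modeled in plain Java without Hadoop. In the minimal sketch below, the class SplitSketch, the record Split, and the methods getSplits and createRecordReader are hypothetical stand-ins that only mirror the roles of the Hadoop methods listed above; they do not implement Hadoop's InputFormat API. The data set is an in-memory list, each split is an index range, and the splits are read in parallel with a parallel stream (Java 16 or later, for the record type).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for an InputSplit: it only describes WHERE its records are,
    // as a half-open index range [start, end) over the data set.
    record Split(int start, int end) {}

    // Analogue of getSplits(): divide a data set of `size` records into roughly equal ranges.
    static List<Split> getSplits(int size, int numSplits) {
        List<Split> splits = new ArrayList<>();
        int chunk = (size + numSplits - 1) / numSplits; // ceiling division
        for (int start = 0; start < size; start += chunk) {
            splits.add(new Split(start, Math.min(start + chunk, size)));
        }
        return splits;
    }

    // Analogue of createRecordReader(): iterate only over the records covered by one split.
    static Iterable<String> createRecordReader(List<String> data, Split split) {
        return data.subList(split.start(), split.end());
    }

    // Read every split independently (possibly on different threads) and sum the record counts.
    static int countRecords(List<String> data, int numSplits) {
        return getSplits(data.size(), numSplits)
                .parallelStream()
                .mapToInt(split -> {
                    int n = 0;
                    for (String ignored : createRecordReader(data, split)) {
                        n++;
                    }
                    return n;
                })
                .sum();
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(getSplits(data.size(), 3)); // three ranges covering indices 0..6
        System.out.println(countRecords(data, 3));     // prints 7
    }
}
```

The design point is that a split is only a description of a section of the data; the reader does the actual iteration, so independent splits can be handed to different workers.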
For JBoss Data Grid, partitions are generated based on segment ownership: each partition is the set of segments owned by a particular server. By default, there are as many partitions as there are servers in the cluster, and each partition contains all segments associated with that server.
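
The partitioning rule can be sketched as a simple grouping step. The sketch assumes that every segment is assigned to exactly one server (for example, its primary owner) and uses a hard-coded, hypothetical ownership list with made-up server addresses; SegmentPartitionSketch and partitionBySegmentOwner are illustrative names, not the connector's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentPartitionSketch {

    // Group segment IDs by the server that owns them, producing one partition per server.
    // Assumption: segmentOwners.get(i) is the single (hypothetical) owner address of segment i.
    static Map<String, List<Integer>> partitionBySegmentOwner(List<String> segmentOwners) {
        Map<String, List<Integer>> partitions = new TreeMap<>(); // sorted keys for stable output
        for (int segment = 0; segment < segmentOwners.size(); segment++) {
            partitions.computeIfAbsent(segmentOwners.get(segment), server -> new ArrayList<>())
                      .add(segment);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Eight segments spread over three servers (made-up addresses).
        List<String> owners = List.of(
                "server-a:11222", "server-b:11222", "server-c:11222", "server-a:11222",
                "server-b:11222", "server-c:11222", "server-a:11222", "server-b:11222");
        Map<String, List<Integer>> partitions = partitionBySegmentOwner(owners);
        System.out.println(partitions.size()); // prints 3: one partition per server
        System.out.println(partitions);
        // prints {server-a:11222=[0, 3, 6], server-b:11222=[1, 4, 7], server-c:11222=[2, 5]}
    }
}
```

With three servers the sketch yields three partitions, matching the default behavior described above; each partition would then back one InputSplit.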
Running a Hadoop Map Reduce job on JBoss Data Grid

Example of configuring a Map Reduce job targeting a JBoss Data Grid cluster:

import org.infinispan.hadoop.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

[...]
Configuration configuration = new Configuration();
// Servers and cache that the job reads its input from
configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_SERVER_LIST, "localhost:11222");
configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_NAME, "map-reduce-in");
// Servers and cache that the job writes its output to
configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_SERVER_LIST, "localhost:11222");
configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_NAME, "map-reduce-out");

Job job = Job.getInstance(configuration, "Infinispan Integration");
[...]

To target JBoss Data Grid, the job must be configured to use the InfinispanInputFormat and InfinispanOutputFormat classes:
[...]
// Define the Map and Reduce classes
job.setMapperClass(MapClass.class);
job.setReducerClass(ReduceClass.class);
// Define the JBoss Data Grid implementations
job.setInputFormatClass(InfinispanInputFormat.class);
job.setOutputFormatClass(InfinispanOutputFormat.class);
[...]
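
MapClass and ReduceClass above are user-supplied; in a real job they extend Hadoop's Mapper and Reducer classes, and this section does not specify what they compute. As a rough, hypothetical illustration of the data flow only, the plain-Java sketch below uses in-memory maps in place of the map-reduce-in and map-reduce-out caches and a word count as the example computation: each input entry is mapped to (word, 1) pairs, the pairs are grouped by word, and the reduce step sums each group into the output map. MapReduceFlowSketch and its methods are illustrative names, not part of Hadoop or the connector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceFlowSketch {

    // Map phase (hypothetical word count): one input entry (key, line of text) -> (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(Integer key, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : value.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: all counts emitted for one word -> their sum.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Read every entry of the input "cache", map it, group the pairs by word,
    // reduce each group, and store the results in the output "cache".
    static Map<String, Integer> run(Map<Integer, String> inputCache) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // stands in for Hadoop's shuffle and sort
        inputCache.forEach((k, v) -> {
            for (Map.Entry<String, Integer> pair : map(k, v)) {
                grouped.computeIfAbsent(pair.getKey(), w -> new ArrayList<>()).add(pair.getValue());
            }
        });
        Map<String, Integer> outputCache = new HashMap<>();
        grouped.forEach((word, counts) -> outputCache.put(word, reduce(word, counts)));
        return outputCache;
    }

    public static void main(String[] args) {
        Map<Integer, String> mapReduceIn = Map.of(
                1, "the quick brown fox",
                2, "the lazy dog",
                3, "The end");
        Map<String, Integer> mapReduceOut = run(mapReduceIn);
        System.out.println(mapReduceOut.get("the")); // prints 3
        System.out.println(mapReduceOut.get("fox")); // prints 1
    }
}
```

In the actual job, Hadoop performs the grouping step between the map and reduce phases, and the input and output formats take the place of the two in-memory maps.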