14.3. Map Reduce Example
The following example uses a word count application to demonstrate MapReduce and its distributed task abilities.
This example assumes we have a mapping of the key sentence stored on JBoss Data Grid nodes.
- Key is a String.
- Each sentence is a String.
All words that appear in all sentences must be counted.
Example 14.6. Implementing the Distributed Task
public class WordCountExample { /** * In this example replace c1 and c2 with * real Cache references * * @param args */ public static void main(String[] args) { Cache c1 = null; Cache c2 = null; c1.put("1", "Hello world here I am"); c2.put("2", "Infinispan rules the world"); c1.put("3", "JUDCon is in Boston"); c2.put("4", "JBoss World is in Boston as well"); c1.put("12","WildFly"); c2.put("15", "Hello world"); c1.put("14", "Infinispan community"); c2.put("15", "Hello world"); c1.put("111", "Infinispan open source"); c2.put("112", "Boston is close to Toronto"); c1.put("113", "Toronto is a capital of Ontario"); c2.put("114", "JUDCon is cool"); c1.put("211", "JBoss World is awesome"); c2.put("212", "JBoss rules"); c1.put("213", "JBoss division of RedHat "); c2.put("214", "RedHat community"); MapReduceTask<String, String, String, Integer> t = new MapReduceTask<String, String, String, Integer>(c1); t.mappedWith(new WordCountMapper()) .reducedWith(new WordCountReducer()); Map<String, Integer> wordCountMap = t.execute(); } static class WordCountMapper implements Mapper<String,String,String,Integer> { /** The serialVersionUID */ private static final long serialVersionUID = -5943370243108735560L; @Override public void map(String key, String value, Collector<String, Integer> collector) { StringTokenizer tokens = new StringTokenizer(value); for(String token : value.split("\\w")) { collector.emit(token, 1); } } } static class WordCountReducer implements Reducer<String, Integer> { /** The serialVersionUID */ private static final long serialVersionUID = 1901016598354633256L; @Override public Integer reduce(String key, Iterator<Integer> iter) { int sum = 0; while (iter.hasNext()) { Integer i = (Integer) iter.next(); sum += i; } return sum; } } }
In this second example, a
Collator
is defined, which will transform the result of MapReduceTask Map<KOut,VOut> into a String that is returned to a task invoker. The Collator
is a transformation function applied to a final result of MapReduceTask.
Example 14.7. Defining the Collator
MapReduceTask<String, String, String, Integer> t = new MapReduceTask<String, String, String, Integer>(cache); t.mappedWith(new WordCountMapper()).reducedWith(new WordCountReducer()); String mostFrequentWord = t.execute( new Collator<String,Integer,String>() { @Override public String collate(Map<String, Integer> reducedResults) { String mostFrequent = ""; int maxCount = 0; for (Entry<String, Integer> e : reducedResults.entrySet()) { Integer count = e.getValue(); if(count > maxCount) { maxCount = count; mostFrequent = e.getKey(); } } return mostFrequent; } }); System.out.println("The most frequent word is " + mostFrequentWord);