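3. Implementing Word Count with MapReduce
Overview
MapReduce is Hadoop's core programming model for processing large-scale data. This article implements a simple word count task in MapReduce code.
Contents
How MapReduce works: Mapper and Reducer
Hadoop project structure
MapReduce program code
Code example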
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCount {
    // Mapper: emits (word, 1) for each whitespace-separated token in an input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reducer: sums all the counts emitted for one word and writes (word, total).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
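To see what the Mapper and Reducer do without a Hadoop cluster, the data flow can be imitated in plain Java. The sketch below is only an illustration under stated assumptions: the class LocalWordCountSketch and its methods are hypothetical names that are not part of Hadoop, the whole job runs in a single process, and an in-memory TreeMap stands in for the framework's shuffle and sort step. The tokenizing and summing logic mirrors TokenizerMapper and IntSumReducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration only; it is not part of the Hadoop API.
class LocalWordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase (assumed stand-in): group the emitted values by key.
    // A TreeMap keeps the keys sorted, similar to the sorted keys a reducer sees.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the values collected for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Runs map, shuffle, and reduce in sequence over the input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
    }
}
```

In a real job, Hadoop runs the map calls in parallel across input splits and performs the grouping itself, so only the map and reduce logic needs to be written by hand.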
From: https://www.cnblogs.com/kongxiangzeng/p/18632539