Prerequisites -
a) Java 1.6
b) Hadoop (1.2.1) installed in pseudo-distributed mode.
CountDriver (Driver class) -
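import java.io.IOException;
import java.util.Calendar;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(new Configuration(), "Count Driver");
        job.setJarByClass(CountDriver.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:8020//user/chatar/input/practice/count_max.txt"));
        // FileNameUtil is a helper class not shown here (assumed to be in the same package).
        // The timestamp suffix gives every run a fresh output directory.
        FileOutputFormat.setOutputPath(job, new Path(FileNameUtil.HDFS_OUT_DIR + "/" + Calendar.getInstance().getTimeInMillis()));
        job.setMapperClass(CountMapper.class);
        job.setReducerClass(CountReducer.class);
        // Key/value types written by the reducer (the mapper emits the same types).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}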
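CountMapper (Mapper class) -
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Emits (token, 1) for every non-empty, space-separated token in the line.
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        final String[] values = value.toString().split(" ");
        for (String val : values) {
            if (val != null && !val.isEmpty()) {
                context.write(new Text(val), new IntWritable(1));
            }
        }
    }
}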
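CountReducer (Reducer class) -
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sums the 1s emitted by the mapper for each token and writes (token, total).
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total = total + value.get();
        }
        context.write(key, new IntWritable(total));
    }
}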
Input file (count_max.txt) -
g g g h j k l
b n g f h j j
a a w e r g h
t y u i o p p
d f g h j k l
l m n k k k k
Output file -
a 2
b 1
d 1
e 1
f 2
g 6
h 4
i 1
j 4
k 6
l 3
m 1
n 2
o 1
p 2
r 1
t 1
u 1
w 1
y 1
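
Local check (no Hadoop needed) -
The counts above can be cross-checked without a cluster. The following is a minimal sketch in plain Java, kept Java 1.6 compatible, that mirrors the split-and-emit step of CountMapper and the summing step of CountReducer. LocalWordCount and its count method are hypothetical names introduced here for illustration; they are not part of the job, and the sketch says nothing about how Hadoop shuffles or sorts data internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the Hadoop job: it mirrors the map and
// reduce steps in plain Java so the expected counts can be checked locally.
public class LocalWordCount {

    // Splits each line on single spaces (as CountMapper does) and sums the
    // occurrences per token (as CountReducer does). A TreeMap keeps the keys
    // in sorted order.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String token : line.split(" ")) {
                if (token.isEmpty()) {
                    continue; // skip empty tokens produced by repeated spaces
                }
                Integer current = totals.get(token);
                totals.put(token, current == null ? 1 : current + 1);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Same six lines as count_max.txt.
        List<String> lines = Arrays.asList(
                "g g g h j k l",
                "b n g f h j j",
                "a a w e r g h",
                "t y u i o p p",
                "d f g h j k l",
                "l m n k k k k");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Running main on the six input lines should list the same 20 keys and totals as the output file, which makes it a quick way to confirm the expected result before submitting the job.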