Sunday, December 29, 2013

A simple Hadoop program to count the occurrences of each character in a file

Prerequisites - a) Java 1.6
                b) Hadoop 1.2.1 installed in pseudo-distributed mode.


CountDriver (Driver class) -

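import java.io.IOException;
import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountDriver {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(new Configuration(), "Count Driver");
        job.setJarByClass(CountDriver.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:8020//user/chatar/input/practice/count_max.txt"));
        // FileNameUtil is a helper class not shown in this post; the timestamp gives each run its own output directory.
        FileOutputFormat.setOutputPath(job, new Path(FileNameUtil.HDFS_OUT_DIR + "/" + Calendar.getInstance().getTimeInMillis()));
        job.setMapperClass(CountMapper.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}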

CountMapper

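import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds space-separated characters; emit (character, 1) for every non-empty token.
        final String[] values = value.toString().split(" ");
        for (String val : values) {
            if (val != null && !val.isEmpty()) {
                context.write(new Text(val), new IntWritable(1));
            }
        }
    }
}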
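
The mapper's tokenization can be tried outside Hadoop. The following is only a minimal sketch in plain Java: the class name SplitDemo and the method tokens are hypothetical names introduced for illustration and are not part of the job. It shows how split(" ") combined with the isEmpty() check behaves on a line with a trailing space (as in count_max.txt) and on a line with two consecutive spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the Hadoop job.
public class SplitDemo {

    // Mirrors the mapper's tokenization: split on a single space, drop empty tokens.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<String>();
        for (String val : line.split(" ")) {
            if (!val.isEmpty()) {
                out.add(val);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Trailing space, as on the first line of count_max.txt.
        System.out.println(tokens("g g g h j k l ")); // prints [g, g, g, h, j, k, l]
        // Two consecutive spaces yield an empty token, which the isEmpty() check filters out.
        System.out.println(tokens("a  b"));           // prints [a, b]
    }
}
```

String.split already discards trailing empty strings, so the isEmpty() check mainly matters when spaces are doubled or a line starts with a space.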

CountReducer

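import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this character.
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();
        }
        context.write(key, new IntWritable(total));
    }
}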

Input file (count_max.txt) -

g g g h j k l 
b n g f h j j
a a w e r g h
t y u i o p p
d f g h j k l
l m n k k k k 

Output file -

a 2
b 1
d 1
e 1
f 2
g 6
h 4
i 1
j 4
k 6
l 3
m 1
n 2
o 1
p 2
r 1
t 1
u 1
w 1
y 1
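
As a sanity check on the listing above, the same counting can be reproduced without a cluster. This is a rough sketch in plain Java, not the Hadoop job itself: the class name LocalCount and the method count are hypothetical, the map and reduce steps are collapsed into a single loop, and a TreeMap stands in for the sorted-by-key ordering of the job output.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the job; it does not use any Hadoop classes.
public class LocalCount {

    // Every non-empty token contributes 1 (the map step) and the totals are
    // accumulated per key (the reduce step). TreeMap keeps the keys sorted.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String val : line.split(" ")) {
                if (val.isEmpty()) {
                    continue;
                }
                Integer current = totals.get(val);
                totals.put(val, current == null ? 1 : current + 1);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Contents of count_max.txt as shown above.
        String[] lines = {
            "g g g h j k l ",
            "b n g f h j j",
            "a a w e r g h",
            "t y u i o p p",
            "d f g h j k l",
            "l m n k k k k "
        };
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running it on the six input lines should print the same 20 key/count pairs as the output file, for example g 6 and k 6.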