Prerequisites -
a) Java 1.6
b) Hadoop 1.2.1 installed in pseudo-distributed mode.
CountDriver (Driver class) -
import java.io.IOException;
import java.util.Calendar;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // Create the job and tell Hadoop which jar to ship to the cluster.
        Job job = new Job(new Configuration(), "Count Driver");
        job.setJarByClass(CountDriver.class);
        // Input file on HDFS; the output directory is timestamped so reruns do not collide.
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:8020/user/chatar/input/practice/count_max.txt"));
        FileOutputFormat.setOutputPath(job, new Path(FileNameUtil.HDFS_OUT_DIR + "/" + Calendar.getInstance().getTimeInMillis()));
        // Wire up the mapper, reducer and the output key/value types.
        job.setMapperClass(CountMapper.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Block until the job finishes and propagate its status as the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
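The driver references FileNameUtil.HDFS_OUT_DIR, a helper constant whose class is not shown in this post. A minimal sketch of such a helper might look like the code below; the directory path used here is purely an assumption, so adjust it to your own HDFS layout.

public class FileNameUtil {
    // Assumed base directory for job output on HDFS; the real value used in the post is not shown.
    public static final String HDFS_OUT_DIR = "hdfs://localhost:8020/user/chatar/output/practice";
}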
CountMapper (Mapper class) -
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on spaces and emit (token, 1) for every non-empty token.
        final String[] values = value.toString().split(" ");
        for (String val : values) {
            if (val != null && !val.isEmpty()) {
                context.write(new Text(val), new IntWritable(1));
            }
        }
    }
}
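For the first input line "g g g h j k l", the mapper emits (g, 1), (g, 1), (g, 1), (h, 1), (j, 1), (k, 1), (l, 1); the duplicate keys are only merged later, during the shuffle and reduce phase.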
CountReducer (Reducer class) -
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted for this word and write out the total count.
        int total = 0;
        for (IntWritable value : values) {
            total = total + value.get();
        }
        context.write(key, new IntWritable(total));
    }
}
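Because summing counts is associative and commutative, the same reducer class could optionally be reused as a combiner to pre-aggregate counts on the map side and cut shuffle traffic. This is not part of the driver above; it would be one extra line in CountDriver's main():

// Optional tweak (not in the original driver): reuse the reducer as a combiner.
job.setCombinerClass(CountReducer.class);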
Input file (count_max.txt) -
g g g h j k l
b n g f h j j
a a w e r g h
t y u i o p p
d f g h j k l
l m n k k k k
Output file -
a 2
b 1
d 1
e 1
f 2
g 6
h 4
i 1
j 4
k 6
l 3
m 1
n 2
o 1
p 2
r 1
t 1
u 1
w 1
y 1
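As a quick check, "g" appears three times on the first input line and once each on the second, third and fifth lines, which matches the count of 6 in the output.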