Wednesday, January 1, 2014

Hadoop questions and answers

Q. What is the default block size in HDFS?
A. 64 MB.

Q. What is the difference between a block and a split?
A. A block is a unit of contiguous storage in HDFS, whereas a split is the logical chunk of input processed by one map task. Block size and split size are generally the same, but they can differ: when a record straddles two blocks, the whole record is treated as part of the first split.

Q. How can you make sure that a file is not split?
A. Write your own InputFormat class and override isSplitable() to always return 'false'.
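As a sketch (assuming the Hadoop 1.x new-API classes, with TextInputFormat as the base; the class name here is made up), this might look like:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Input format that never splits a file: each file goes to exactly one mapper.
public class WholeFileTextInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // treat every file as a single split
    }
}
```

Pass it to the job with job.setInputFormatClass(WholeFileTextInputFormat.class).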

Q. When does the reduce method of a reducer get called?
A. Not until all mappers have finished processing their input.

Q. What happens if the output types of your mapper do not match the reducer's input types?
A. The job will fail at run time with a type-mismatch exception (e.g. a ClassCastException).

Q. Are keys and values in sorted order when passed to reducer?
A. Keys are sorted but values are not.

Q. Where is the intermediate data emitted by a mapper written?
A. To the local file system of the node where the mapper is running, not to HDFS.

Q. What is the default replication factor?
A. 3.

Q. How does Hadoop decide where to store replicated data?
A. With the default replication factor of 3, a block 'd' is stored on three different nodes n1, n2, n3 spread across two racks r1 and r2: the first replica on the local node, the second on a node in a different rack, and the third on a different node in that same remote rack.

Q. Can you configure the number of mappers for your input file?
A. You can configure how many mappers run in parallel on a node, but not the total number of mappers: that is decided by the number of splits (and ultimately by the block size). So by changing the block size you can indirectly control the number of mappers.
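As a back-of-the-envelope illustration (the 200 MB file size here is a made-up number; the split size is assumed equal to the HDFS block size), the number of splits, and hence map tasks, is just the ceiling of file size over split size:

```java
// Illustrative only: split count = ceil(fileSize / splitSize),
// assuming split size equals the HDFS block size.
public class SplitMath {

    static long splitCount(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file with the default 64 MB block size -> 4 splits, so 4 map tasks.
        System.out.println(splitCount(200 * mb, 64 * mb)); // prints 4
    }
}
```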


Sunday, December 29, 2013

A simple Hadoop program to count occurrences of a character in a file

Prerequisites - a) Java 1.6
                b) Hadoop (1.2.1) installed in pseudo-distributed mode.


CountDriver (Driver class) -

import java.io.IOException;
import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountDriver {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(new Configuration(), "Count Driver");
        job.setJarByClass(CountDriver.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:8020/user/chatar/input/practice/count_max.txt"));
        // Timestamped output directory so reruns don't collide with an existing one
        FileOutputFormat.setOutputPath(job, new Path(FileNameUtil.HDFS_OUT_DIR + "/" + Calendar.getInstance().getTimeInMillis()));
        job.setMapperClass(CountMapper.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

CountMapper

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on spaces and emit (character, 1) for every non-empty token
        final String[] values = value.toString().split(" ");
        for (String val : values) {
            if (val != null && !val.isEmpty()) {
                context.write(new Text(val), new IntWritable(1));
            }
        }
    }
}

CountReducer

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mappers for this character
        int total = 0;
        for (IntWritable value : values) {
            total = total + value.get();
        }
        context.write(key, new IntWritable(total));
    }
}

Input file (count_max.txt) -

g g g h j k l 
b n g f h j j
a a w e r g h
t y u i o p p
d f g h j k l
l m n k k k k 

Output file -

a 2
b 1
d 1
e 1
f 2
g 6
h 4
i 1
j 4
k 6
l 3
m 1
n 2
o 1
p 2
r 1
t 1
u 1
w 1
y 1


Monday, December 24, 2012

Java Interview Questions

Collections -

Explain how HashMap works


HashMap uses an array as its internal data structure; each array position is treated as a bucket, and entries that hash to the same bucket are stored in a linked list.

Adding to a HashMap:
 1. Call hashCode() on the key, then apply HashMap's own supplemental hash function (this spreads hash codes more fairly across buckets).
 2. Apply a modulo operation (hashcode % array size) to find the bucket index.
 3. Traverse the entries in that bucket, comparing keys with the equals() method.
 4. If a matching key is found, update its value; otherwise add a new entry.

Retrieval and deletion follow similar steps.
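A minimal sketch of steps 1-2 (the supplemental hash below is illustrative, not the exact function the JDK uses):

```java
// Illustrative bucket selection for a hash table whose capacity is a power of two.
public class BucketDemo {

    static int supplementalHash(int h) {
        // Mix the high bits into the low bits so keys whose hashCodes
        // differ only in the upper bits still spread across buckets.
        return h ^ (h >>> 16);
    }

    static int bucketIndex(Object key, int capacity) {
        // capacity is assumed to be a power of two, so (capacity - 1) acts as a bit mask
        return supplementalHash(key.hashCode()) & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("key1", 16)); // always in [0, 15]
    }
}
```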

Perl: how to find an exact word in all files under the current directory (one-level search only)

# Print every occurrence of the word 'hello' in files in the current directory
use warnings;
use strict;

my @files = <*.*>;

foreach my $file (@files) {
    if (-e -f $file) {
        open my $file_handle, '<', $file or next;
        while (<$file_handle>) {
            if (/\b(hello)\b/) {   # \b word boundaries match 'hello' as a whole word
                print "$1\n";
            }
        }
        close $file_handle;
    }
}

Saturday, November 3, 2012

How to implement your own Map

This post shows how you can implement a simple Map with only put, get and size operations. This implementation does not take care of thread safety; it is just here to illustrate what may be needed to implement a map-like data structure.

What is needed -
  1. Eclipse Juno
  2. JDK 1.7
  3. JUnit 4.5
  4. hamcrest-all-1.1.jar

Test First -


package com.chatar.practice;

import org.junit.Assert;
import org.junit.Test;
import static org.hamcrest.CoreMatchers.*;

import com.chatar.pratice.MyMap;

public class MyMapTest {
   
    @Test(expected=NullPointerException.class)
    public void shouldThrowExceptionIfInsertingNullKey() {
        MyMap myMap = new MyMap();
        myMap.put(null, "some_value");
    }
   
    @Test(expected=NullPointerException.class)
    public void shouldThrowExceptionIfGettingValueForNullKey() {
        MyMap myMap = new MyMap();
        myMap.get(null);
    }
   
    @Test
    public void shouldBeAbleToPutValues() {
        MyMap myMap = new MyMap();
        myMap.put("key1", "value1");
        myMap.put("key2", "value2");       
        Assert.assertThat(myMap.size(), is(2));
    }
   
    @Test
    public void shouldReturnNullIfKeyNotFound() {
        MyMap myMap = new MyMap();       
        Assert.assertThat(myMap.get("key1"), nullValue());
    }
   
    @Test
    public void shouldOverrideValueIfKeyAlreadyExists() {
        MyMap myMap = new MyMap();
        myMap.put("key1", "value1");
        Assert.assertThat(myMap.size(), is(1));
        Assert.assertThat(myMap.get("key1"), is("value1"));
        myMap.put("key1", "value2");       
        Assert.assertThat(myMap.size(), is(1));
        Assert.assertThat(myMap.get("key1"), is("value2"));
       
    }
   
    @Test
    public void shouldGetValueByPassingKey() {
        MyMap myMap = new MyMap();
        myMap.put("key1", "value1");
        myMap.put("key2", "value2");       
        Assert.assertThat(myMap.size(), is(2));
        Assert.assertThat(myMap.get("key1"), is("value1"));
        Assert.assertThat(myMap.get("key2"), is("value2"));
    }
}


And Implementation -



package com.chatar.pratice;

public class MyMap<K, V> {

    private Entry<K, V>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    public MyMap() {
        buckets = new Entry[128];
    }

    public void put(K key, V value) {
        validate(key);
        Entry<K, V> entry = buckets[bucket(key)];
        if (entry != null) {
            addTo(entry, key, value);
        } else {
            buckets[bucket(key)] = new Entry<K, V>(key, value);
            size++;
        }
    }

    public V get(K key) {
        validate(key);
        Entry<K, V> entry = buckets[bucket(key)];
        while (entry != null && !key.equals(entry.key)) {
            entry = entry.next;
        }
        return entry != null ? entry.value : null;
    }

    public int size() {
        return size;
    }

    private void validate(K key) {
        if (key == null) {
            throw new NullPointerException("Key can't be null");
        }
    }

    private void addTo(Entry<K, V> entry, K key, V value) {
        while (true) {
            if (entry.key.equals(key)) {
                entry.value = value;   // same key: override the value, size unchanged
                return;
            }
            if (!entry.hasNext()) {
                entry.next = new Entry<K, V>(key, value); // new key: append to the chain
                size++;
                return;
            }
            entry = entry.next();
        }
    }

    private int bucket(K key) {
        // Mask off the sign bit so negative hash codes don't yield a negative index
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    static class Entry<K, V> {
        K key;
        V value;
        Entry<K, V> next;

        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public Entry<K, V> next() {
            return next;
        }

        public boolean hasNext() {
            return next != null;
        }
    }
}



Sunday, August 5, 2012

After a long time I am back to my blog.

It took me almost two years to realize how important blogging is, and more important, how to keep it going. I always had a hard time writing, especially expressing myself. To make things easier, I have promised myself to write a small post roughly bi-weekly about what I have learned.

I recently read The Clean Coder - A Handbook of Agile Software Craftsmanship by Robert C. Martin, and now I know what it takes to become a professional programmer.

The earlier book, Clean Code, was all about code, and The Clean Coder is all about the profession. I feel the two books are two sides of a coin: a professional programmer is one who writes clean code and behaves professionally at the same time.

I have recommended the book to two of my colleagues, and I would recommend it to everyone who cares about professionalism. Thanks to Uncle Bob for such a simple yet powerful book on professionalism.

Sunday, December 27, 2009

Design Patterns - WWW (What/ When / Why)

  • What is a design pattern? - Someone has already solved your problem (Head First Design Patterns).
  • When should we use one? - It depends on the kind of problem you want to solve: there might already be a mature solution (in design) for it. That means you need to spend some time understanding the existing design patterns and where to apply them. If you think none of the existing patterns suits your problem, you can share your experience with the community - you never know, the next design pattern might be yours.
  • Why should we use one? As already said, someone has already solved your problem, and when you apply a pattern your code becomes self-explanatory: anyone looking at your code can easily make out what kind of problem you are trying to solve. You get free documentation.
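As a tiny illustration (all names here are made up), the Strategy pattern solves the recurring "swap an algorithm at run time" problem, and the pattern name alone documents the intent:

```java
// Strategy pattern: the algorithm is chosen at run time, not hard-coded.
interface DiscountStrategy {
    double apply(double price);
}

class NoDiscount implements DiscountStrategy {
    public double apply(double price) { return price; }
}

class HolidayDiscount implements DiscountStrategy {
    public double apply(double price) { return price * 0.9; } // 10% off
}

public class Checkout {
    private final DiscountStrategy strategy;

    public Checkout(DiscountStrategy strategy) {
        this.strategy = strategy; // inject the algorithm instead of hard-coding it
    }

    public double total(double price) {
        return strategy.apply(price);
    }

    public static void main(String[] args) {
        System.out.println(new Checkout(new HolidayDiscount()).total(100.0)); // prints 90.0
    }
}
```

A reader who knows the pattern sees "Strategy" and immediately understands the design without further documentation.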