Search This Blog

Tuesday 1 April 2014

How HashMap works in Java

I was looking at the HashMap source code the other day and felt like writing down what I learned. Consider the start point to using the HashMap:
Map<String, Integer> map = new HashMap<String, Integer>();
This will Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

The instance created in Memory would be something like this:
 There are also static final constants defined as a part of the HashMap class which will be loaded in the memory:
When we call the put method, the HashMap:
  1. Calculates the hash for the key. If the key is null, it sets the hash to 0.
  2. Calculate the bucket index based on the hash.
  3. Within the bucket check the entries to see if the key occurs.
  4. If the key exists in an entry, than replace the value in entry with existing key - No change in size. Return the previous value against the key.
  5. If no matching key found than check if a resize is needed. If yes than accordingly perform the resize operation.
  6. Add the entry for this new key value pair to the correct bucket. Increase the value of size by 1.
How is the bucket for a key calculated ?
Hashing concepts suggest that the hash be generated as unique as possible for every key. But the HashMap has a limited set of buckets within which it needs to place these key-value pairs.
static int indexFor(int h, int length) {
    return h & (length-1);
}
 // h - hash 
 // length - length of table
The and operation will ensure that the index does not exceed the length of our table - i.e. the number of buckets.
If the length was any random value like 10, than the and operator would result in a very uneven distribution. But the HashMap enforces that the number of buckets or length be powers of 2.
This is equivalent to a modulus operation on the length.Very smart coding by the guys at Sun.

Inserting new entries in a bucket:
The bucket is simply a set of keys. Not a list. A new key could be added anywhere in the bucket. The order does not matter. A bad program would probably iterate through the bucket list, adding the key at the end. The HashMap code very smartly appends the new value right at the beginning.
 

When does resizing of the map happen ? 
Every time a new entry is to be inserted, a check is done  to ensure if a resize is needed.
void addEntry(int hash, K key, V value, int bucketIndex) {
   if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
//...other code
}
The threshold value here is = capacity * load factor. The length is doubled to twice the size.Once the value is doubled, the capacity of keys for every bucket is increased. So every entry has its bucket location recomputed and the entry is placed into a new bucket accordingly.

Retrieving a value from the Map for a given key
For a given key,the map computes the hash. With the hash, it then computes the bucket where the key could exists. The final step is to iterate through the bucket and locate the entry with matching key if it exists.
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null; e = e.next) {
         Object k;
         if (e.hash == hash &&
             ((k = e.key) == key || (key != null && key.equals(k))))
              return e;
     }
     return null;
}
First a hash match and then a key match. As seen here the equals comparison has been very nicely optimized.Maximum n comparisons for the n elements.

6 comments:

  1. Nice Job robin.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. very simple explination on How HashMap works in Java
    to stores the data in the form of key-value pairs and find the exact index from bucket for key value pair.

    ReplyDelete