Practical Algorithms Tier 0: Hash Map

Hash map, also known as hash table, is a fundamental and powerful data structure to save and retrieve an item in constant time.

Photo by John Matychuk on Unsplash

In reality at work, an engineer should never write their own hash map, due to the fact that a good hash function is non-trivial to derive. But during an interview, it is good to know the simplest toy implementation of a hash map.

A good practice example can be found at leetcode.com: 706. Design HashMap

Design a HashMap without using any built-in hash table libraries.Implement the MyHashMap class:
MyHashMap() initializes the object with an empty map.
void put(int key, int value) inserts a (key, value) pair into the HashMap. If the key already exists in the map, update the corresponding value.int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.

For a hash map implementation, the underlying storage is an array and there are three critical components to consider:

  1. Hash function

Hash Function

This is a deterministic mapping, which converts the key of an item into an index within the bound of the underlying array. For a toy implementation, modulo of the array size is a convenient choice. But modulo can not be used for production strength, because often the input keys are not evenly distributed which leads to very high collision rate.

Load Factor

The load factor should be at about 0.5 to avoid too much collisions, which means the total number of items stored should be about half the size of the array. This implies dynamic resizing of the underlying array. A good toy practice is to double the array size if the load factor reaches 0.5.

Collision resolution

Two common solutions: Separate chaining and open addressing.

Photo by Omer Rana on Unsplash
Separate Chaining

In the example the array size is 7, and the three items 38, 24 and 59 have a collision hashing to the same index because 38%7 == 24%7 == 59%7 == 3

When different items hash to the same index, separate chaining uses a linked list to store all the items, while open addressing does a linear probe and move to the next index until an empty slot is found.

Open Addressing

Sample code for open addressing

And the implementation with separate chaining is left as an exercise to the reader.

Q.E.D.

References

Digital World

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store