Hash table is a data structure that represents data in the form of **key-value** pairs. Each key is mapped to a value in the hash table. The keys are used for indexing the values/data. A similar approach is applied by an associative array.

Data is represented in a key value pair with the help of keys as shown in the figure below. Each data is associated with a key. The key is an integer that point to the data.

## 1. Direct Address Table

Direct address table is used when the amount of space used by the table is not a problem for the program. Here, we assume that

- the keys are small integers
- the number of keys is not too large, and
- no two data have the same key

A pool of integers is taken called universe `U = {0, 1, ……., n-1}`

.

Each slot of a direct address table `T[0...n-1]`

contains a pointer to the element that corresponds to the data.

The index of the array `T`

is the key itself and the content of `T`

is a pointer to the set `[key, element]`

. If there is no element for a key then, it is left as `NULL`

.

Sometimes, the key itself is the data.

**Pseudocode for operations**

```
directAddressSearch(T, k)
return T[k]
directAddressInsert(T, x)
T[x.key] = x
directAddressDelete(T, x)
T[x.key] = NIL
```

**Limitations of a Direct Address Table**

- The value of the key should be small.
- The number of keys must be small enough so that it does not cross the size limit of an array.

## 2. Hash Table

In a hash table, the keys are processed to produce a new index that maps to the required element. This process is called hashing.

Let `h(x)`

be a hash function and `k`

be a key.

`h(k)`

is calculated and it is used as an index for the element.

**Limitations of a Hash Table**

- If the same index is produced by the hash function for multiple keys then, conflict arises. This situation is called collision.

To avoid this, a suitable hash function is chosen. But, it is impossible to produce all unique keys because`|U|>m`

. Thus a good hash function may not prevent the collisions completely however it can reduce the number of collisions.

However, we have other techniques to resolve collision.

**Advantages of hash table over direct address table:**

The main issues with direct address table are the size of the array and the possibly large value of a key. The hash function reduces the range of index and thus the size of the array is also reduced.

For example, If `k = 9845648451321`

, then `h(k) = 11`

(by using some hash function). This helps in saving the memory wasted while providing the index of `9845648451321`

to the array

### Collision resolution by chaining

In this technique, if a hash function produces the same index for multiple elements, these elements are stored in the same index by using a doubly linked list.

If `j`

is the slot for multiple elements, it contains a pointer to the head of the list of elements. If no element is present, `j`

contains `NIL`

.

**Pseudocode for operations**

```
chainedHashSearch(T, k)
return T[h(k)]
chainedHashInsert(T, x)
T[h(x.key)] = x //insert at the head
chainedHashDelete(T, x)
T[h(x.key)] = NIL
```

## Python, Java, C and C++ Implementation

```
# Python program to demonstrate working of HashTable
hashTable = [[],] * 10
def checkPrime(n):
if n == 1 or n == 0:
return 0
for i in range(2, n//2):
if n % i == 0:
return 0
return 1
def getPrime(n):
if n % 2 == 0:
n = n + 1
while not checkPrime(n):
n += 2
return n
def hashFunction(key):
capacity = getPrime(10)
return key % capacity
def insertData(key, data):
index = hashFunction(key)
hashTable[index] = [key, data]
def removeData(key):
index = hashFunction(key)
hashTable[index] = 0
insertData(123, "apple")
insertData(432, "mango")
insertData(213, "banana")
insertData(654, "guava")
print(hashTable)
removeData(123)
print(hashTable)
```

```
// Java program to demonstrate working of HashTable
import java.util.*;
class HashTable {
public static void main(String args[])
{
Hashtable<Integer, Integer>
ht = new Hashtable<Integer, Integer>();
ht.put(123, 432);
ht.put(12, 2345);
ht.put(15, 5643);
ht.put(3, 321);
ht.remove(12);
System.out.println(ht);
}
}
```

```
// Implementing hash table in C
#include <stdio.h>
#include <stdlib.h>
struct set
{
int key;
int data;
};
struct set *array;
int capacity = 10;
int size = 0;
int hashFunction(int key)
{
return (key % capacity);
}
int checkPrime(int n)
{
int i;
if (n == 1 || n == 0)
{
return 0;
}
for (i = 2; i < n / 2; i++)
{
if (n % i == 0)
{
return 0;
}
}
return 1;
}
int getPrime(int n)
{
if (n % 2 == 0)
{
n++;
}
while (!checkPrime(n))
{
n += 2;
}
return n;
}
void init_array()
{
capacity = getPrime(capacity);
array = (struct set *)malloc(capacity * sizeof(struct set));
for (int i = 0; i < capacity; i++)
{
array[i].key = 0;
array[i].data = 0;
}
}
void insert(int key, int data)
{
int index = hashFunction(key);
if (array[index].data == 0)
{
array[index].key = key;
array[index].data = data;
size++;
printf("\n Key (%d) has been inserted \n", key);
}
else if (array[index].key == key)
{
array[index].data = data;
}
else
{
printf("\n Collision occured \n");
}
}
void remove_element(int key)
{
int index = hashFunction(key);
if (array[index].data == 0)
{
printf("\n This key does not exist \n");
}
else
{
array[index].key = 0;
array[index].data = 0;
size--;
printf("\n Key (%d) has been removed \n", key);
}
}
void display()
{
int i;
for (i = 0; i < capacity; i++)
{
if (array[i].data == 0)
{
printf("\n array[%d]: / ", i);
}
else
{
printf("\n key: %d array[%d]: %d \t", array[i].key, i, array[i].data);
}
}
}
int size_of_hashtable()
{
return size;
}
int main()
{
int choice, key, data, n;
int c = 0;
init_array();
do
{
printf("1.Insert item in the Hash Table"
"\n2.Remove item from the Hash Table"
"\n3.Check the size of Hash Table"
"\n4.Display a Hash Table"
"\n\n Please enter your choice: ");
scanf("%d", &choice);
switch (choice)
{
case 1:
printf("Enter key -:\t");
scanf("%d", &key);
printf("Enter data -:\t");
scanf("%d", &data);
insert(key, data);
break;
case 2:
printf("Enter the key to delete-:");
scanf("%d", &key);
remove_element(key);
break;
case 3:
n = size_of_hashtable();
printf("Size of Hash Table is-:%d\n", n);
break;
case 4:
display();
break;
default:
printf("Invalid Input\n");
}
printf("\nDo you want to continue (press 1 for yes): ");
scanf("%d", &c);
} while (c == 1);
}
```

```
// Implementing hash table in C++
#include <iostream>
#include <list>
using namespace std;
class HashTable
{
int capacity;
list<int> *table;
public:
HashTable(int V);
void insertItem(int key, int data);
void deleteItem(int key);
int checkPrime(int n)
{
int i;
if (n == 1 || n == 0)
{
return 0;
}
for (i = 2; i < n / 2; i++)
{
if (n % i == 0)
{
return 0;
}
}
return 1;
}
int getPrime(int n)
{
if (n % 2 == 0)
{
n++;
}
while (!checkPrime(n))
{
n += 2;
}
return n;
}
int hashFunction(int key)
{
return (key % capacity);
}
void displayHash();
};
HashTable::HashTable(int c)
{
int size = getPrime(c);
this->capacity = size;
table = new list<int>[capacity];
}
void HashTable::insertItem(int key, int data)
{
int index = hashFunction(key);
table[index].push_back(data);
}
void HashTable::deleteItem(int key)
{
int index = hashFunction(key);
list<int>::iterator i;
for (i = table[index].begin();
i != table[index].end(); i++)
{
if (*i == key)
break;
}
if (i != table[index].end())
table[index].erase(i);
}
void HashTable::displayHash()
{
for (int i = 0; i < capacity; i++)
{
cout << "table[" << i << "]";
for (auto x : table[i])
cout << " --> " << x;
cout << endl;
}
}
int main()
{
int key[] = {231, 321, 212, 321, 433, 262};
int data[] = {123, 432, 523, 43, 423, 111};
int size = sizeof(key) / sizeof(key[0]);
HashTable h(size);
for (int i = 0; i < n; i++)
h.insertItem(key[i], data[i]);
h.deleteItem(12);
h.displayHash();
}
```

## Good Hash Functions

A good hash function has the following characteristics.

- It should not generate keys that are too large and the bucket space is small. Space is wasted.
- The keys generated should be neither very close nor too far in range.
- The collision must be minimized as much as possible.

Some of the methods used for hashing are:

### Division Method

If `k`

is a key and `m`

is the size of the hash table, the hash function `h()`

is calculated as:

`h(k) = k mod m`

For example, If the size of a hash table is `10`

and `k = 112`

then `h(k) = 112`

mod `10 = 2`

. The value of `m`

must not be the powers of `2`

. This is because the powers of `2`

in binary format are `10, 100, 1000, …`

. When we find `k mod m`

, we will always get the lower order p-bits.

if m = 22, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 01 if m = 23, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 001 if m = 24, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 0001 if m = 2p, then h(k) = p lower bits of m

### Multiplication Method

`h(k) = ⌊m(kA mod 1)⌋`

where,

`kA mod 1`

gives the fractional part`kA`

,`⌊ ⌋`

gives the floor value`A`

is any constant. The value of`A`

lies between 0 and 1. But, an optimal choice will be`≈ (√5-1)/2`

suggested by Knuth.

### Universal Hashing

In Universal hashing, the hash function is chosen at random independent of keys.

## Open Addressing

Multiple values can be stored in a single slot in a normal hash table.

By using open addressing, each slot is either filled with a single key or left `NIL`

. All the elements are stored in the hash table itself.

Unlike chaining, multiple elements cannot be fit into the same slot.

Open addressing is basically a collision resolving technique. Some of the methods used by open addressing are:

### Linear Probing

In linear probing, collision is resolved by checking the next slot.

`h(k, i) = (h′(k) + i) mod m`

where,

`i = {0, 1, ….}`

`h'(k)`

is a new hash function

If a collision occurs at `h(k, 0)`

, then `h(k, 1)`

is checked. In this way, the value of `i`

is incremented linearly.

The problem with linear probing is that a cluster of adjacent slots is filled. When inserting a new element, the entire cluster must be traversed. This adds to the time required to perform operations on the hash table.

### Quadratic Probing

In quadratic probing, the spacing between the slots is increased (greater than one) by using the following relation.

`h(k, i) = (h′(k) + c`

_{1}i + c_{2}i^{2}) mod m

where,

`c`

and_{1}`c`

are positive auxiliary constants,_{2}`i = {0, 1, ….}`

### Double hashing

If a collision occurs after applying a hash function `h(k)`

, then another hash function is calculated for finding the next slot.

`h(k, i) = (h`

_{1}(k) + ih_{2}(k)) mod m

## Hash Table Applications

Hash tables are implemented where

- constant time lookup and insertion is required
- cryptographic applications
- indexing data is required