

Hashing

The ideal aim of the hash table is to achieve access in constant time. Suppose it were possible to compute, either directly or indirectly, the address where an item would be stored if it were present in the data structure. Assuming the address could be calculated independently of the number of items in the data structure, this would give us access in constant time, either for retrieval or for testing for the item's presence.

Suppose, for example, that the items are stored in an array. Then it would be sufficient to be able to calculate the array index where the item would be located if it were present; scaling the index by the element size and adding the base address of the array would then give the actual address. What we are looking for is a function which associates an integer with every data item of interest. This integer is called the hash code of the object, and the process of forming it is called hashing. The hash code is understood to determine the proper place of the data item in the underlying array, which from now on is called the hash table.
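As a minimal sketch of this idea, the following Python fragment computes a hash code for a string and reduces it to an array index. The polynomial scheme with multiplier 31, the table size of 101, and the names `hash_code`, `table_index`, `insert` and `contains` are all assumptions made for illustration; they are not taken from these notes. The sketch also assumes that no two stored items share an index.

```python
TABLE_SIZE = 101  # assumed size of the underlying array, for illustration only


def hash_code(item: str) -> int:
    """Associate an integer with a string (assumed polynomial scheme)."""
    code = 0
    for ch in item:
        code = code * 31 + ord(ch)
    return code


def table_index(item: str, size: int = TABLE_SIZE) -> int:
    """Reduce the hash code to a valid index into an array of the given size."""
    return hash_code(item) % size


# The hash table itself: an array with one slot per index.
table = [None] * TABLE_SIZE


def insert(item: str) -> None:
    """Store the item at the index determined by its hash code."""
    table[table_index(item)] = item


def contains(item: str) -> bool:
    """Test for presence by looking only at the one slot the item could occupy."""
    return table[table_index(item)] == item


insert("cat")
print(contains("cat"), contains("dog"))
```

The work done by `insert` and `contains` depends on the length of the string being hashed, but not on how many items the table already holds, which is the sense in which access takes constant time.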

Ideally, different items should hash to different integers. Although theoretically possible, given unlimited storage space, this aim is generally unrealistic in practice. For example, suppose we have to store strings of up to 16 characters in length. Even assuming that only the lower-case letters a..z can occur in the string, and counting only the strings of exactly 16 characters, there are still

\begin{displaymath}
26^{16} = 43608742899428874059776
\end{displaymath}

possible strings to be stored. Each would have to be mapped to a different array index, so the array would have to be at least this size.
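The figure above can be checked with a few lines of Python, whose integers have arbitrary precision. The variable names are illustrative. The sketch also counts the strings of every length from 0 to 16, a slightly larger number, and estimates the storage needed under the assumption of just one byte per array element.

```python
ALPHABET_SIZE = 26  # lower-case letters a..z
MAX_LENGTH = 16

# Strings of exactly 16 characters.
exactly_16 = ALPHABET_SIZE ** MAX_LENGTH

# Strings of every length from 0 (the empty string) up to 16.
up_to_16 = sum(ALPHABET_SIZE ** k for k in range(MAX_LENGTH + 1))

# Assumed storage cost: one byte per array element, expressed in gigabytes.
gigabytes = exactly_16 / 10 ** 9

print(exactly_16)
print(up_to_16)
print(f"{gigabytes:.3e} GB at one byte per element")
```

Even at one byte per element, the array would need on the order of 10^13 gigabytes.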

Nonetheless, there is a useful idea here, provided we are willing to abandon the requirement that different data items always have different hash codes and map to different elements of the array. It might be sufficient if the most commonly occurring data items usually map to different indexes, provided we can devise a suitable way of dealing with the (hopefully) exceptional cases. The problem of dealing with these cases, where two objects to be inserted in the hash table initially map to the same array index, is called the problem of collision resolution. A ``good'' hash function is one that keeps such cases rare and so minimises the cost of overcoming this problem.
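To see why such cases cannot be avoided once the array is smaller than the set of possible items, the sketch below hashes ten words into a table of only seven slots, so at least two words must share an index. The hash scheme, the word list, the table size and the name `find_collisions` are assumptions chosen for illustration.

```python
from collections import defaultdict


def hash_code(item: str) -> int:
    """Assumed polynomial hash code for a string, for illustration only."""
    code = 0
    for ch in item:
        code = code * 31 + ord(ch)
    return code


def find_collisions(items, size):
    """Group items by array index and return only the indexes shared by several items."""
    buckets = defaultdict(list)
    for item in items:
        buckets[hash_code(item) % size].append(item)
    return {index: group for index, group in buckets.items() if len(group) > 1}


words = ["ant", "bee", "cat", "dog", "eel", "fox", "gnu", "hen", "ibis", "jay"]

# Ten items, seven slots: some index is necessarily claimed more than once.
collisions = find_collisions(words, size=7)
for index, group in sorted(collisions.items()):
    print(index, group)
```

A larger table or a different hash function changes which items collide, and how often, but only a table with at least as many slots as there are possible items can rule collisions out entirely.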


Peter Williams 2005-06-07