DSA Hash Tables
Hash Table
One type of data structure meant to be worked with quickly is a hash table.
Because hash tables allow for fast searching, addition, and deletion of data—even for enormous amounts of data—they are occasionally chosen over arrays or linked lists.
Finding a person named “Bob” in a linked list requires time-consuming navigation between nodes, which must be checked one after the other until the node containing “Bob” is located.
Additionally, locating “Bob” in an array could be quick if we knew the index; however, if we only know “Bob,” we must compare each element, which takes time (much like with linked lists).
However, locating “Bob” is incredibly quick using a hash table since a hash function may be used to go to the exact location of “Bob”‘s storage.
Building A Hash Table from Scratch
Let’s attempt to create a hash table from scratch and store unique first names inside of it to get a sense of what one is.
The Hash Set will be constructed in five steps:
1.Initially, an array is used.
2.names being stored via a hash algorithm.
3.use a hash function to look up an element.
4.Managing crashes.
5.The simulation and illustration of basic hash set code.
Step 1: Starting with an array
We could store names like this in an array:
my_array = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']
We must compare each name, element by element, until we locate “Bob” in this array.
If the array was arranged alphabetically, we could quickly locate a name using Binary Search; however, adding or removing names from the array would require a significant amount of memory shifting.
Instead, let’s use a Hash Set, which is a condensed form of a Hash Table, to make interacting with the list of names extremely quick.
Let’s assume for simplicity’s sake that the list contains no more than 10 names; so, the array’s size must be fixed at 10. Every single one of these components is referred to as a bucket when discussing hash tables.
my_hash_set = [None,None,None,None,None,None,None,None,None,None]
Step 2: Storing names using a hash function
This brings us to the unique manner we work with the hash set we are creating.
The hash function is used when we wish to save a name exactly where it belongs in the array.
The person who creates the hash table can choose how to create a hash function. One popular method is to figure out how to change the value into a number, in this example a number between 0 and 9, that equals one of the index numbers of the Hash Set. In this example, we will use each character’s Unicode number, sum them together, then perform a modulo 10 operation to obtain the index numbers 0-9.
Example
def hash_function(value):
sum_of_chars = 0
for char in value:
sum_of_chars += ord(char)
return sum_of_chars % 10
print("'Bob' has hash code:",hash_function('Bob'))
Among Unicode code points, “o” has 111, “b” has 98, and “B” has 66. When we sum those up, we get 275. Since 5 is modulo 10 of 275, “Bob” ought to be kept as an array element at index 5.
The hash code is the number that the hash algorithm returns.
Unicode code point: Every character on our computers is assigned a unique number, which is known as a Unicode code point. For instance, Unicode number 65 (also known as Unicode code point) corresponds to the character A. Simply give the simulation below a try. For further details on how characters are represented as numbers, see this page.
Modulo: A mathematical operation; in most computer languages, represented as % (or mod in the field of mathematics). A modulo operation yields the reminder that results from dividing a number by another number. Thus, for instance, 7% 3 will provide us with the reminder 1. (Dividing 7 apples among 3 individuals results in 2 apples for each person, plus 1 extra apple.)
After storing “Bob” at index 5, which is where the hash code indicates, our array now appears as follows:
my_hash_set = [None,None,None,None,None,'Bob',None,None,None,None]
To determine where to keep the additional names “Pete”, “Jones”, “Lisa”, and “Siri”, we may also utilize the hash function.
Our array looks like this after utilizing the hash function to save those names in the appropriate location:
my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None]
Step 3: Looking up a name using a hash function
Now that we have created a very basic hash set, we can use the hash function to quickly access the desired element rather than having to go through the array element by element to see if “Pete” is present.
In order to determine whether “Pete” is present in the array, we first pass the name “Pete” through our hash function, which returns a hash code of 9. We then go straight to the element at index 9, where we locate him. We looked through all the components and found “Pete”.
Example
my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None]
def hash_function(value):
sum_of_chars = 0
for char in value:
sum_of_chars += ord(char)
return sum_of_chars % 10
def contains(name):
index = hash_function(name)
return my_hash_set[index] == name
print("'Pete' is in the Hash Set:",contains('Pete'))
We can also use the hash function to locate the name directly and set its element value to None when removing it from our hash collection.
Step 4: Handling collisions
Let’s include “Stuart” in our hash set as well.
Our hash algorithm accepts “Stuart” as input, and returns the hash code 3, indicating that “Stuart” ought to be kept at index 3.
A collision occurs when “Stuart” tries to be saved because “Lisa” is already stored at index 3.
In order to resolve the collision, we can add more pieces to the same bucket; this method of resolving the collision is known as chaining. One way to expand the number of elements in a bucket is to implement each bucket as an array or linked list.
“Stuart” can also be saved at index 3 once each bucket is implemented as an array to allow for possibly more than one name in each bucket. As a result, our hash set now looks like this:
my_hash_set = [
[None],
['Jones'],
[None],
['Lisa', 'Stuart'],
[None],
['Bob'],
[None],
['Siri'],
['Pete'],
[None]
]
By utilizing the hash function to search for “Stuart” in our hash set, we can now discover “Stuart” as the second element in bucket 3 without having to first check for “Lisa” in that bucket.
Step 5: Hash Set code example and simulation
Let’s add and search functions for names in the Hash Set, which is now a two-dimensional array, to finish our very simple Hash Set code.
Try the following code example with various values to gain a better idea of how a hash set functions.
Example
my_hash_set = [
[None],
['Jones'],
[None],
['Lisa'],
[None],
['Bob'],
[None],
['Siri'],
['Pete'],
[None]
]
def hash_function(value):
return sum(ord(char) for char in value) % 10
def add(value):
index = hash_function(value)
bucket = my_hash_set[index]
if value not in bucket:
bucket.append(value)
def contains(value):
index = hash_function(value)
bucket = my_hash_set[index]
return value in bucket
add('Stuart')
print(my_hash_set)
print('Contains Stuart:',contains('Stuart'))
Better and more thorough implementations of hast sets and hash tables are shown on the following two pages.
To have a better understanding of how a hash set functions in theory, try the hash set simulation down below.
==== BOX MUKAVU =====
Uses of Hash Tables
Why hash tables work so well:
1.Verifying whether an item is part of a collection (for as locating a book in a library).
2.keeping special things and finding them fast (e.g., phone numbers).
3.associating keys to values in the same way that phone numbers and names are linked.
The most important reason why Hash Tables are great for these things is that Hash Tables are very fast compared Arrays and Linked Lists, especially for large sets. Arrays and Linked Lists have time complexity O(n) for search and delete, while Hash Tables have just O(1) on average! Read more about time complexity here.
Hash Set vs. Hash Map
A hash set or hash map can be a hash table. These data structures are covered in further detail in the following two pages.
Here are some distinctions and similarities between hash sets and hash maps:
Hash Set | Hash Map | |
---|---|---|
Uniqueness and storage | Every element is a unique key. | Every entry is a key-value-pair, with a key that is unique, and a value connected it. |
Use case | Checking if an element is in the set, like checking if a name is on a guest list. | Finding information based on a key, like looking up who owns a certain telephone number. |
Is it fast to search, add and delete elements? | Yes, average O(1)O(1). | Yes, average O(1)O(1). |
Is there a hash function that takes the key, generates a hash code, and that is the bucket where the element is stored? | Yes | Yes |
Hash Tables Summarized
Elements of hash tables are kept in storage units known as buckets.
Each element in a hash table contains a unique component known as the key.
A hash function creates a hash code using an element’s key.
Now that we know which bucket the element belongs to thanks to the hash code, we can navigate directly to the Hash Table element and change, remove, or simply verify its existence. The following two pages provide a detailed explanation of particular hash functions.
When two elements in a hash table have the same hash code, they are considered to be in the same bucket and will collide. Two methods are available to solve a collision.
In this tutorial, collisions are resolved by chaining, which permits several elements in a single bucket by using arrays or linked lists.
Alternatively, collisions can be resolved via open addressing. When using open addressing, an element is saved in the next bucket that becomes accessible if we wish to store it but there is already an element in that bucket. There are other ways to accomplish this, but we won’t go into greater detail on open addressing at this time.
Conclusion
Programming’s Hash Tables are strong tools that facilitate effective data management and access.
Depending on what you need—to find out in-depth information about something or merely to know if it exists—you can use either a hash set or a hash map.