DSA Hash Sets
Hash Sets
A Hash Set is a type of data structure similar to a Hash Table, although it typically has a lot of entries.
We can quickly search for, add, and remove entries when we use a hash set.
To determine whether an element is a part of a set, lookup is done using hash sets.
==== exa mukavu ====
A hash set uses the hash code of each element to store distinct elements in buckets.
- Hash code: A number that is produced using an element’s distinct value, or key, to identify which bucket the Hash Set element is in.
- Unique elements: No two elements with the identical value may exist in a hash set.
- Bucket: To hold items, a hash set is made up of numerous buckets, or containers. Two elements are considered to be in the same bucket if they share the same hash code. Since a bucket must be able to retain more than one element, the buckets are frequently implemented as arrays or linked lists.
Finding The Hash Code
A hash function produces a hash code.
In the animation above, the hash function takes the input name and adds up all of the Unicode code points for each character in the name.
Next, in order to obtain the hash code as a number between 0 and 9, the hash function performs a modulo 10 operation (% 10) on the character sum.
This indicates that a name’s hash code is used to place it into one of the ten buckets that are possible in the Hash Set. When we want to look up or remove a name from the Hash Set, the same hash code is generated and used.
If there is just one name in the associated bucket, we can obtain the hash code immediately.
Unicode code point: Every character on our computers is represented by a unique number, which is what’s known as the Unicode code point. For instance, Unicode code point 65 corresponds to the character A. Simply give the above scenario a try. For further details on how characters are represented as numbers, see this page.
Modulo: A mathematical operation; in most computer languages, represented as % (or modin the field of mathematics). A modulo operation yields the reminder that results from dividing a number by another number. Thus, for instance, 7% 3 will provide us with the reminder 1. (Dividing 7 apples among 3 individuals results in 2 apples for each person, plus 1 extra apple.)
Direct Access in Hash Sets
When we search for Peter in the aforementioned hash set, hash code 2 (512 % 10) is created, which points us in the direction of the bucket where Peter is located. We will locate Peter immediately if that is the sole name in that bucket.
In cases like this we say that the Hash Set has constant time O(1) for searching, adding, and removing elements, which is really fast.
But, if we search for Jens, we need to search through the other names in that bucket before we find Jens. In a worst case scenario, all names end up in the same bucket, and the name we are searching for is the last one. In such a worst case scenario the Hash Set has time complexity O(n), which is the same time complexity as arrays and linked lists.
It is crucial to have roughly the same number of buckets as Hash Set items and a hash function that will divide the elements equally among them in order to maintain Hash Sets’ speed.
Memory is wasted when there are many more buckets than Hash Set elements, and time is wasted when there are few buckets relative to Hash Set elements.
Hash Set Implementation
Although Python’s built-in set data type is usually used to create hash sets, we won’t utilize it here in order to better understand how hash sets function.
We construct a class called SimpleHashSet in Python to implement a hash set.
The methods for adding, containing, and removing hash sets are all contained in the SimpleHashSet class, along with a method called hash_function for the hash function.
In order to observe the Hash Set’s appearance more clearly, we also build the print_set method.
Example
class SimpleHashSet:
def __init__(self, size=100):
self.size = size
self.buckets = [[] for _ in range(size)] # A list of buckets, each is a list (to handle collisions)
def hash_function(self, value):
# Simple hash function: sum of character codes modulo the number of buckets
return sum(ord(char) for char in value) % self.size
def add(self, value):
# Add a value if it's not already present
index = self.hash_function(value)
bucket = self.buckets[index]
if value not in bucket:
bucket.append(value)
def contains(self, value):
# Check if a value exists in the set
index = self.hash_function(value)
bucket = self.buckets[index]
return value in bucket
def remove(self, value):
# Remove a value
index = self.hash_function(value)
bucket = self.buckets[index]
if value in bucket:
bucket.remove(value)
def print_set(self):
# Print all elements in the hash set
print("Hash Set Contents:")
for index, bucket in enumerate(self.buckets):
print(f"Bucket {index}: {bucket}")
We may construct the Hash Set seen at the top of this page by using the SimpleHashSet class:
Example
class SimpleHashSet:
def __init__(self, size=100):
self.size = size
self.buckets = [[] for _ in range(size)] # A list of buckets, each is a list (to handle collisions)
def hash_function(self, value):
# Simple hash function: sum of character codes modulo the number of buckets
return sum(ord(char) for char in value) % self.size
def add(self, value):
# Add a value if it's not already present
index = self.hash_function(value)
bucket = self.buckets[index]
if value not in bucket:
bucket.append(value)
def contains(self, value):
# Check if a value exists in the set
index = self.hash_function(value)
bucket = self.buckets[index]
return value in bucket
def remove(self, value):
# Remove a value
index = self.hash_function(value)
bucket = self.buckets[index]
if value in bucket:
bucket.remove(value)
def print_set(self):
# Print all elements in the hash set
print("Hash Set Contents:")
for index, bucket in enumerate(self.buckets):
print(f"Bucket {index}: {bucket}")
# Creating the Hash Set from the simulation
hash_set = SimpleHashSet(size=10)
hash_set.add("Charlotte")
hash_set.add("Thomas")
hash_set.add("Jens")
hash_set.add("Peter")
hash_set.add("Lisa")
hash_set.add("Adele")
hash_set.add("Michaela")
hash_set.add("Bob")
hash_set.print_set()
print("\n'Peter' is in the set:",hash_set.contains('Peter'))
print("Removing 'Peter'")
hash_set.remove('Peter')
print("'Peter' is in the set:",hash_set.contains('Peter'))
print("'Adele' has hash code:",hash_set.hash_function('Adele'))