Sorting Algorithms and Hashing Techniques
Sorting
Stable vs In-Place
Stable: The relative order of elements with the same key value is preserved by the algorithm.
Equivalently, if two elements have equal keys, the one that appears earlier in the input also appears earlier in the sorted output.
In-Place: Requires only a constant amount, i.e., O(1), of extra space during the sorting process.
A temporary variable (e.g., for swapping) uses only a constant amount of extra space, so the algorithm still counts as in-place.
Sorting Explanations
Merge Sort: For arrays, it requires O(n) auxiliary space for merging, but for Linked Lists the merge can be done by rewiring pointers, so no auxiliary array is needed.
Selection Sort: In each iteration, find the smallest element and swap it with the first index. Subsequent iterations process n-1 elements and repeat the first step.
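A minimal Python sketch of the step above (the function name and list interface are illustrative):

```python
def selection_sort(a):
    """In-place selection sort: repeatedly swap the minimum of the
    unsorted suffix into the first position of that suffix."""
    n = len(a)
    for i in range(n):
        # Find the index of the smallest element in a[i:].
        min_idx = i
        for j in range(i + 1, n):
            if a[j] < a[min_idx]:
                min_idx = j
        # Swap it into position i; the prefix a[:i+1] is now sorted.
        a[i], a[min_idx] = a[min_idx], a[i]
    return a
```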
Bubble Sort: Compare the 1st and 2nd elements. If the 2nd is less than the 1st, swap them, then look at the 2nd and 3rd, and so on. Repeat this step; if an iteration results in no swaps, stop (this optimization is achieved with a flag).
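The flag optimization described above can be sketched as follows (names are illustrative):

```python
def bubble_sort(a):
    """Bubble sort with the early-exit flag: stop as soon as a full
    pass makes no swaps, since the list is then already sorted."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # no swaps in this pass: stop early
            break
    return a
```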
Quick Sort: Choose a pivot. Partition the array around the pivot: smaller elements go to the left and larger elements to the right. Recursively apply QuickSort on the subarrays.
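A sketch of the partition-then-recurse scheme (this uses Lomuto partitioning with the last element as pivot; real implementations often pick a random pivot):

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[hi]: smaller elements end up
    on the left, larger or equal on the right. Returns pivot index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # place the pivot between the two halves
    return i

def quick_sort(a, lo=0, hi=None):
    """Recursive quicksort on the subarrays left and right of the pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)   # elements smaller than the pivot
        quick_sort(a, p + 1, hi)   # elements larger than the pivot
    return a
```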
Radix Sort: Sort numbers by the 1s digit, then by the 10s digit, then by the 100s digit, continuing until the most significant digit is processed. Each digit pass must use a stable sort (typically counting sort) so that the order established by earlier passes is preserved.
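An LSD radix sort sketch for non-negative integers; the bucket-by-digit pass is stable because elements with the same digit keep their input order:

```python
def radix_sort(nums):
    """LSD radix sort: one stable pass per decimal digit, from the
    1s digit up to the most significant digit of the maximum."""
    if not nums:
        return nums
    exp = 1
    while max(nums) // exp > 0:
        buckets = [[] for _ in range(10)]
        for x in nums:  # stable: equal digits keep their current order
            buckets[(x // exp) % 10].append(x)
        nums = [x for b in buckets for x in b]
        exp *= 10
    return nums
```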
Linked List and Array Sorting Preferences
Linked List
- Sequential access: Note the cost of traversing through nodes.
- Most Optimal Sort: Merge Sort. No need for random access of nodes and easy rewiring of pointers.
- 2nd Best Sort: Insertion Sort. Easy to rewire pointers; the only expensive step is traversing through nodes.
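A sketch of merge sort on a singly linked list (the Node class and helpers are illustrative): split with slow/fast pointers, then merge by rewiring next pointers instead of copying into an auxiliary array.

```python
class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def merge_sort_list(head):
    """Merge sort on a singly linked list: no random access needed,
    and the merge step just relinks existing nodes."""
    if head is None or head.next is None:
        return head
    # Split: slow ends just before the middle; cut the list in two.
    slow, fast = head, head.next
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    left, right = merge_sort_list(head), merge_sort_list(mid)
    # Merge the two sorted halves by rewiring pointers.
    dummy = tail = Node(0)
    while left and right:
        if left.val <= right.val:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left or right
    return dummy.next

def from_list(xs):
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```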
Array
- Easy random access.
- Good with Quick Sort. Bad with Insertion Sort because inserting an element requires shifting everything to the right, which is a costly operation.
Hashing
Uniform Hash Function
- Division method: h(k) = k mod m, where m is the size of the hash table.
- Avoid m being a power of 2 (the hash then depends only on the low-order bits); choose a prime number instead!
Multiplication Method
- h(k) = floor(m * (kA mod 1)) for a constant 0 < A < 1; Knuth suggests A ≈ (√5 − 1)/2 ≈ 0.618.
Hashing of Strings
- Multiplying by 31 ensures that the position of letters matters.
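A polynomial rolling hash in this style (the same scheme as Java's String.hashCode, reduced mod a table size m); because each character is scaled by a different power of 31, swapping letters changes the hash:

```python
def string_hash(s, m):
    """Polynomial string hash with base 31:
    h = (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) mod m."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % m
    return h
```

For example, "ab" and "ba" hash to different values even though they contain the same letters.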
Load Factor
- Measures how dense the hash table is.
- α (load factor) = n (number of keys) / m (size of hash table).
- If the load factor exceeds the bound, rehash all keys into a bigger table (a prime number close to double the table size m).
- Using a linked list for chains means separate chaining is not cache-friendly.
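A minimal separate-chaining table with rehashing at a load-factor bound (the parameters m = 11 and max α = 0.75 are illustrative, and the next-prime search approximates "a prime close to double the table size"):

```python
class ChainedHashTable:
    """Separate chaining; rehashes into a bigger prime-sized table
    whenever the load factor alpha = n / m exceeds max_load."""
    def __init__(self, m=11, max_load=0.75):
        self.m, self.max_load, self.n = m, max_load, 0
        self.table = [[] for _ in range(m)]

    def _next_prime(self, x):
        def is_prime(k):
            if k < 2:
                return False
            return all(k % d for d in range(2, int(k ** 0.5) + 1))
        while not is_prime(x):
            x += 1
        return x

    def insert(self, key):
        self.table[hash(key) % self.m].append(key)
        self.n += 1
        if self.n / self.m > self.max_load:  # alpha exceeded the bound
            self._rehash(self._next_prime(2 * self.m))

    def _rehash(self, new_m):
        old = self.table
        self.m = new_m
        self.table = [[] for _ in range(new_m)]
        for chain in old:  # rehash every key into the bigger table
            for key in chain:
                self.table[hash(key) % new_m].append(key)

    def contains(self, key):
        return key in self.table[hash(key) % self.m]
```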
Linear Probing
- h(k, i) = (h(k) + i) mod m. Runs of consecutive occupied slots form primary clusters, which slow down probing.
Modified Linear Probing
- h(k, i) = (h(k) + d·i) mod m for a fixed step d > 1; d must be coprime to m so that every slot can be reached.
Quadratic Probing
- h(k, i) = (h(k) + i²) mod m.
- If α < 0.5 and m is prime, quadratic probing is guaranteed to find an empty slot.
Double Hashing
- h(k, i) = (h1(k) + i · h2(k)) mod m.
- The secondary hash function h2 must never evaluate to 0, or the probe sequence never moves.
- If h2(k) = 1, it is the same as linear probing.
- If h2(k) = d, where d is a constant integer > 1, it is the same as modified linear probing.
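The four probing schemes differ only in the step added at probe i; the sketch below prints the first few slots each would visit (m = 11, initial slot 3, step d = 4, and secondary hash value 5 are all illustrative):

```python
def probe_sequence(h1, step, m, probes=6):
    """First few slots visited under open addressing:
    slot_i = (h1 + step(i)) mod m for i = 0, 1, 2, ..."""
    return [(h1 + step(i)) % m for i in range(probes)]

m, h1 = 11, 3
linear    = probe_sequence(h1, lambda i: i, m)        # step +1 each probe
modified  = probe_sequence(h1, lambda i: 4 * i, m)    # fixed step d = 4
quadratic = probe_sequence(h1, lambda i: i * i, m)    # step +i^2
h2 = 5                                                # secondary hash value (never 0)
double    = probe_sequence(h1, lambda i: i * h2, m)   # step depends on the key
```

Linear probing walks consecutive slots (clustering); the other three jump around the table.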
Chaining vs Open-Addressing
Chaining is good when:
Load factor may exceed 1, many insertions occur, or deletions are frequent. You don't mind extra memory and want simpler logic.
Bad if: Cache performance matters or memory usage must be tight.
Open Addressing is good when:
Memory is tight, you want cache-friendly performance, and the load factor is kept low.
Bad if: The table gets too full or many deletions are required (deletions need lazy deletion/tombstones).
Worst Case Performance for Hashing
- Worst case is O(n) for both chaining and open addressing, e.g., when every key lands in the same chain or probe sequence.
- Otherwise, the average case is O(1) for both; in separate chaining, ensure the load factor α is bounded.
Multiset and Multimap
Multiset
A multiset is like a set, but it allows duplicates (e.g., {1, 1, 2, 3, 3, 3}). Order does not matter, but duplicates are allowed.
Implementing Multiset
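One common implementation (names are illustrative) is a hash map from element to its count, which keeps insert, delete, and find at average O(1):

```python
from collections import defaultdict

class Multiset:
    """Multiset backed by a hash map from element to count:
    duplicates are allowed, order does not matter."""
    def __init__(self):
        self.counts = defaultdict(int)

    def add(self, x):
        self.counts[x] += 1

    def remove(self, x):
        """Remove one occurrence of x, if present."""
        if x in self.counts:
            self.counts[x] -= 1
            if self.counts[x] == 0:
                del self.counts[x]

    def count(self, x):
        return self.counts.get(x, 0)
```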
Multimap
A multimap is like a map, but it allows multiple values for the same key (e.g., ("CS2040" → "A"), ("CS2040" → "B")).
Implementing Multimap
- Retaining average O(1) insert, delete, and find operations.
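One common implementation (names are illustrative) is a hash map from key to a list of values:

```python
from collections import defaultdict

class Multimap:
    """Multimap backed by a hash map from key to a list of values,
    so one key can map to multiple values."""
    def __init__(self):
        self.buckets = defaultdict(list)

    def put(self, key, value):
        self.buckets[key].append(value)

    def get(self, key):
        """All values for key, in insertion order ([] if absent)."""
        return self.buckets.get(key, [])

    def remove(self, key, value):
        """Remove one (key, value) pair, if present.
        O(len(values)) rather than O(1); fine if value lists stay short."""
        values = self.buckets.get(key)
        if values and value in values:
            values.remove(value)
            if not values:
                del self.buckets[key]
```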