|
| 1 | +--- |
| 2 | +title: Contains Duplicate |
| 3 | +commit: e69043 |
| 4 | +url: https://github.com/josimar-silva/kaizen/commit/e69043ecd2cc4fd8c17612d74ab5f8309925d775 |
| 5 | +--- |
| 6 | + |
| 7 | +#### Problem |
| 8 | +Given an array of integers, determine if any value appears at least twice. The function should return `true` if a duplicate exists, and `false` if all elements are distinct. |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +### Algorithmic Approaches & Analysis |
| 13 | + |
| 14 | +This problem presents a classic space-time trade-off, offering two primary solutions with different performance characteristics. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +### Approach 1: Hash Set (Time-Optimized) |
| 19 | + |
| 20 | +This approach prioritizes execution speed by using an auxiliary data structure to keep track of the elements seen so far. |
| 21 | + |
| 22 | +#### Big O Analysis |
| 23 | + |
| 24 | +**Time Complexity:** O(N) on average |
| 25 | +- We iterate through the array of N elements at most once, stopping at the first duplicate. |
| 26 | +- For each element, we perform a `contains` check and an `add` operation on a hash set. On average, these are O(1) operations; with pathological hash collisions a single operation can degrade toward O(N). |
| 27 | +- ⇒ **O(N)** on average |
| 28 | + |
| 29 | +**Space Complexity:** O(N) |
| 30 | +- In the worst-case scenario (an array with no duplicates), the hash set will store all N elements. |
| 31 | +- The memory required grows linearly with the size of the input array. |
| 32 | +- ⇒ **O(N)** |
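The analysis above can be sketched in a few lines. This is a minimal illustration in Python, not the code from the referenced commit; the function name `contains_duplicate` is an assumption chosen for this example, and Python's `in` and `set.add` stand in for the `contains` and `add` operations.

```python
from typing import Iterable


def contains_duplicate(nums: Iterable[int]) -> bool:
    """Return True if any value appears at least twice in nums."""
    seen = set()             # values encountered so far; grows up to N entries
    for value in nums:
        if value in seen:    # membership check, O(1) on average
            return True      # early exit on the first repeated value
        seen.add(value)      # insertion, O(1) on average
    return False             # every element was distinct


if __name__ == "__main__":
    print(contains_duplicate([1, 2, 3, 1]))
    print(contains_duplicate([1, 2, 3, 4]))
```

The early return means the full O(N) time and space are only paid when the input has no duplicates, or when the repeat appears near the end.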
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +### Approach 2: Sorting (Space-Optimized) |
| 37 | + |
| 38 | +This approach prioritizes memory efficiency. By sorting the array, any duplicates are forced to be adjacent, making them easy to find. |
| 39 | + |
| 40 | +#### Big O Analysis |
| 41 | + |
| 42 | +**Time Complexity:** O(N log N) |
| 43 | +- The dominant operation is sorting the array, which takes O(N log N) for efficient algorithms like Timsort or Introsort, in the worst case as well as on average. |
| 44 | +- After sorting, we perform a single pass (an O(N) operation) to check for adjacent duplicates. |
| 45 | +- The overall complexity is determined by the sorting step. |
| 46 | +- ⇒ **O(N log N)** |
| 47 | + |
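| 48 | +**Space Complexity:** O(1) to O(log N) with an in-place sort |
| 49 | +- If the array is sorted in place, the additional space depends on the algorithm: O(1) for heapsort, or O(log N) for Introsort's recursion stack. Timsort, by contrast, may allocate an O(N) merge buffer. |
| 49 | +- In-place sorting also reorders the input. If the caller's array must stay intact, sorting a copy costs O(N) space and the advantage over the hash set disappears. |
| 50 | +- With a constant- or logarithmic-space sort, this is significantly more memory-efficient than the hash set approach. |
| 51 | +- ⇒ **O(1)** with heapsort, **O(log N)** with Introsort |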
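A minimal Python sketch of the sort-then-scan idea follows. The function name `contains_duplicate_sorted` is an assumption made for this example. Note that CPython's `list.sort()` is a Timsort variant that may allocate a temporary merge buffer, so this sketch shows the technique rather than a strict O(1)-space guarantee.

```python
from typing import List


def contains_duplicate_sorted(nums: List[int]) -> bool:
    """Return True if any value appears at least twice in nums.

    Sorts nums in place, so the caller's list is reordered.
    """
    nums.sort()                        # O(N log N); equal values become adjacent
    for i in range(1, len(nums)):      # single O(N) pass over neighbours
        if nums[i] == nums[i - 1]:     # adjacent equal pair means a duplicate
            return True
    return False


if __name__ == "__main__":
    data = [3, 1, 2, 3]
    print(contains_duplicate_sorted(data))
    print(data)  # the input has been sorted as a side effect
```

The scan compares elements by index rather than slicing the list, since a slice such as `nums[1:]` would copy the data and reintroduce O(N) extra space.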
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +#### Layman’s Terms |
| 56 | + |
| 57 | +Imagine you're a bouncer at a large event, and your job is to prevent anyone from entering twice. |
| 58 | + |
| 59 | +- **Hash Set Method (Fast with a Guest List):** You have a digital guest list (the hash set). When a guest arrives, you type their name in. The system instantly tells you if the name is already on the list. This is extremely fast, but you need the computer system to store the list of every guest who has entered. |
| 60 | + |
| 61 | +- **Sorting Method (Slower with no Tech):** You have no computer. You let all the guests into a waiting room and ask them to form a single line, ordered alphabetically by name. Now, to find duplicates, you just walk down the line and see if any two people standing next to each other have the same name. The initial organization takes a while, but you didn't need any extra equipment. |
| 62 | + |
| 63 | +👉 **Rule of thumb:** The Hash Set method is faster because it trades memory for lookups that take constant time on average. The Sorting method is more memory-efficient because it spends extra time upfront organizing the data, after which any duplicates sit side by side. |
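One way to observe the trade-off is to measure peak allocations with Python's standard `tracemalloc` module. The sketch below is an illustration under assumptions: the helper names (`has_duplicate_set`, `has_duplicate_sort`, `peak_traced_bytes`) and the input size are chosen for this example, and the exact byte counts depend on the interpreter version and platform.

```python
import random
import tracemalloc


def has_duplicate_set(nums):
    """Hash set approach: extra memory grows with the number of distinct values."""
    seen = set()
    for value in nums:
        if value in seen:
            return True
        seen.add(value)
    return False


def has_duplicate_sort(nums):
    """Sorting approach: sorts in place, then compares neighbours."""
    nums.sort()  # reorders the caller's list
    for i in range(1, len(nums)):
        if nums[i] == nums[i - 1]:
            return True
    return False


def peak_traced_bytes(func, nums):
    """Peak bytes allocated while func(nums) runs, as reported by tracemalloc."""
    tracemalloc.start()
    try:
        func(nums)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # 100,000 distinct integers in random order: the worst case for both approaches.
    data = random.sample(range(1_000_000), 100_000)
    print("hash set peak bytes:", peak_traced_bytes(has_duplicate_set, list(data)))
    print("sorting peak bytes: ", peak_traced_bytes(has_duplicate_sort, list(data)))
```

Each call receives its own copy of the data, created before tracing starts, so only the memory allocated by the approach itself is counted.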
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +#### Conclusion |
| 68 | + |
| 69 | +- **Efficiency:** The "best" solution depends entirely on the operational context. For most typical application scenarios, the O(N) time complexity of the Hash Set method is preferred if the memory cost is acceptable. |
| 70 | + |
| 71 | +- **The Engineering Decision:** This problem is a perfect illustration of a **space-time trade-off**. |
| 72 | + - Choose the **Hash Set** method when speed is the primary concern and memory is not a significant constraint. |
| 73 | + - Choose the **Sorting** method when memory is highly constrained (e.g., in embedded systems or processing massive datasets), an extra log N factor in running time is acceptable, and the input may be reordered. |
| 74 | + |
| 75 | +- **Lesson Learned:** There is rarely a single "best" algorithm. The optimal choice is the one that best fits the constraints of the system. Recognizing and analyzing space-time trade-offs is a fundamental skill in software engineering. |