feat(datastructures, hashmap): design a hashmap#182
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review infoConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a new "Jewels and Stones" algorithm (two implementations, README, tests), refactors HashMap into modular components (Item, Bucket, HashMap) with updated exports and tests, enhances HashMap README, updates DIRECTORY entries, and adds a minor type annotation for Caesar cipher. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~23 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (4)
algorithms/strings/caeser_cipher/__init__.py (1)
85-87: Consider adding return type annotation for consistency.The
plaintext: strparameter annotation is a good addition. For completeness, consider also adding the return type annotation-> str.- def encode(self, plaintext: str): + def encode(self, plaintext: str) -> str:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@algorithms/strings/caeser_cipher/__init__.py` around lines 85 - 87, Add an explicit return type annotation to the encode method for consistency: change the signature of encode (the function named encode) from def encode(self, plaintext: str): to def encode(self, plaintext: str) -> str: so callers and type checkers know it returns a string; no other logic changes needed.datastructures/hashmap/item.py (1)
4-17: LGTM!Simple and clean data container class. The implementation is appropriate for its purpose.
Consider adding
__repr__for easier debugging:def __repr__(self) -> str: return f"Item(key={self.key!r}, data={self.data!r})"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datastructures/hashmap/item.py` around lines 4 - 17, Add a readable representation for debugging by implementing __repr__ on the Item class: define a __repr__ method on class Item that returns a string showing the key and data (e.g., include both self.key and self.data) so instances print clearly during debugging and logging.datastructures/hashmap/test_hashmap.py (1)
54-58: Consider adding test for removing a non-existent key.The test verifies that
getraisesKeyErrorfor non-existent keys, but there's no test verifying thatremoveraisesKeyErrorwhen attempting to remove a non-existent key (as documented in theHashMap.removedocstring).Note: This requires fixing
Bucket.removeto actually raiseKeyErroras flagged in the bucket.py review.# Test removing non-existent key raises KeyError with self.assertRaises(KeyError): hashmap.remove(999)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datastructures/hashmap/test_hashmap.py` around lines 54 - 58, Add a test that asserts HashMap.remove raises KeyError for a non-existent key and update the underlying Bucket.remove implementation to raise KeyError when the key is not found; specifically, in the test file add a case that calls hashmap.remove(999) (or another absent key) inside assertRaises(KeyError), and in Bucket.remove make sure it does not silently return when the item is missing but instead raises KeyError so HashMap.remove propagates the exception.algorithms/hash_table/jewels_and_stones/__init__.py (1)
21-34: Comments are misleading and appear copy-pasted from the other function.The implementation logic is correct, but the comments don't accurately describe what this function does:
- Line 22: "Store all jewel types for fast membership checking" — but
stone_countsstores stone character frequencies, not jewel types.- Line 28: "Check each stone and increment count" — but we're iterating over
jewels, notstones.📝 Proposed fix for comments
def num_jewels_in_stones_with_dict(jewels: str, stones: str) -> int: - # Store all jewel types for fast membership checking + # Count occurrences of each stone character stone_counts: Counter[str] = Counter(stones) # Count how many stones are jewels count = 0 - # Check each stone and increment count if it's a jewel + # For each jewel type, add its count from stones for jewel in jewels: if jewel in stone_counts: count += stone_counts[jewel] # Return the total number of jewels found in stones return count🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@algorithms/hash_table/jewels_and_stones/__init__.py` around lines 21 - 34, Update the misleading comments inside num_jewels_in_stones_with_dict: replace the comment above stone_counts so it correctly states that stone_counts is a frequency map of stones (not "jewel types"), change the loop comment to indicate we're iterating over each jewel and summing matching stone frequencies, and ensure the final comment notes returning the total matched jewels; reference symbols: num_jewels_in_stones_with_dict, stone_counts, jewels, stones, count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@algorithms/hash_table/jewels_and_stones/__init__.py`:
- Around line 1-2: Remove the incorrect duplicate import of Counter from typing
and keep the runtime Counter from collections; update the import line so only
the needed typing symbols are imported (e.g., remove "Counter" from "from typing
import Set, Counter") or switch to builtin generics (e.g., use "set[str]" and
"Counter[str]" in annotations) while keeping "from collections import Counter"
for runtime use; ensure references in the module (e.g., any type annotations or
uses of Counter) use the collections.Counter at runtime and appropriate typing
generics for annotations.
In `@datastructures/hashmap/bucket.py`:
- Around line 37-46: Bucket.remove currently iterates self.bucket and silently
returns if the key isn't found and also binds an unused variable current_value;
update the remove method (Bucket.remove) to raise KeyError when no matching
item.key is found (e.g., use a for/else or track a found flag and after loop
raise KeyError(f"Key not found: {key}")), and remove the unused current_value
binding (only use item.key to compare). Ensure the raised exception matches the
HashMap.remove docstring.
- Around line 23-31: Fix the minor typos and the unused variable in the for-loop
inside the bucket handling: change the comment "key-valur" to "key-value" and
"blag" to "flag", and replace the unused assignment "current_key, current_value
= item.key, item.data" with "current_key, _ = item.key, item.data" (or simply
assign only current_key) so the unused value is intentional; keep the existing
logic that updates self.bucket[idx] = Item(key, value) and sets found = True
before breaking.
In `@datastructures/hashmap/hash_map.py`:
- Around line 30-38: Docstring for HashMap.set uses "it's" instead of "its" and
the public API name is inconsistent with README which references put; update the
docstring in set to use the possessive "its" and then either rename the method
set to put (and update all references) or add a thin alias method put that calls
set to keep backward/README compatibility (ensure to update any tests/imports
that expect put). Target the HashMap.set method and the README references when
making the change.
In `@datastructures/hashmap/README.md`:
- Around line 5-12: The README's API and the implementation disagree: README
documents put(int key, int value) but the HashMap class in hash_map.py exposes
set(key, value); make them consistent by either renaming the method in the
HashMap class from set to put (update the method name and any internal
references/calls to set) or by updating README to document set(key, value)
instead of put; reference the HashMap class and the set method in your changes
so callers/tests use the same name.
In `@datastructures/hashmap/test_hashmap.py`:
- Around line 26-33: The test comment is wrong because lines using hashmap.set
with the same key (2) exercise update, not collision; change the test in
test_hashmap.py to insert two different keys that collide under the HashMap's
key_space (e.g., with key_space=3 use keys 1 and 4), call hashmap.set(1, ...)
and hashmap.set(4, ...), then assert both hashmap.get(1) and hashmap.get(4)
return their respective values to verify collision handling in the HashMap
implementation (refer to the hashmap.set and hashmap.get calls).
In `@DIRECTORY.md`:
- Around line 373-377: Fix the typographical error by renaming "Caeser Cipher"
to "Caesar Cipher" in the DIRECTORY.md entry and update the linked label "Test
Caeser" to "Test Caesar"; also rename any matching directory and test file names
(e.g., the directory containing caeser_cipher and the test file test_caeser.py)
to use "caesar_cipher" and "test_caesar.py" and update their import/links
accordingly to avoid broken references.
---
Nitpick comments:
In `@algorithms/hash_table/jewels_and_stones/__init__.py`:
- Around line 21-34: Update the misleading comments inside
num_jewels_in_stones_with_dict: replace the comment above stone_counts so it
correctly states that stone_counts is a frequency map of stones (not "jewel
types"), change the loop comment to indicate we're iterating over each jewel and
summing matching stone frequencies, and ensure the final comment notes returning
the total matched jewels; reference symbols: num_jewels_in_stones_with_dict,
stone_counts, jewels, stones, count.
In `@algorithms/strings/caeser_cipher/__init__.py`:
- Around line 85-87: Add an explicit return type annotation to the encode method
for consistency: change the signature of encode (the function named encode) from
def encode(self, plaintext: str): to def encode(self, plaintext: str) -> str: so
callers and type checkers know it returns a string; no other logic changes
needed.
In `@datastructures/hashmap/item.py`:
- Around line 4-17: Add a readable representation for debugging by implementing
__repr__ on the Item class: define a __repr__ method on class Item that returns
a string showing the key and data (e.g., include both self.key and self.data) so
instances print clearly during debugging and logging.
In `@datastructures/hashmap/test_hashmap.py`:
- Around line 54-58: Add a test that asserts HashMap.remove raises KeyError for
a non-existent key and update the underlying Bucket.remove implementation to
raise KeyError when the key is not found; specifically, in the test file add a
case that calls hashmap.remove(999) (or another absent key) inside
assertRaises(KeyError), and in Bucket.remove make sure it does not silently
return when the item is missing but instead raises KeyError so HashMap.remove
propagates the exception.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (13)
datastructures/hashmap/images/solutions/hash_map_design_key_collision.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_key_collision_with_buckets.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_1.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_10.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_11.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_2.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_3.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_4.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_5.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_6.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_7.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_8.pngis excluded by!**/*.pngdatastructures/hashmap/images/solutions/hash_map_design_solution_9.pngis excluded by!**/*.png
📒 Files selected for processing (16)
DIRECTORY.mdalgorithms/hash_table/jewels_and_stones/README.mdalgorithms/hash_table/jewels_and_stones/__init__.pyalgorithms/hash_table/jewels_and_stones/test_jewels_and_stones.pyalgorithms/strings/caeser_cipher/README.mdalgorithms/strings/caeser_cipher/__init__.pyalgorithms/strings/caeser_cipher/test_caeser.pyalgorithms/strings/run_length_encoding/README.mdalgorithms/strings/run_length_encoding/__init__.pyalgorithms/strings/run_length_encoding/test_run_length.pydatastructures/hashmap/README.mddatastructures/hashmap/__init__.pydatastructures/hashmap/bucket.pydatastructures/hashmap/hash_map.pydatastructures/hashmap/item.pydatastructures/hashmap/test_hashmap.py
| for idx, item in enumerate(self.bucket): | ||
| current_key, current_value = item.key, item.data | ||
| # If the key matches the key of the current key-valur pair | ||
| if current_key == key: | ||
| # Update the value of the key-value pair | ||
| self.bucket[idx] = Item(key, value) | ||
| # Set the blag to true, indicating the key is found and break out of the loop | ||
| found = True | ||
| break |
There was a problem hiding this comment.
Minor typos in comments.
- Line 25: "key-valur" should be "key-value"
- Line 29: "blag" should be "flag"
Also, current_value is assigned but never used. Consider using _ to indicate an intentionally unused variable.
✏️ Suggested fix
for idx, item in enumerate(self.bucket):
- current_key, current_value = item.key, item.data
- # If the key matches the key of the current key-valur pair
+ current_key = item.key
+ # If the key matches the key of the current key-value pair
if current_key == key:
# Update the value of the key-value pair
self.bucket[idx] = Item(key, value)
- # Set the blag to true, indicating the key is found and break out of the loop
+ # Set the flag to true, indicating the key is found and break out of the loop
found = True
break🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@datastructures/hashmap/bucket.py` around lines 23 - 31, Fix the minor typos
and the unused variable in the for-loop inside the bucket handling: change the
comment "key-valur" to "key-value" and "blag" to "flag", and replace the unused
assignment "current_key, current_value = item.key, item.data" with "current_key,
_ = item.key, item.data" (or simply assign only current_key) so the unused value
is intentional; keep the existing logic that updates self.bucket[idx] =
Item(key, value) and sets found = True before breaking.
| def remove(self, key: Any) -> None: | ||
| # Iterate through each key-value pair in the bucket | ||
| for idx, item in enumerate(self.bucket): | ||
| current_key, current_value = item.key, item.data | ||
| # If the key matches the key of the current key-value pair | ||
| if current_key == key: | ||
| # Delete the key-value pair from the bucket | ||
| del self.bucket[idx] | ||
| # Exit the loop as the key has been removed | ||
| break |
There was a problem hiding this comment.
remove does not raise KeyError when key is not found.
The HashMap.remove docstring states it "Raises a Key error if the key does not exist", but this Bucket.remove implementation silently does nothing if the key isn't found. This inconsistency could lead to unexpected behaviour.
Additionally, current_value on line 40 is unused.
🐛 Proposed fix to raise KeyError when key not found
def remove(self, key: Any) -> None:
# Iterate through each key-value pair in the bucket
for idx, item in enumerate(self.bucket):
- current_key, current_value = item.key, item.data
+ current_key = item.key
# If the key matches the key of the current key-value pair
if current_key == key:
# Delete the key-value pair from the bucket
del self.bucket[idx]
- # Exit the loop as the key has been removed
- break
+ return
+ raise KeyError(f"Key {key} does not exist")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@datastructures/hashmap/bucket.py` around lines 37 - 46, Bucket.remove
currently iterates self.bucket and silently returns if the key isn't found and
also binds an unused variable current_value; update the remove method
(Bucket.remove) to raise KeyError when no matching item.key is found (e.g., use
a for/else or track a found flag and after loop raise KeyError(f"Key not found:
{key}")), and remove the unused current_value binding (only use item.key to
compare). Ensure the raised exception matches the HashMap.remove docstring.
| # can handle collisions | ||
|
|
||
| item_5 = 2, "something" | ||
| hashmap.set(item_5[0], item_5[1]) | ||
| item_5_key = 2 | ||
| item_5_value = "something" | ||
| hashmap.set(item_5_key, item_5_value) | ||
|
|
||
| expected_item_5 = Item(item_5[0], item_5[1]) | ||
| actual_5 = hashmap.table[2][0] | ||
| self.assertEqual(actual_5.key, expected_item_5.key) | ||
| self.assertEqual(actual_5.data, expected_item_5.data) | ||
| actual_5 = hashmap.get(item_5_key) | ||
| self.assertEqual(actual_5, item_5_value) |
There was a problem hiding this comment.
The "collision" comment is misleading - this tests update, not collision.
Lines 28-30 set the same key (2) twice, which tests the update functionality, not collision handling. A true hash collision occurs when different keys map to the same bucket.
With key_space=3, keys 1 and 4 both hash to bucket 1 (since 1 % 3 == 1 and 4 % 3 == 1). Consider testing actual collisions:
# Test actual collision: keys 1 and 4 both hash to bucket 1
hashmap.set(1, "value_for_1")
hashmap.set(4, "value_for_4") # Collides with key 1
self.assertEqual(hashmap.get(1), "value_for_1")
self.assertEqual(hashmap.get(4), "value_for_4")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@datastructures/hashmap/test_hashmap.py` around lines 26 - 33, The test
comment is wrong because lines using hashmap.set with the same key (2) exercise
update, not collision; change the test in test_hashmap.py to insert two
different keys that collide under the HashMap's key_space (e.g., with
key_space=3 use keys 1 and 4), call hashmap.set(1, ...) and hashmap.set(4, ...),
then assert both hashmap.get(1) and hashmap.get(4) return their respective
values to verify collision handling in the HashMap implementation (refer to the
hashmap.set and hashmap.get calls).
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Describe your change:
HashMap datastructure using buckets with configurable key space
Checklist:
Fixes: #{$ISSUE_NO}.Summary by CodeRabbit
New Features
Documentation
Tests