fix: SQL cache upload fails for Decimal values exceeding int64 range (#70)
📝 Walkthrough

The parquet serialization fallback in the SQL caching module now also catches OverflowError and resets the temp file before falling back to pickle. A new helper, `_is_large_number`, is introduced in the SQL execution module to detect values outside the int64 range.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant SQL_Caching
    participant TempFile
    participant Uploader
    Caller->>SQL_Caching: upload_sql_cache(dataframe)
    SQL_Caching->>TempFile: create temp file
    SQL_Caching->>TempFile: try to_parquet(write)
    alt parquet write succeeds
        TempFile-->>SQL_Caching: parquet bytes
        SQL_Caching->>Uploader: requests.put(upload_url, parquet bytes)
        Uploader-->>SQL_Caching: 200 OK
        SQL_Caching-->>Caller: success
    else parquet write raises ArrowNotImplementedError/ArrowInvalid/OverflowError
        TempFile-->>SQL_Caching: error
        SQL_Caching->>TempFile: reset/truncate temp file
        SQL_Caching->>TempFile: to_pickle(write)
        TempFile-->>SQL_Caching: pickle bytes
        SQL_Caching->>Uploader: requests.put(upload_url, pickle bytes)
        Uploader-->>SQL_Caching: 200 OK
        SQL_Caching-->>Caller: success (pickle fallback)
    end
```
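The flow in the diagram can be sketched as follows. This is a hypothetical, self-contained version: the function name `write_cache_payload` and the injected writer callables are illustrative, not the toolkit's real API; in the actual module the writers are `DataFrame.to_parquet` / `DataFrame.to_pickle` and the caught Arrow exceptions come from pyarrow.

```python
import io
import pickle


def write_cache_payload(data, temp_file, parquet_writer, pickle_writer,
                        fallback_exceptions=(OverflowError,)):
    """Try the parquet writer; on a known failure, reset the file and pickle."""
    try:
        parquet_writer(data, temp_file)
        return "parquet"
    except fallback_exceptions:
        # Reset so pickle bytes do not land after partially written parquet data.
        temp_file.seek(0)
        temp_file.truncate()
        pickle_writer(data, temp_file)
        return "pickle"


# Simulate a parquet writer that emits partial output, then overflows
# (as a Decimal beyond the int64 range would).
def failing_parquet_writer(data, f):
    f.write(b"PAR1partial")
    raise OverflowError("value out of int64 range")


buf = io.BytesIO()
fmt = write_cache_payload([10**30], buf, failing_parquet_writer,
                          lambda d, f: pickle.dump(d, f))
buf.seek(0)
assert fmt == "pickle" and pickle.load(buf) == [10**30]
```

Injecting the writers keeps the sketch dependency-free while preserving the control flow the diagram describes.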
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
📦 Python package built successfully!
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   73.50%   73.71%   +0.21%
==========================================
  Files          93       93
  Lines        5253     5261       +8
  Branches      764      764
==========================================
+ Hits         3861     3878      +17
+ Misses       1147     1138       -9
  Partials      245      245
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_caching.py`:
- Around line 101-105: The exception handler that falls back to pickle (catching
ArrowNotImplementedError, ArrowInvalid, OverflowError) must clear/truncate the
temporary file before calling dataframe.to_pickle(temp_file) to avoid appending
pickle bytes to any partially-written parquet data; locate the except block
handling ArrowNotImplementedError/ArrowInvalid/OverflowError and, before
invoking dataframe.to_pickle(temp_file), open or recreate temp_file in
write-binary/truncate mode (or unlink and recreate) so the file is empty, then
call dataframe.to_pickle(temp_file).
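The hazard this comment describes can be shown with an in-memory buffer standing in for the real temp file: without a reset, pickle bytes would be appended after the partially written parquet data.

```python
import io
import pickle

buf = io.BytesIO()
buf.write(b"PAR1 partially written parquet data")  # simulate the failed parquet write

# Without seek(0) + truncate(), pickle.dump would append after these bytes
# and the uploaded cache file would start with parquet garbage.
buf.seek(0)
buf.truncate()
pickle.dump({"col": [1, 2, 3]}, buf)

buf.seek(0)
assert pickle.load(buf) == {"col": [1, 2, 3]}
assert not buf.getvalue().startswith(b"PAR1")
```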
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 690-695: The helper _is_large_number lacks an explicit parameter
type; update its signature to annotate the input (e.g., change def
_is_large_number(x) -> bool: to def _is_large_number(x: Any) -> bool:) or use a
narrower union like Union[int, float, Decimal] to satisfy typing guidelines, and
import Any/Union from typing if needed; ensure only the parameter type is added
and existing logic/return annotation remains unchanged.
In `@tests/unit/test_sql_execution_internal.py`:
- Around line 203-260: Add explicit return type annotations (-> None) to the
test functions test_sanitize_dataframe_for_parquet_decimal_large_numbers,
test_sanitize_dataframe_for_parquet_decimal_small_numbers,
test_sanitize_dataframe_for_parquet_decimal_nan, and test_is_large_number, and
add a short descriptive docstring to test_is_large_number explaining what it
asserts (e.g., checks various numeric and non-numeric inputs for
_is_large_number behavior) to comply with the typing/docstring rules.
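The annotated test style the comment asks for might read as follows; the helper here is a local stand-in so the example is self-contained, not an import of the real `_is_large_number`.

```python
from decimal import Decimal
from typing import Any

_INT64_MIN, _INT64_MAX = -(2**63), 2**63 - 1


def _is_large_number(x: Any) -> bool:
    """Stand-in for the real helper: flag values outside the int64 range."""
    if isinstance(x, bool) or not isinstance(x, (int, Decimal)):
        return False
    return x == x and (x < _INT64_MIN or x > _INT64_MAX)


def test_is_large_number() -> None:
    """Check that values outside int64 are flagged while small numbers,
    NaN, and non-numeric inputs are not."""
    assert _is_large_number(Decimal(2**100))
    assert not _is_large_number(Decimal(10))
    assert not _is_large_number(Decimal("NaN"))
    assert not _is_large_number("not a number")


test_is_large_number()
```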
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1a1d655a-ca11-41f8-b0b3-bb4746425e42
📒 Files selected for processing (3)
- deepnote_toolkit/sql/sql_caching.py
- deepnote_toolkit/sql/sql_execution.py
- tests/unit/test_sql_execution_internal.py
🚀 Review App Deployment Started
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/test_sql_caching.py`:
- Around line 343-352: The test test_upload_parquet_success currently only
checks the URL but not that HTTP error handling was invoked; update the test to
assert that the response's raise_for_status() was called after calling
upload_sql_cache. Specifically, using the existing mock_put (and its return
value), add an assertion like
mock_put.return_value.raise_for_status.assert_called_once() to verify
upload_sql_cache invoked raise_for_status on the HTTP response.
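A self-contained sketch of the requested assertion, with the HTTP client injected rather than patched so no network library is needed; the real `upload_sql_cache` in `deepnote_toolkit/sql/sql_caching.py` uses `requests.put` directly, so these names are illustrative.

```python
from unittest.mock import MagicMock


def upload_sql_cache(http_put, upload_url: str, payload: bytes) -> None:
    """Illustrative stand-in for the toolkit's uploader."""
    response = http_put(upload_url, data=payload)
    response.raise_for_status()  # surface 4xx/5xx instead of failing silently


def test_upload_invokes_http_error_handling() -> None:
    mock_put = MagicMock()
    upload_sql_cache(mock_put, "https://example.com/upload", b"parquet bytes")
    mock_put.assert_called_once_with("https://example.com/upload",
                                     data=b"parquet bytes")
    # The review's requested check: raise_for_status was called on the response.
    mock_put.return_value.raise_for_status.assert_called_once()


test_upload_invokes_http_error_handling()
```

With `unittest.mock`, the return value of the mocked PUT is itself a mock, so the `raise_for_status` call is recorded automatically and can be asserted in one line.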
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5bdf616b-15a9-4245-a061-15fb4bdfae0b
📒 Files selected for processing (2)
- tests/unit/test_sql_caching.py
- tests/unit/test_sql_execution_internal.py
Summary by CodeRabbit
Bug Fixes
Tests