
fix: SQL cache upload fails for Decimal values exceeding int64 range#70

Merged
m1so merged 4 commits into main from mb/fix-sql-cache-upload-serialization
Mar 4, 2026

Conversation

@m1so
Contributor

@m1so m1so commented Mar 4, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved serialization error handling to also catch overflow cases, ensure temp-file cleanup on failure, and reliably fall back to a safer serialized format. Large numeric values are detected and preserved as strings to avoid write failures.
  • Tests

    • Added tests for large/small decimal handling, NaN behavior, large-number detection, fallback-to-pickle behavior, and partial-write cleanup verification.

@m1so m1so requested review from mfranczel and tkislan March 4, 2026 15:00
@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

Parquet serialization fallback in the SQL caching module now also catches OverflowError and resets the temp file before falling back to pickle. A new helper _is_large_number(x) -> bool was added and used by _sanitize_dataframe_for_parquet to centralize detection of values outside the int64 range; large numerics are converted to strings as before. Tests were added covering _is_large_number and decimal handling in parquet sanitization, including large/small decimals and Decimal('NaN') behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant SQL_Caching
    participant TempFile
    participant Uploader
    Caller->>SQL_Caching: upload_sql_cache(dataframe)
    SQL_Caching->>TempFile: create temp file
    SQL_Caching->>TempFile: try to_parquet(write)
    alt parquet write succeeds
        TempFile-->>SQL_Caching: parquet bytes
        SQL_Caching->>Uploader: requests.put(upload_url, parquet bytes)
        Uploader-->>SQL_Caching: 200 OK
        SQL_Caching-->>Caller: success
    else parquet write raises ArrowNotImplementedError/ArrowInvalid/OverflowError
        TempFile-->>SQL_Caching: error
        SQL_Caching->>TempFile: reset/truncate temp file
        SQL_Caching->>TempFile: to_pickle(write)
        TempFile-->>SQL_Caching: pickle bytes
        SQL_Caching->>Uploader: requests.put(upload_url, pickle bytes)
        Uploader-->>SQL_Caching: 200 OK
        SQL_Caching-->>Caller: success (pickle fallback)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 47.06%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Updates Docs — ⚠️ Warning: The PR modifies SQL cache serialization for Decimal values but lacks documentation updates in the primary repositories (deepnote/deepnote OSS and deepnote-internal). Resolution: update the deepnote/deepnote OSS documentation and the deepnote-internal roadmap with a changelog entry describing the Decimal handling fix and its user impact.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main fix (handling Decimal values exceeding the int64 range during SQL cache uploads) and is specific, concise, and reflective of the changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions bot commented Mar 4, 2026

📦 Python package built successfully!

  • Version: 2.1.2.dev11+d31e7fa
  • Wheel: deepnote_toolkit-2.1.2.dev11+d31e7fa-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.1.2.dev11%2Bd31e7fa/deepnote_toolkit-2.1.2.dev11%2Bd31e7fa-py3-none-any.whl"

@codecov

codecov bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.71%. Comparing base (10f8913) to head (3bc3705).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   73.50%   73.71%   +0.21%     
==========================================
  Files          93       93              
  Lines        5253     5261       +8     
  Branches      764      764              
==========================================
+ Hits         3861     3878      +17     
+ Misses       1147     1138       -9     
  Partials      245      245              
Flag Coverage Δ
combined 73.71% <100.00%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/sql/sql_caching.py`:
- Around line 101-105: The exception handler that falls back to pickle (catching
ArrowNotImplementedError, ArrowInvalid, OverflowError) must clear/truncate the
temporary file before calling dataframe.to_pickle(temp_file) to avoid appending
pickle bytes to any partially-written parquet data; locate the except block
handling ArrowNotImplementedError/ArrowInvalid/OverflowError and, before
invoking dataframe.to_pickle(temp_file), open or recreate temp_file in
write-binary/truncate mode (or unlink and recreate) so the file is empty, then
call dataframe.to_pickle(temp_file).

In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 690-695: The helper _is_large_number lacks an explicit parameter
type; update its signature to annotate the input (e.g., change def
_is_large_number(x) -> bool: to def _is_large_number(x: Any) -> bool:) or use a
narrower union like Union[int, float, Decimal] to satisfy typing guidelines, and
import Any/Union from typing if needed; ensure only the parameter type is added
and existing logic/return annotation remains unchanged.

In `@tests/unit/test_sql_execution_internal.py`:
- Around line 203-260: Add explicit return type annotations (-> None) to the
test functions test_sanitize_dataframe_for_parquet_decimal_large_numbers,
test_sanitize_dataframe_for_parquet_decimal_small_numbers,
test_sanitize_dataframe_for_parquet_decimal_nan, and test_is_large_number, and
add a short descriptive docstring to test_is_large_number explaining what it
asserts (e.g., checks various numeric and non-numeric inputs for
_is_large_number behavior) to comply with the typing/docstring rules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a1d655a-ca11-41f8-b0b3-bb4746425e42

📥 Commits

Reviewing files that changed from the base of the PR and between f0c341f and 03bbc16.

📒 Files selected for processing (3)
  • deepnote_toolkit/sql/sql_caching.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

@deepnote-bot

deepnote-bot commented Mar 4, 2026

🚀 Review App Deployment Started

  • 🌍 Review application: ra-70
  • 🔑 Sign-in URL: Click to sign-in
  • 📊 Application logs: View logs
  • 🔄 Actions: Click to redeploy
  • 🚀 ArgoCD deployment: View deployment
  • Last deployed: 2026-03-04 17:53:02 (UTC)
  • 📜 Deployed commit: f900bee78be4cbb74d519dfb2dba10ef4fab1d32
  • 🛠️ Toolkit version: d31e7fa

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_sql_caching.py`:
- Around line 343-352: The test test_upload_parquet_success currently only
checks the URL but not that HTTP error handling was invoked; update the test to
assert that the response's raise_for_status() was called after calling
upload_sql_cache. Specifically, using the existing mock_put (and its return
value), add an assertion like
mock_put.return_value.raise_for_status.assert_called_once() to verify
upload_sql_cache invoked raise_for_status on the HTTP response.
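The suggested assertion could look like the sketch below. The `upload_sql_cache` body here is a minimal stand-in (the real implementation in sql_caching.py is not shown on this page); only the function name and the `raise_for_status` expectation come from the review comment:

```python
import io
from unittest import mock

import pandas as pd
import requests

def upload_sql_cache(dataframe: pd.DataFrame, upload_url: str) -> None:
    # Minimal stand-in for the real uploader: serialize and PUT,
    # surfacing HTTP errors via raise_for_status().
    buffer = io.BytesIO()
    dataframe.to_pickle(buffer)
    response = requests.put(upload_url, data=buffer.getvalue())
    response.raise_for_status()

def test_upload_invokes_raise_for_status() -> None:
    """Assert the uploader checks the HTTP response for errors."""
    with mock.patch.object(requests, "put") as mock_put:
        upload_sql_cache(pd.DataFrame({"a": [1]}), "https://example.com/cache")
        mock_put.assert_called_once()
        mock_put.return_value.raise_for_status.assert_called_once()
```

With `mock.patch`, `mock_put.return_value` is the mocked response object, so the second assertion verifies exactly the behavior the review asked for.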

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5bdf616b-15a9-4245-a061-15fb4bdfae0b

📥 Commits

Reviewing files that changed from the base of the PR and between 97ac60c and 0cb9f86.

📒 Files selected for processing (2)
  • tests/unit/test_sql_caching.py
  • tests/unit/test_sql_execution_internal.py

@m1so m1so marked this pull request as ready for review March 4, 2026 16:07
@m1so m1so requested a review from a team as a code owner March 4, 2026 16:07
@m1so m1so merged commit 9fbeb25 into main Mar 4, 2026
32 checks passed
@m1so m1so deleted the mb/fix-sql-cache-upload-serialization branch March 4, 2026 18:14
