memrecycle: actually match assigned string to factor levels in UTF-8 #7657

Merged: aitap merged 4 commits into master from fix7648, Mar 3, 2026

@aitap (Member) commented Mar 3, 2026

Previously, even after all UTF-8 conversions had been performed and the marks set (first in TRUELENGTH, later in the hash), the lookup still used the contents of the original, non-converted source. Source strings that failed to match were therefore assigned a level of 0, creating an invalid factor.

Fixes: #7648
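
The lookup mistake described above can be modelled with a small, self-contained C sketch. This is a toy illustration, not data.table's actual memrecycle code: the names latin1_to_utf8, match_level, lookup_buggy and lookup_fixed are hypothetical, the source is assumed to be Latin-1, and a linear scan stands in for the hash used by the real implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a Latin-1 string to a newly allocated
 * UTF-8 string. The caller frees the result. */
static char *latin1_to_utf8(const char *s) {
    /* Each Latin-1 byte needs at most two UTF-8 bytes. */
    char *out = malloc(2 * strlen(s) + 1);
    char *p = out;
    if (!out) return NULL;
    for (; *s; ++s) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x80) {
            *p++ = (char)c;
        } else {
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}

/* Return the 1-based position of key among the (UTF-8) levels,
 * or 0 when nothing matches. A 0 code is not a valid factor level. */
static int match_level(const char *key, const char *const *levels, int nlevels) {
    for (int i = 0; i < nlevels; ++i)
        if (strcmp(key, levels[i]) == 0) return i + 1;
    return 0;
}

/* Models the old behaviour: the conversion happens, but the lookup
 * still uses the original, non-converted bytes. */
static int lookup_buggy(const char *src, const char *const *levels, int nlevels) {
    char *converted = latin1_to_utf8(src);
    int code = match_level(src, levels, nlevels); /* wrong vector */
    free(converted);
    return code;
}

/* Models the fix: look the source up in its converted UTF-8 form. */
static int lookup_fixed(const char *src, const char *const *levels, int nlevels) {
    char *converted = latin1_to_utf8(src);
    int code = converted ? match_level(converted, levels, nlevels) : 0;
    free(converted);
    return code;
}
```

With levels "a" and "\xC3\xA9" (é in UTF-8) and the Latin-1 source "\xE9", lookup_buggy returns 0 while lookup_fixed returns 2, which mirrors how a non-matching lookup produced the invalid level.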

aitap added 3 commits March 3, 2026 16:31
When UTF-8 conversion is performed during assignment to a factor, make
sure that the source strings are also looked up in their UTF-8 form, not
the original vector.

Fixes: #7648
@aitap aitap added this to the 1.18.4 milestone Mar 3, 2026
@aitap aitap changed the title memrecycle: _actually_ match assigned string to factor levels in UTF-8 memrecycle: actually match assigned string to factor levels in UTF-8 Mar 3, 2026
codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.02%. Comparing base (7698fe0) to head (e1c2a28).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7657      +/-   ##
==========================================
- Coverage   99.02%   99.02%   -0.01%     
==========================================
  Files          87       87              
  Lines       16897    16896       -1     
==========================================
- Hits        16733    16732       -1     
  Misses        164      164              

github-actions bot commented Mar 3, 2026

  • HEAD=fix7648 slower P<0.001 for memrecycle regression fixed in #5463
  • HEAD=fix7648 slower P<0.001 for setDT improved in #5427
    Comparison Plot

Generated via commit e1c2a28

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
| --- | --- |
| R setup and installing dependencies | 2 minutes and 57 seconds |
| Installing different package versions | 46 seconds |
| Running and plotting the test cases | 4 minutes and 40 seconds |

@ben-schwen (Member) left a comment

LGTM, TY!

Co-Authored-By: Michael Chirico <chiricom@google.com>
@aitap aitap merged commit 6c6615c into master Mar 3, 2026
13 checks passed
@aitap aitap deleted the fix7648 branch March 3, 2026 20:06
aitap added a commit that referenced this pull request Mar 3, 2026
#7657)

* Test case

* memrecycle: look up source from converted vector

When UTF-8 conversion is performed during assignment to a factor, make
sure that the source strings are also looked up in their UTF-8 form, not
the original vector.

Fixes: #7648

* NEWS entry

* Clarify comment, drop a temporary value

Co-authored-by: Michael Chirico <chiricom@google.com>

Development

Successfully merging this pull request may close these issues.

Assignment operator := does not accept anymore text values with non-ASCII with attribute Encoding = unknown