
Fix fulltext stripping #826

Open
abnegate wants to merge 4 commits into main from fix-fulltext

Conversation

@abnegate abnegate commented Mar 5, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Prevents empty full-text searches from running and returning unintended results.
    • Avoids executing full-text MATCH/tsquery when the search value is empty.
    • Improved sanitization and normalization of search input, including better handling of accented and special characters.
  • Tests

    • Added regression tests covering accented characters and various special-character inputs for full-text search.

coderabbitai bot commented Mar 5, 2026

Warning

Rate limit exceeded

@abnegate has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 13 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 768e4e12-5eab-440b-b6f8-26a90235e46f

📥 Commits

Reviewing files that changed from the base of the PR and between 438f97c and 72c48ad.

📒 Files selected for processing (2)
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
📝 Walkthrough


This PR prevents empty full-text search binds and improves Unicode-aware sanitization across DB adapters. MariaDB and Postgres now short-circuit empty fulltext queries with trivial SQL fragments, SQL.getFulltextValue gains Unicode filtering, and regression tests for accented/special characters were added (duplicate test present).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Full-Text Search Short-Circuit Logic (src/Database/Adapter/MariaDB.php, src/Database/Adapter/Postgres.php) | Compute fulltext value once for TYPE_SEARCH/TYPE_NOT_SEARCH; if empty return trivial SQL (0 = 1 for search, 1 = 1 for not-search) instead of binding empty values; otherwise bind and use existing MATCH/to_tsvector logic. |
| Character Sanitization Enhancement (src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php) | getFulltextValue updated to use a Unicode-aware pattern that keeps letters, numbers, underscores, and whitespace; collapses extra spaces and preserves quoted exact-match behavior. |
| Regression Test Coverage, duplicated (tests/e2e/Adapter/Scopes/DocumentTests.php) | Adds testFindFulltextAccentedAndSpecialChars to validate searches with accented names and non-operator special characters; note: the test appears duplicated (two identical methods added). |
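
The short-circuit behavior summarized above can be sketched roughly as follows. This is a minimal illustration, not the adapters' actual code: the function name `fulltextCondition`, the `:search` placeholder, and the MariaDB-style `MATCH ... AGAINST` fragment are all assumptions made for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the adapter's real method): builds the WHERE
 * fragment for a full-text condition and records bind values by reference.
 *
 * @param array<string, string> $binds
 */
function fulltextCondition(string $column, string $sanitized, bool $negate, array &$binds): string
{
    // Empty search text is never bound. A search then matches nothing
    // (0 = 1) and a not-search excludes nothing (1 = 1).
    if ($sanitized === '') {
        return $negate ? '1 = 1' : '0 = 1';
    }

    // Assumed placeholder name and MariaDB-style syntax, for illustration only.
    $binds[':search'] = $sanitized;
    $match = "MATCH({$column}) AGAINST (:search IN BOOLEAN MODE)";

    return $negate ? "NOT ({$match})" : $match;
}

$binds = [];
echo fulltextCondition('`name`', '', false, $binds), PHP_EOL; // 0 = 1
echo fulltextCondition('`name`', '', true, $binds), PHP_EOL;  // 1 = 1
echo fulltextCondition('`name`', 'jose', false, $binds), PHP_EOL;
```

Returning a constant predicate keeps the surrounding query valid without sending an empty string to the database's full-text engine.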

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • fogelito

Poem

🐰
I nibble bytes and sniff the text,
Accents safe, no queries vexed,
Empty binds I hop away,
Special chars can't ruin play,
Hooray — fulltext searches saved today! 🎩

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix fulltext stripping' directly relates to the main change: implementing sanitization/boundary guards in fulltext search handling to properly strip and validate fulltext values across multiple database adapters. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Postgres.php`:
- Around line 1925-1927: The two-step sanitization on $value uses
preg_replace(..., '/u'), which returns null for malformed UTF-8. That null is
then passed into the next preg_replace, which raises a TypeError when
strict_types is enabled (and a deprecation on PHP 8.1+ otherwise). Make the
sequence null-safe by checking the result of the first preg_replace (or
coalescing it to an empty string) before calling the second preg_replace/trim,
so that preg_replace and trim always receive a string. Locate the
transformations on $value in Postgres.php (the preg_replace calls on $value)
and either guard the second call with an is_string() check or use a
null-coalescing fallback to ensure a string is passed onward.
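
A minimal sketch of the null-coalescing guard described in this comment. The function name `sanitizeSearch`, the character-class pattern, and the single-space replacement are assumptions for illustration; they are not copied from Postgres.php.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the two-step sanitization on $value.
 * Pattern and replacement string are assumed, not taken from the PR.
 */
function sanitizeSearch(string $value): string
{
    // With the /u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8, so fall back to an empty string.
    $value = preg_replace('/[^\p{L}\p{N}_\s]/u', ' ', $value) ?? '';

    // $value is guaranteed to be a string here, so the follow-up
    // preg_replace() and trim() calls are safe.
    $value = preg_replace('/\s+/', ' ', $value) ?? '';

    return trim($value);
}

echo sanitizeSearch("José  O'Brien"), PHP_EOL; // José O Brien
echo sanitizeSearch("bad \xC3\x28"), PHP_EOL;  // prints an empty line (malformed UTF-8)
```

With this fallback a malformed input sanitizes to an empty string, which the empty-value short-circuit can then handle without executing a full-text query.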

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47a42995-85a0-4e02-8b90-af200c0840b6

📥 Commits

Reviewing files that changed from the base of the PR and between 8227f57 and f3bdc34.

📒 Files selected for processing (4)
  • src/Database/Adapter/MariaDB.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 1755-1758: The preg_replace call that assigns to $value can return
null on invalid UTF-8, in which case the later calls (another preg_replace and
trim) receive null instead of a string. After the first
preg_replace('/[^\p{L}\p{N}_\s]/u', ...), ensure the result is not null by
validating or normalizing UTF-8 and providing a safe fallback: e.g., detect
null and set $value = '', or run mb_convert_encoding($value, 'UTF-8', 'UTF-8')
before the regex, or cast the preg_replace result to a string. Update the code
paths around the $value variable so that the subsequent
preg_replace('/\s+/', ' ', $value) and trim($value) always receive a string.
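
One way the normalize-then-sanitize option could look, as a rough sketch. It assumes the mbstring extension is loaded; the function name `getFulltextValueSketch` and the single-space replacement are hypothetical, while the two patterns are the ones quoted in the comment above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical variant of the sanitization, not the real getFulltextValue.
 * Requires the mbstring extension.
 */
function getFulltextValueSketch(string $value): string
{
    // Re-encoding UTF-8 to UTF-8 replaces invalid byte sequences with
    // mbstring's substitute character, so the /u regex below sees valid input.
    $value = (string) mb_convert_encoding($value, 'UTF-8', 'UTF-8');

    // Keep letters, numbers, underscores, and whitespace. The replacement
    // string is an assumption; the null fallback is kept as a second guard.
    $value = preg_replace('/[^\p{L}\p{N}_\s]/u', ' ', $value) ?? '';
    $value = preg_replace('/\s+/', ' ', $value) ?? '';

    return trim($value);
}

echo getFulltextValueSketch("  Zoë   Müller!! "), PHP_EOL; // Zoë Müller
echo getFulltextValueSketch("bad \xC3\x28"), PHP_EOL;      // bad
```

Compared with falling back to an empty string, normalizing first keeps the valid part of the input searchable, at the cost of depending on mbstring.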

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6fe1d9e4-4f1d-4b35-ade9-ae1fcfb74bda

📥 Commits

Reviewing files that changed from the base of the PR and between f3bdc34 and 438f97c.

📒 Files selected for processing (2)
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
