Skip to content

GH-49441: [C++][Gandiva] Add rand_integer function#49442

Open
dmitry-chirkov-dremio wants to merge 2 commits intoapache:mainfrom
dmitry-chirkov-dremio:main
Open

GH-49441: [C++][Gandiva] Add rand_integer function#49442
dmitry-chirkov-dremio wants to merge 2 commits intoapache:mainfrom
dmitry-chirkov-dremio:main

Conversation

@dmitry-chirkov-dremio
Copy link

@dmitry-chirkov-dremio dmitry-chirkov-dremio commented Mar 3, 2026

Rationale for this change

Add rand_integer function to Gandiva to generate random integers, complementing the existing rand/random functions that generate random doubles. This provides native integer random number generation and offers a more efficient alternative to CAST(rand() * range AS INT).

What changes are included in this PR?

  • Add RandomIntegerGeneratorHolder class following the existing RandomGeneratorHolder pattern
  • Implement three function signatures:
    • rand_integer() → int32 in range [INT32_MIN, INT32_MAX]
    • rand_integer(int32 range) → int32 in range [0, range-1]
    • rand_integer(int32 min, int32 max) → int32 in range [min, max] inclusive
  • Add parameter validation (range > 0, min <= max) at expression compilation time
  • Add 8 unit tests covering all signatures and edge cases
  • Use std::uniform_int_distribution<int32_t> with Mersenne Twister engine

Are these changes tested?

Yes, added 8 unit tests in random_generator_holder_test.cc:

  • NoParams - verifies full int32 range
  • WithRange - verifies [0, range-1] bounds
  • WithMinMax - verifies [min, max] inclusive bounds
  • WithNegativeMinMax - verifies negative range handling
  • InvalidRangeZero - verifies range=0 is rejected
  • InvalidRangeNegative - verifies negative range is rejected
  • InvalidMinGreaterThanMax - verifies min > max is rejected
  • NullRangeDefaultsToOne - verifies null parameter handling

Are there any user-facing changes?

Yes, this adds a new rand_integer function to Gandiva with three signatures as described above.

Add rand_integer function to Gandiva with three signatures:
- rand_integer() - returns int32 in full range [INT32_MIN, INT32_MAX]
- rand_integer(int32 range) - returns int32 in range [0, range-1]
- rand_integer(int32 min, int32 max) - returns int32 in range [min, max] inclusive

Implementation follows the existing RandomGeneratorHolder pattern using
std::uniform_int_distribution<int32_t> and validates parameters at
expression compilation time.
Copy link
Contributor

@StevenMPhillips StevenMPhillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution - the overall wiring is clean and the tests cover many happy/invalid paths. I am requesting changes for a semantic/API mismatch that should be resolved before merge.

Main blocker:

  • The exposed signatures in function_registry_math_ops.cc suggest runtime arguments are accepted, but implementation enforces literals and the runtime stub params are ignored:
    • Literal-only enforcement: random_generator_holder.cc (literal == nullptr checks for 1-arg and 2-arg variants)
    • Runtime args ignored: gdv_function_stubs.cc (gdv_fn_rand_integer_with_range and gdv_fn_rand_integer_with_min_max ignore provided args and call (*holder)())

Please pick one explicit direction:

  1. Keep literal-only semantics: document this clearly and add tests asserting non-literal arguments fail with clear errors.
  2. Support runtime range/min/max args: consume runtime values in stubs/holder path and revisit nullability/null semantics accordingly.

Additional issues to address:

  • Null semantics are currently implicit/inconsistent:
    • rand_integer(NULL) defaults to range=1 -> always 0 (covered by test)
    • For 2-arg form, NULLs default to 0/0 but this is undocumented and untested.
  • Add end-to-end expression/JIT tests (not just holder-only tests) to validate registry + stub mapping path for all 3 signatures.

Once semantics are clarified and covered, this should be in good shape.

- Mark unused literal parameters in stubs with `/*param*/` syntax (matches `regexp_replace`, `regexp_extract` convention)
- Add non-literal argument rejection tests for `rand` and `rand_integer`
- Add end-to-end projector tests for `rand` (2 tests) and `rand_integer` (3 tests)
- Update null semantics: NULL range/max defaults to INT32_MAX instead of 0/1 (more useful behavior)
- Use `EXPECT_RAISES_WITH_MESSAGE_THAT` pattern for error validation (matches other Gandiva tests)
@dmitry-chirkov-dremio
Copy link
Author

Once semantics are clarified and covered, this should be in good shape.
@StevenMPhillips pushed a new commit.

I don't want to go overboard with signature rework wrt literals - we seem to have a bit of inconsistency between regexp_replace/regexp_extract, like/ilike, to_date and rand functions. I adjusted implementation to match regexp_replace with commented out argument names.

@dmitry-chirkov-dremio
Copy link
Author

Pipeline failed due to ongoing GitHub issues. How could PR owner retrigger it?

@kou
Copy link
Member

kou commented Mar 4, 2026

Only Apache Arrow committers can retrigger jobs.

@kou
Copy link
Member

kou commented Mar 4, 2026

@lriggs @akravchukdremio @xxlaykxx You may want to review this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants