GH-49441: [C++][Gandiva] Add rand_integer function#49442
GH-49441: [C++][Gandiva] Add rand_integer function#49442dmitry-chirkov-dremio wants to merge 2 commits intoapache:mainfrom
Conversation
Add rand_integer function to Gandiva with three signatures: - rand_integer() - returns int32 in full range [INT32_MIN, INT32_MAX] - rand_integer(int32 range) - returns int32 in range [0, range-1] - rand_integer(int32 min, int32 max) - returns int32 in range [min, max] inclusive Implementation follows the existing RandomGeneratorHolder pattern using std::uniform_int_distribution<int32_t> and validates parameters at expression compilation time.
StevenMPhillips
left a comment
There was a problem hiding this comment.
Thanks for the contribution - the overall wiring is clean and the tests cover many happy/invalid paths. I am requesting changes for a semantic/API mismatch that should be resolved before merge.
Main blocker:
- The exposed signatures in
function_registry_math_ops.ccsuggest runtime arguments are accepted, but implementation enforces literals and the runtime stub params are ignored:- Literal-only enforcement:
random_generator_holder.cc(literal == nullptrchecks for 1-arg and 2-arg variants) - Runtime args ignored:
gdv_function_stubs.cc(gdv_fn_rand_integer_with_rangeandgdv_fn_rand_integer_with_min_maxignore provided args and call(*holder)())
- Literal-only enforcement:
Please pick one explicit direction:
- Keep literal-only semantics: document this clearly and add tests asserting non-literal arguments fail with clear errors.
- Support runtime range/min/max args: consume runtime values in stubs/holder path and revisit nullability/null semantics accordingly.
Additional issues to address:
- Null semantics are currently implicit/inconsistent:
rand_integer(NULL)defaults to range=1 -> always 0 (covered by test)- For 2-arg form, NULLs default to 0/0 but this is undocumented and untested.
- Add end-to-end expression/JIT tests (not just holder-only tests) to validate registry + stub mapping path for all 3 signatures.
Once semantics are clarified and covered, this should be in good shape.
1a1cee0 to
ea76cdb
Compare
- Mark unused literal parameters in stubs with `/*param*/` syntax (matches `regexp_replace`, `regexp_extract` convention) - Add non-literal argument rejection tests for `rand` and `rand_integer` - Add end-to-end projector tests for `rand` (2 tests) and `rand_integer` (3 tests) - Update null semantics: NULL range/max defaults to INT32_MAX instead of 0/1 (more useful behavior) - Use `EXPECT_RAISES_WITH_MESSAGE_THAT` pattern for error validation (matches other Gandiva tests)
I don't want to go overboard with signature rework wrt literals - we seem to have a bit of inconsistency between regexp_replace/regexp_extract, like/ilike, to_date and rand functions. I adjusted implementation to match regexp_replace with commented out argument names. |
|
Pipeline failed due to ongoing GitHub issues. How could PR owner retrigger it? |
|
Only Apache Arrow committers can retrigger jobs. |
|
@lriggs @akravchukdremio @xxlaykxx You may want to review this. |
Rationale for this change
Add
rand_integerfunction to Gandiva to generate random integers, complementing the existingrand/randomfunctions that generate random doubles. This provides native integer random number generation and offers a more efficient alternative toCAST(rand() * range AS INT).What changes are included in this PR?
RandomIntegerGeneratorHolderclass following the existingRandomGeneratorHolderpatternrand_integer()→ int32 in range [INT32_MIN, INT32_MAX]rand_integer(int32 range)→ int32 in range [0, range-1]rand_integer(int32 min, int32 max)→ int32 in range [min, max] inclusivestd::uniform_int_distribution<int32_t>with Mersenne Twister engineAre these changes tested?
Yes, added 8 unit tests in
random_generator_holder_test.cc:NoParams- verifies full int32 rangeWithRange- verifies [0, range-1] boundsWithMinMax- verifies [min, max] inclusive boundsWithNegativeMinMax- verifies negative range handlingInvalidRangeZero- verifies range=0 is rejectedInvalidRangeNegative- verifies negative range is rejectedInvalidMinGreaterThanMax- verifies min > max is rejectedNullRangeDefaultsToOne- verifies null parameter handlingAre there any user-facing changes?
Yes, this adds a new
rand_integerfunction to Gandiva with three signatures as described above.