-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix array_repeat handling of null count values
#20102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| OffsetBuffer::new(offsets.into()), | ||
| repeated_values, | ||
| None, | ||
| Some(NullBuffer::new(nulls.finish())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can copy the nulls buffer from the count array instead of using a builder
| Some(NullBuffer::new(nulls.finish())), | |
| count_array.nulls().cloned(), |
| let total_repeated_values: usize = count_array | ||
| .values() | ||
| .iter() | ||
| .map(|&c| if c > 0 { c as usize } else { 0 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: technically the spec allows null slots to have non-0 values, so this could overestimate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible fix:
let total_repeated_values: usize = (0..count_array.len())
.map(|i| get_count_with_validity(count_array, i).0)
.sum();| ), | ||
| Arc::new(inner_list), | ||
| None, | ||
| Some(NullBuffer::new(outer_nulls.finish())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note here about reusing input null buffer
| let total_repeated_values: usize = count_array | ||
| .values() | ||
| .iter() | ||
| .map(|&c| if c > 0 { c as usize } else { 0 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible fix:
let total_repeated_values: usize = (0..count_array.len())
.map(|i| get_count_with_validity(count_array, i).0)
.sum();| count_array: &UInt64Array, | ||
| count_array: &Int64Array, | ||
| ) -> Result<ArrayRef> { | ||
| let counts = count_array.values(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also does not check for NULLs in the count_array and may lead to overestimates. You need to use get_count_with_validity() at line 245 too
| let (count, is_valid) = get_count_with_validity(count_array, idx); | ||
|
|
||
| running_offset += count; | ||
| offsets.push(O::from_usize(running_offset).unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extreme case:
- if the input values is ListArray, then its offset type will be i32
- and if the count value is bigger than i32::MAX
- then i32::from_usize(i32::MAX + 1) will return None and it will panic
Co-authored-by: Martin Grigorov <[email protected]> Co-authored-by: Jeffrey Vo <[email protected]>
|
Thanks for the review. |
Which issue does this PR close?
array_repeatfunction does not handle null values in count array #20075.Rationale for this change
The previous implementation of
array_repeatrelied on Arrow defaults when handling null and negative count values. As a result, null counts were implicitly treated as zero and returned empty arrays, which is a correctness issue.This PR makes the handling of these edge cases explicit and aligns the function with SQL null semantics.
What changes are included in this PR?
Int64Are these changes tested?
Yes, SLTs added and pass.
Are there any user-facing changes?
Yes. When the count value is null,
array_repeatnow returns a null array instead of an empty array.