GH-49740: [C++][Python] Fix casts to view types leaving null variadic buffers#50166
GH-49740: [C++][Python] Fix casts to view types leaving null variadic buffers#50166fenfeng9 wants to merge 5 commits into
Conversation
| util::ToInlineBinaryView("hello"), | ||
| util::ToInlineBinaryView("world"), | ||
| }), | ||
| Raises(StatusCode::Invalid)); |
There was a problem hiding this comment.
Can we test the error message somehow?
There was a problem hiding this comment.
Sure, I'll update the tests later.
| nullptr})); | ||
|
|
||
| struct ArrowArray c_export; | ||
| ASSERT_RAISES(Invalid, ExportArray(*arr, &c_export)); |
There was a problem hiding this comment.
Can we test the error message too?
|
Hmm, I think we should try to produce some additional columns without any variadic buffers in the integration tests. Are you comfortable doing that @fenfeng9 ? Otherwise I can do it. |
Yes, I'm willing to do that. Should I open a new issue for it? |
Or you can do it in this PR, because it would help validate that all implementations are in agreement. |
Sounds good, I'll update it in this PR later. Thanks! |
|
We may want to wait for the outcome of the discussion in #50172, actually |
It's OK, I'll update the code after the discussion is concluded. |
Rationale for this change
Casting to
binary_vieworstring_viewcould leave a null variadic buffer slot when all values were inline. This could happen for casts frombinary,large_binary,string,large_string, andfixed_size_binary.The C Data Interface exporter reads every variadic buffer to get its size. Because of that, exporting such an array could crash, for example through PyArrow
_export_to_c.Validation also passed for these arrays. For all-inline view arrays, validation never needed to read an out-of-line data buffer.
What changes are included in this PR?
This PR fixes the cast kernels so all-inline view arrays do not keep a null variadic buffer slot.
It also makes validation reject null variadic buffer slots, and makes C Data export return an error instead of crashing.
C++ and Python regression tests cover the cast, validation, and export paths.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.
This PR contains a "Critical Fix" Exporting an all-inline view array through the C Data Interface could crash the process while using only public APIs.
_export_to_csegmentation fault forbinary_viewarray #49740