Skip to content

GH-49740: [C++][Python] Fix casts to view types leaving null variadic buffers#50166

Open
fenfeng9 wants to merge 5 commits into
apache:mainfrom
fenfeng9:fix-gh-49740-view-export-to-c
Open

GH-49740: [C++][Python] Fix casts to view types leaving null variadic buffers#50166
fenfeng9 wants to merge 5 commits into
apache:mainfrom
fenfeng9:fix-gh-49740-view-export-to-c

Conversation

@fenfeng9

@fenfeng9 fenfeng9 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

Casting to binary_view or string_view could leave a null variadic buffer slot when all values were inline. This could happen for casts from binary, large_binary, string, large_string, and fixed_size_binary.

The C Data Interface exporter reads every variadic buffer to get its size. Because of that, exporting such an array could crash, for example through PyArrow _export_to_c.

Validation also passed for these arrays. For all-inline view arrays, validation never needed to read an out-of-line data buffer.

What changes are included in this PR?

This PR fixes the cast kernels so all-inline view arrays do not keep a null variadic buffer slot.

It also makes validation reject null variadic buffer slots, and makes C Data export return an error instead of crashing.

C++ and Python regression tests cover the cast, validation, and export paths.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

This PR contains a "Critical Fix" Exporting an all-inline view array through the C Data Interface could crash the process while using only public APIs.

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @fenfeng9 . Two minor suggestions, but otherwise this PR looks good to me.

util::ToInlineBinaryView("hello"),
util::ToInlineBinaryView("world"),
}),
Raises(StatusCode::Invalid));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test the error message somehow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update the tests later.

nullptr}));

struct ArrowArray c_export;
ASSERT_RAISES(Invalid, ExportArray(*arr, &c_export));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test the error message too?

@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 15, 2026
@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member

Hmm, I think we should try to produce some additional columns without any variadic buffers in the integration tests. Are you comfortable doing that @fenfeng9 ? Otherwise I can do it.

@fenfeng9

Copy link
Copy Markdown
Contributor Author

Hmm, I think we should try to produce some additional columns without any variadic buffers in the integration tests. Are you comfortable doing that @fenfeng9 ? Otherwise I can do it.

Yes, I'm willing to do that. Should I open a new issue for it?

@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member

Yes, I'm willing to do that. Should I open a new issue for it?

Or you can do it in this PR, because it would help validate that all implementations are in agreement.

@fenfeng9

Copy link
Copy Markdown
Contributor Author

Yes, I'm willing to do that. Should I open a new issue for it?

Or you can do it in this PR, because it would help validate that all implementations are in agreement.

Sounds good, I'll update it in this PR later. Thanks!

@pitrou

pitrou commented Jun 15, 2026

Copy link
Copy Markdown
Member

We may want to wait for the outcome of the discussion in #50172, actually

@fenfeng9

Copy link
Copy Markdown
Contributor Author

We may want to wait for the outcome of the discussion in #50172, actually

It's OK, I'll update the code after the discussion is concluded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants