
Conversation

@kumarUjjawal (Contributor)

Which issue does this PR close?

Rationale for this change

Spark sha2 currently evaluates scalars via make_scalar_function(sha2_impl, vec![]), which expands scalar inputs to size-1 arrays before execution. This adds avoidable overhead for scalar evaluation / constant folding scenarios.

In addition, the existing digest-to-hex formatting uses write!(&mut s, "{b:02x}") in a loop, which is significantly slower than a LUT-based hex encoder.
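For reference, the LUT approach can be sketched as a standalone function (a hedged sketch: the helper shape is illustrative, though `HEX_CHARS` mirrors the snippets quoted later in the review):

```rust
// Lookup table of lowercase hex digits; indexing with a nibble (0..16)
// sidesteps the formatting machinery behind write!(s, "{b:02x}").
const HEX_CHARS: &[u8; 16] = b"0123456789abcdef";

fn hex_encode(bytes: &[u8]) -> String {
    let mut out = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX_CHARS[(b >> 4) as usize]); // high nibble
        out.push(HEX_CHARS[(b & 0x0f) as usize]); // low nibble
    }
    // The table is pure ASCII, so the output is valid UTF-8.
    String::from_utf8(out).expect("hex output is ASCII")
}
```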

What changes are included in this PR?

  1. a match-based scalar fast path for sha2 to avoid scalar→array expansion, and
  2. a faster LUT-based hex encoder to replace write! formatting.
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| sha2/scalar/size=1 | 1.0408 µs | 339.29 ns | ~3.07x |
| sha2/array_binary_256/size=1024 | 604.13 µs | 295.09 µs | ~2.05x |
| sha2/array_binary_256/size=4096 | 2.3508 ms | 1.2095 ms | ~1.94x |
| sha2/array_binary_256/size=8192 | 4.5192 ms | 2.2826 ms | ~1.98x |

Are these changes tested?

Yes

Are there any user-facing changes?

No

The github-actions bot added the spark label on Feb 2, 2026.
@Jefffrey (Contributor) left a comment


We should look into adding a fast path for when values is an array but the bit length is a scalar; I assume that would be another common use case.

let bytes = data.as_ref();
let mut out = Vec::with_capacity(bytes.len() * 2);
for &b in bytes {
    out.push(HEX_CHARS[(b >> 4) as usize]);
    out.push(HEX_CHARS[(b & 0x0f) as usize]);
}
@comphead (Contributor) commented on Feb 3, 2026


This indexing might introduce extra LLVM bounds checks; it may be worth checking this form as well:

let hi = b >> 4;
let lo = b & 0x0F;

out.push(HEX_CHARS[hi as usize]);
out.push(HEX_CHARS[lo as usize]);

rustc might be smart enough to rewrite it this way by itself.
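One way to sanity-check this: both forms index a 16-entry table with a value provably below 16 (`b >> 4` and `b & 0x0f` on a `u8`), so LLVM can usually elide the bounds checks either way. A hedged sketch comparing them (`HEX_CHARS` assumed to be the 16-byte lowercase table):

```rust
const HEX_CHARS: [u8; 16] = *b"0123456789abcdef";

// Inline form: nibble extraction happens inside the index expression.
fn encode_inline(b: u8) -> (u8, u8) {
    (HEX_CHARS[(b >> 4) as usize], HEX_CHARS[(b & 0x0f) as usize])
}

// Split form from the suggestion: bind hi/lo first, then index.
fn encode_split(b: u8) -> (u8, u8) {
    let hi = b >> 4;
    let lo = b & 0x0f;
    (HEX_CHARS[hi as usize], HEX_CHARS[lo as usize])
}
```

Both compute the same pair for every byte; whether the split form actually changes the generated code is best confirmed by inspecting the assembly or benchmarking.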

@kumarUjjawal (Contributor Author)


Thanks! That looks reasonable.

let hi = b >> 4;
let lo = b & 0x0F;
out.push(HEX_CHARS[hi as usize]);
out.push(HEX_CHARS[lo as usize]);
Member


The hex crate is used in other datafusion-* crates as an optional dependency, and it uses SIMD to be even faster on bigger inputs.
Consider using it here too.

@kumarUjjawal (Contributor Author)


Thank you! I will look into this.

@kumarUjjawal (Contributor Author) commented on Feb 3, 2026


Since sha2 digests are fixed-size (28/32/48/64 bytes), the LUT approach is already quite fast. I'm not sure adding the hex crate would help here. What do you think?

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
make_scalar_function(sha2_impl, vec![])(&args.args)
let values = &args.args[0];
let bit_lengths = &args.args[1];
Member


Maybe use take_function_args() for consistency with other functions and better error handling?

@comphead (Contributor) left a comment


Thanks @kumarUjjawal, it LGTM; I'll wait for other folks to confirm.

I've restarted CI; the errors seem unrelated to this PR.

@kumarUjjawal (Contributor Author)

> Thanks @kumarUjjawal, it LGTM; I'll wait for other folks to confirm.
>
> I've restarted CI; the errors seem unrelated to this PR.

Thank you!

where
BinaryArrType: BinaryArrayType<'a>,
{
sha2_binary_bitlen_iter(values, std::iter::repeat(Some(bit_length)))
Contributor


I was thinking along the lines of removing the match logic in the hot loop below, if we know the bit length for all values; I think it'll result in more verbose code but could be worth it for performance. Can look into this in a followup.
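A rough sketch of the hoisting idea (hypothetical names; a toy placeholder stands in for the real SHA-2 hashers, and the 0 → 256 mapping follows Spark's sha2 semantics): resolve the digest width once from the scalar bit length, then run the per-row loop with no match on `bit_length`.

```rust
// Map a Spark sha2 bit length to its digest width in bytes;
// 0 is treated as 256, and anything else is unsupported.
fn digest_width(bit_length: i32) -> Option<usize> {
    match bit_length {
        224 => Some(28),
        0 | 256 => Some(32),
        384 => Some(48),
        512 => Some(64),
        _ => None,
    }
}

// Toy "hash" loop: the match happens once up front, so the hot loop
// carries no per-row branching on bit_length. The XOR mixing is a
// placeholder, not a real hash.
fn hash_all(rows: &[&[u8]], bit_length: i32) -> Option<Vec<Vec<u8>>> {
    let width = digest_width(bit_length)?;
    Some(
        rows.iter()
            .map(|r| {
                let mut d = vec![0u8; width];
                for (i, &b) in r.iter().enumerate() {
                    d[i % width] ^= b; // placeholder mixing
                }
                d
            })
            .collect(),
    )
}
```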

ColumnarValue::Scalar(value_scalar),
ColumnarValue::Scalar(ScalarValue::Int32(Some(bit_length))),
) => {
if value_scalar.is_null() {
Contributor


We should pull all null checks into a single branch at the top, e.g.

match (values, bit_lengths) {
    (ColumnarValue::Scalar(s), _) | (_, ColumnarValue::Scalar(s)) if s.is_null() => {
        // return scalar null
    }
    // ...
}

This means we'd only need 4 arms:

  • One arm checking if either is null
  • One arm for scalar + scalar
  • One arm for array + scalar
  • Catch all (array + array, scalar + array)
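The proposed shape can be illustrated with toy stand-ins for `ColumnarValue`/`ScalarValue` (hypothetical types `Col` and `classify`; note that Rust evaluates a match guard once per or-pattern alternative, so the `(s, _) | (_, s)` arm catches a null scalar on either side):

```rust
// Minimal stand-in for ColumnarValue: a scalar (possibly null) or an array.
enum Col {
    Scalar(Option<i32>),
    Array(Vec<Option<i32>>),
}

impl Col {
    fn is_null_scalar(&self) -> bool {
        matches!(self, Col::Scalar(None))
    }
}

// The four-arm structure suggested above, returning a label per arm.
fn classify(values: &Col, bit_lengths: &Col) -> &'static str {
    match (values, bit_lengths) {
        // Arm 1: either input is a null scalar -> whole result is null.
        (s, _) | (_, s) if s.is_null_scalar() => "scalar null",
        // Arm 2: scalar + scalar fast path.
        (Col::Scalar(_), Col::Scalar(_)) => "scalar + scalar",
        // Arm 3: array values + scalar bit length fast path.
        (Col::Array(_), Col::Scalar(_)) => "array + scalar",
        // Arm 4: catch-all (array + array, scalar + array).
        _ => "general",
    }
}
```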

@kumarUjjawal (Contributor Author)


Should I make the changes in this PR, or will these be in the follow-up too?



Successfully merging this pull request may close these issues.

Optimize spark sha2

4 participants