Add HWY-optimized push-pull hole filling #5186
Open
ssh4net wants to merge 1 commit into AcademySoftwareFoundation:main from
HWY-accelerated push/pull implementation for hole filling (guarded by OIIO_USE_HWY). Adds SIMD-aware helpers, data structures (PushPullLevel, tiled views, weight structs), tiled pull/push routines, and finalization paths for float/half/uint16/uint8 and 2/4 channel images. Signed-off-by: Vlad (Kuzmin) Erium <[email protected]>
Description
Highway implementation for ImageBufAlgo::fillholes_pushpull.
The new path is used for the common in-memory cases we can handle safely: 2-channel or 4-channel images, contiguous local pixels, and alpha in the last channel. It supports float, half, uint16, and uint8 inputs.
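The eligibility conditions above can be pictured as a single predicate. This is only a sketch: `ImageDesc`, `PixelType`, and `hwy_path_eligible` are hypothetical names invented for illustration and are not taken from the PR or from the OIIO API.

```cpp
#include <cassert>

// Hypothetical pixel type tag (not an OIIO type).
enum class PixelType { Float, Half, UInt16, UInt8, Other };

// Hypothetical summary of the properties the description lists.
struct ImageDesc {
    int nchannels;          // total channel count
    int alpha_channel;      // index of alpha, or -1 if there is none
    bool contiguous_local;  // pixels live in local, contiguous memory
    PixelType type;         // storage type of the pixel data
};

// Returns true when the image matches every condition named in the
// description: 2 or 4 channels, contiguous local pixels, alpha in the
// last channel, and one of the four supported data types.
bool hwy_path_eligible(const ImageDesc& d)
{
    assert(d.nchannels > 0);
    const bool channels_ok = (d.nchannels == 2 || d.nchannels == 4);
    const bool alpha_last  = (d.alpha_channel == d.nchannels - 1);
    const bool type_ok     = (d.type == PixelType::Float
                              || d.type == PixelType::Half
                              || d.type == PixelType::UInt16
                              || d.type == PixelType::UInt8);
    return channels_ok && alpha_last && d.contiguous_local && type_ok;
}
```

Anything that fails the predicate would presumably fall back to the existing implementation.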
The algorithm is still the same push-pull algorithm as the existing OIIO code. It builds the full pyramid, uses the same triangle filtering behavior, divides pulled levels by alpha, and composites the pyramid back down. The difference is that the HWY version does the expensive parts in tighter fused kernels: pull plus alpha divide, upsample plus over, and final conversion/write. This avoids several temporary image operations while keeping the result very close to the existing implementation.
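To make the idea of fused kernels concrete, here is a scalar sketch of one pull step and one push step on interleaved, premultiplied RGBA floats. It is not the PR's code and it makes several simplifying assumptions: a 2x2 box average stands in for the triangle filter, upsampling is nearest-neighbor, dimensions are even, normalization divides every channel (including alpha) by alpha, and the names `Image`, `pull_normalize`, and `push_over` are hypothetical. No Highway intrinsics are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // interleaved RGBA, premultiplied

// Fused "pull + alpha divide": downsample 2x and normalize in one pass,
// so no intermediate un-normalized level is stored.
Image pull_normalize(const Image& src, int w, int h)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    assert(src.size() == std::size_t(w) * h * 4);
    const int dw = w / 2, dh = h / 2;
    Image dst(std::size_t(dw) * dh * 4, 0.0f);
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            float sum[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const std::size_t i
                        = (std::size_t(2 * y + dy) * w + (2 * x + dx)) * 4;
                    for (int c = 0; c < 4; ++c)
                        sum[c] += src[i + c];
                }
            }
            // The 1/4 box weight cancels in the ratio, so divide the raw
            // sums. Texels with no coverage stay zero (still a hole).
            if (sum[3] > 0.0f) {
                const std::size_t o = (std::size_t(y) * dw + x) * 4;
                for (int c = 0; c < 4; ++c)
                    dst[o + c] = sum[c] / sum[3];
            }
        }
    }
    return dst;
}

// Fused "upsample + over": composite the finer level over the upsampled
// coarser level in place, without materializing the upsampled image.
void push_over(Image& fine, int w, int h, const Image& coarse)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    const int cw = w / 2;
    assert(fine.size() == std::size_t(w) * h * 4);
    assert(coarse.size() == std::size_t(cw) * (h / 2) * 4);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::size_t i = (std::size_t(y) * w + x) * 4;
            const std::size_t j = (std::size_t(y / 2) * cw + x / 2) * 4;
            const float k = 1.0f - fine[i + 3];  // remaining transparency
            for (int c = 0; c < 4; ++c)
                fine[i + c] += k * coarse[j + c];
        }
    }
}
```

In this sketch a fully opaque pixel is left untouched by `push_over`, while a fully transparent one takes the coarser level's value, which is the hole-filling effect.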
For uninitialized uint8 output, the natural result is promoted to uint16 to avoid excessive rounding during push-pull. Preallocated destinations keep the format requested by the caller.
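The output-format rule can be read as the small decision function below. This is an illustrative sketch only: `Format` and `choose_output_format` are hypothetical names, and the assumption that non-uint8 inputs keep their own type as the "natural result" is an inference from the description rather than something the PR states.

```cpp
// Hypothetical format tag (not an OIIO type).
enum class Format { Float, Half, UInt16, UInt8 };

// dst_initialized: whether the caller passed a preallocated destination.
// dst_format:      the format of that destination (ignored otherwise).
Format choose_output_format(Format input, bool dst_initialized,
                            Format dst_format)
{
    // Preallocated destinations keep the format the caller requested.
    if (dst_initialized)
        return dst_format;
    // Uninitialized destination with uint8 input: promote to uint16 to
    // limit rounding error accumulated during push-pull.
    if (input == Format::UInt8)
        return Format::UInt16;
    // Assumption: other inputs keep their own type.
    return input;
}
```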
Tests
Built OIIO with HWY enabled and compared the HWY path against the existing implementation on synthetic and real image cases.
Tested all input/output pairs across:
Images used:
Observed max differences:
Typical speedups:
Benchmarks
Synthetic 4092x4092
Real Odd Crop 3001x1997
Float-output accuracy is still in the ~4e-7 range. Integer/half diffs are one output quantization step.
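A figure such as the maximum difference between the two paths could be measured with a helper along these lines. This is a hedged sketch, not the harness used for the PR; `max_abs_diff` is a hypothetical name, and both results are assumed to have been converted to float buffers of equal size beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest per-sample absolute difference between two results.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

For integer outputs normalized to [0, 1], one quantization step is 1/255 for uint8 and 1/65535 for uint16, so a difference of one step corresponds to those values.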
Checklist:
and if I used AI coding assistants, I have an
Assisted-by: Codex GPT5.5 xHigh line in the pull request description above.
behavior.
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
fixed any problems reported by the clang-format CI test.
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.