Speed up scalar Sobol with closed-form evaluation #97
Open
wantonsushi wants to merge 3 commits into
Collaborator
Hi @wantonsushi. I've just given the PR a scan and it is looking good. This is great work, thank you. Very interesting to see the comparison. I'll follow up with a closer read of the code details sometime tomorrow.
Hello OpenQMC maintainers!
This PR implements the Ahmed 2024 paper mentioned in issue #67 for the scalar path of `sobolReversedIndex`. I used the existing benchmark tool to check timings and `generate` to confirm the output is unchanged.

Changes
- `include/oqmc/owen.h`: the scalar path uses the closed form.
- `src/tools/cli/matrices.cpp`: derives the steps from the generator matrices and prints them, the same way it already prints `directions[]`.
- `src/tests/owen.cpp`: `OwenTest.SobolReversedIndex` checks the result against the direction matrices for every 16-bit index and dimension.

I also tried to implement a SIMD version (`sobol_simd_experiment.cpp`). It's not in this PR since it doesn't beat the existing SIMD, but I'm sharing it so you can reproduce the SIMD timings below.
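To illustrate the kind of check described above, here is a minimal, self-contained sketch. It is not the code in this PR: it covers only the well-known second Sobol dimension, whose generator matrix is the Pascal matrix mod 2, and the function names (`reverseBits32`, `sobolDim1Matrix`, `sobolDim1ClosedForm`, `closedFormMatchesMatrix`) are hypothetical. The closed form shown is the standard log-step subset-XOR on a bit-reversed index, which may differ from the steps the PR derives for other dimensions.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the bits of a 32-bit word.
std::uint32_t reverseBits32(std::uint32_t x) {
    x = (x >> 16) | (x << 16);
    x = ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
    x = ((x & 0xF0F0F0F0u) >> 4) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x & 0xCCCCCCCCu) >> 2) | ((x & 0x33333333u) << 2);
    x = ((x & 0xAAAAAAAAu) >> 1) | ((x & 0x55555555u) << 1);
    return x;
}

// Reference path: XOR one direction number per set bit of the index.
// For the second Sobol dimension the directions follow
// d[0] = 0x80000000, d[k] = d[k-1] ^ (d[k-1] >> 1).
std::uint32_t sobolDim1Matrix(std::uint32_t index) {
    std::uint32_t direction = 0x80000000u;
    std::uint32_t result = 0;
    for (; index != 0; index >>= 1) {
        if (index & 1u) {
            result ^= direction;
        }
        direction ^= direction >> 1;
    }
    return result;
}

// Closed form on the bit-reversed index: five masked shift-XOR steps
// replace the per-bit loop (assumption: second dimension only).
std::uint32_t sobolDim1ClosedForm(std::uint32_t reversedIndex) {
    std::uint32_t x = reversedIndex;
    x ^= (x & 0x0000FFFFu) << 16;
    x ^= (x & 0x00FF00FFu) << 8;
    x ^= (x & 0x0F0F0F0Fu) << 4;
    x ^= (x & 0x33333333u) << 2;
    x ^= (x & 0x55555555u) << 1;
    return x;
}

// Exhaustive comparison over every 16-bit index.
bool closedFormMatchesMatrix() {
    for (std::uint32_t i = 0; i < (1u << 16); ++i) {
        if (sobolDim1ClosedForm(reverseBits32(i)) != sobolDim1Matrix(i)) {
            return false;
        }
    }
    return true;
}
```

Calling `closedFormMatchesMatrix()` from a test body and asserting that it returns `true` gives the same style of exhaustive check for this one dimension.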
Timings
Using `benchmark sobol samples`, median of 9 runs, in microseconds:

The speedup is small because generation is only part of a draw, along with scrambling and state hashing. The SIMD implementation fits one dimension per lane (four lanes). The existing SIMD path is faster because it packs all sixteen matrix columns into a wider register, so I left the SIMD paths alone.
Verification
- `generate sobol` output is byte-identical to before on scalar, SSE and AVX
- `clang-format` and `clang-tidy` clean