VCF-103 Stats: Improvements to stats run-time performance #877
Merged
alancleary merged 4 commits into main on Mar 20, 2026
These methods return the total size of the classes' buffers in bytes.
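As a rough illustration of what such a size method might look like, here is a minimal sketch. The class name `StatsBuffers`, its members, and `total_size_bytes()` are all hypothetical and are not taken from the PR; the only idea borrowed from the description is summing the byte sizes of every buffer a class owns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stats buffer holder (names are illustrative, not from the PR).
class StatsBuffers {
 public:
  // Appends one (sample name, count) entry to the column buffers.
  void add(const std::string& sample, int32_t count) {
    sample_offsets_.push_back(sample_chars_.size());
    sample_chars_.insert(sample_chars_.end(), sample.begin(), sample.end());
    counts_.push_back(count);
  }

  // Total size of all buffers in bytes: element count times element size,
  // summed over every buffer the class owns.
  std::size_t total_size_bytes() const {
    // Invariant: one offset and one count per entry.
    assert(sample_offsets_.size() == counts_.size());
    return sample_offsets_.size() * sizeof(uint64_t) +
           sample_chars_.size() * sizeof(char) +
           counts_.size() * sizeof(int32_t);
  }

 private:
  std::vector<uint64_t> sample_offsets_;  // start of each name in sample_chars_
  std::vector<char> sample_chars_;        // concatenated sample names
  std::vector<int32_t> counts_;           // one count per entry
};
```

This counts bytes in use (`size()`), not bytes reserved (`capacity()`); which of the two the real methods report is not stated here.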
This was done by using an unordered set for tracking sample names.
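A minimal sketch of the idea, assuming sample names are plain strings. The function `count_distinct_samples` is a hypothetical helper, not code from the PR. The point is that `std::unordered_set` offers average constant-time insertion and lookup, whereas an ordered container such as `std::set` needs a logarithmic number of string comparisons per operation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts the distinct sample names in a stream of names.
std::size_t count_distinct_samples(const std::vector<std::string>& names) {
  std::unordered_set<std::string> seen;
  // Reserving up front avoids repeated rehashing while the set grows.
  seen.reserve(names.size());
  for (const auto& name : names) {
    seen.insert(name);  // no-op if the name is already tracked
  }
  // Sanity check: there cannot be more distinct names than names.
  assert(seen.size() <= names.size());
  return seen.size();
}
```

The trade-off is that iteration order over an unordered set is unspecified, so this only fits code that does not rely on sample names being visited in sorted order.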
This was done by using an unordered map for saving sample stats, minimizing map lookups via memoization, and replacing a nested map with structs.
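The three techniques named above can be sketched together as follows. Everything here is an assumption made for illustration: the struct fields, the class `SampleStatsAccumulator`, and the method names do not come from the PR. The sketch keys an `std::unordered_map` by sample name, stores a plain struct per sample instead of an inner map, and memoizes the most recent lookup so that consecutive calls for the same sample skip hashing entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-sample counters. A plain struct replaces an inner map
// keyed by stat name, so each counter is a direct member access.
struct SampleStats {
  uint64_t n_records = 0;
  uint64_t n_het = 0;
  uint64_t n_hom_var = 0;
};

class SampleStatsAccumulator {
 public:
  SampleStatsAccumulator() = default;
  // Copying would leave the cached pointer aimed at the source object's map.
  SampleStatsAccumulator(const SampleStatsAccumulator&) = delete;
  SampleStatsAccumulator& operator=(const SampleStatsAccumulator&) = delete;

  // Records one genotype call for a sample.
  void add_call(const std::string& sample, bool is_het, bool is_hom_var) {
    // Memoized lookup: only touch the map when the sample changes.
    if (cached_ == nullptr || sample != cached_name_) {
      // Pointers to unordered_map elements stay valid across rehashing.
      cached_ = &stats_[sample];
      cached_name_ = sample;
    }
    assert(cached_ != nullptr);
    ++cached_->n_records;
    if (is_het) ++cached_->n_het;
    if (is_hom_var) ++cached_->n_hom_var;
  }

  // Returns the stats for a sample; throws std::out_of_range if unknown.
  const SampleStats& get(const std::string& sample) const {
    return stats_.at(sample);
  }

 private:
  std::unordered_map<std::string, SampleStats> stats_;
  std::string cached_name_;
  SampleStats* cached_ = nullptr;
};
```

The cached pointer is safe only because nothing is ever erased from the map; a design that removes samples would need to reset the cache on erase.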
This was done by reusing vectors for (missing)GT values, using an unordered set for tracking sample names, minimizing map lookups via memoization, and creating separate code paths for updating v2 and v3 arrays. The v2 update can be done much more efficiently by appending instead of using the v3 insertion strategy.
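The vector-reuse part of this change can be sketched as below. The function `split_genotype`, the sentinel `kMissingAllele`, and the integer allele encoding are hypothetical and chosen only for illustration; the real GT encoding is not described here. The technique is to have the caller own the output vectors and clear them on each call, since `std::vector::clear()` keeps the allocated capacity and later calls can therefore fill the vectors without reallocating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for a missing allele (not the real encoding).
constexpr int kMissingAllele = -1;

// Splits one record's raw genotype into called alleles and the positions of
// missing alleles. The output vectors are owned by the caller and reused
// across records instead of being constructed per record.
void split_genotype(const std::vector<int>& raw,
                    std::vector<int>& gt,
                    std::vector<std::size_t>& missing) {
  gt.clear();       // size becomes 0, capacity is retained
  missing.clear();
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == kMissingAllele) {
      missing.push_back(i);
    } else {
      gt.push_back(raw[i]);
    }
  }
  // Every input allele lands in exactly one of the two outputs.
  assert(gt.size() + missing.size() == raw.size());
}
```

In a hot loop the two output vectors would be declared once outside the loop and passed in for every record, so steady-state processing performs no heap allocation for them.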
jp-dark approved these changes on Mar 19, 2026
VCF-103 implements a new parallelization strategy for the ingestion code path. Profiling that code path showed that the allele count, variant stats, and sample stats computations were slowing down the hot path, so some simple optimizations were made. This PR cherry-picks those changes to reduce the scope of the VCF-103 PR.