F-022: perf(services): RegexSet first-pass for secret scanning (re-do)#35

Merged: Sephyi merged 1 commit into development from audit/f-022-redo on Apr 30, 2026

@Sephyi commented Apr 30, 2026

Re-implements the closed #25 on top of current development, layered on #16's `BUILTIN_PATTERNS` cache:

- New `PatternSet` wraps a `Vec<SecretPattern>` plus the `RegexSet` derived from it.
- `BUILTIN_PATTERN_SET: LazyLock<PatternSet>` caches the set for the built-in patterns.
- Scanners call `PatternSet::first_match` (one combined NFA traversal) instead of looping over per-pattern `Regex::is_match` calls.
- Public API preserved: `scan_for_secrets_with_patterns` and `scan_full_diff_with_patterns` still take `&[SecretPattern]`; they build a one-shot `PatternSet` internally.
- Match semantics preserved: at most one match per added line, and the lowest-index pattern wins (same as the previous "first hit, break" behavior).

Closes #25.

Test plan

- `cargo fmt --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo test --all-targets --all-features` (437 tests pass)

Replaces per-pattern `Regex::is_match` loops in the scanners with a
single `RegexSet` traversal that returns the index of the lowest-
indexed pattern that matched. The pattern slice is then consulted only
to look up the descriptive name on a hit.

Layout:
- New `PatternSet` struct holds an owned `Vec<SecretPattern>` plus the
  derived `RegexSet`. `PatternSet::first_match` does the combined NFA
  traversal.
- `BUILTIN_PATTERN_SET: LazyLock<PatternSet>` caches the set for the
  built-in patterns (built once on first access on top of the existing
  `BUILTIN_PATTERNS` cache from F-012).
- `scan_for_secrets` and `scan_full_diff_for_secrets` (no-args) use the
  cached set directly.
- `scan_for_secrets_with_patterns` / `scan_full_diff_with_patterns`
  preserve their `&[SecretPattern]` API by building a one-shot
  `PatternSet` and delegating to private `*_with_pattern_set` helpers,
  so external callers see no API change.
- Behaviour preserved: at most one match per added line; pattern
  precedence is the lowest-index in the configured set, matching the
  previous "first hit, break" semantics.
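
The caching and delegation layout described in the list above might look roughly like the following. This is a std-only sketch under loud assumptions: plain substring matching stands in for the `RegexSet`, the pattern names, the `(line number, name)` return type, the private helper name `scan_with_pattern_set`, and the rule for recognising added lines (`+` prefix, excluding `+++` headers) are all hypothetical and chosen only to make the example self-contained.

```rust
use std::sync::LazyLock;

/// Hypothetical stand-in: a substring needle replaces the real regex.
#[derive(Clone)]
pub struct SecretPattern {
    pub name: &'static str,
    pub needle: &'static str,
}

pub struct PatternSet {
    patterns: Vec<SecretPattern>,
}

impl PatternSet {
    pub fn new(patterns: Vec<SecretPattern>) -> Self {
        Self { patterns }
    }

    /// Lowest-index pattern that matches (substring stand-in for RegexSet).
    pub fn first_match(&self, line: &str) -> Option<&SecretPattern> {
        self.patterns.iter().find(|p| line.contains(p.needle))
    }
}

/// Built-in patterns, built once on first access (illustrative entries).
static BUILTIN_PATTERNS: LazyLock<Vec<SecretPattern>> = LazyLock::new(|| {
    vec![
        SecretPattern { name: "api-key", needle: "API_KEY=" },
        SecretPattern { name: "password", needle: "password=" },
    ]
});

/// Cached set layered on top of the cached built-in patterns.
static BUILTIN_PATTERN_SET: LazyLock<PatternSet> =
    LazyLock::new(|| PatternSet::new(BUILTIN_PATTERNS.to_vec()));

/// Private helper: at most one finding per added line.
fn scan_with_pattern_set(diff: &str, set: &PatternSet) -> Vec<(usize, String)> {
    diff.lines()
        .enumerate()
        .filter(|(_, l)| l.starts_with('+') && !l.starts_with("+++"))
        .filter_map(|(i, l)| set.first_match(l).map(|p| (i + 1, p.name.to_string())))
        .collect()
}

/// No-args variant: uses the cached built-in set directly.
pub fn scan_for_secrets(diff: &str) -> Vec<(usize, String)> {
    scan_with_pattern_set(diff, &BUILTIN_PATTERN_SET)
}

/// Slice-taking variant: same public shape, one-shot set built internally.
pub fn scan_for_secrets_with_patterns(
    diff: &str,
    patterns: &[SecretPattern],
) -> Vec<(usize, String)> {
    scan_with_pattern_set(diff, &PatternSet::new(patterns.to_vec()))
}
```

The point of the split is that callers passing a slice pay the set-construction cost per call, while the no-args path pays it once per process.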

Closes #25.
Copilot AI review requested due to automatic review settings April 30, 2026 17:06
@Sephyi merged commit 732438e into development Apr 30, 2026
1 of 2 checks passed
@Sephyi deleted the audit/f-022-redo branch April 30, 2026 17:06
@github-actions (bot) locked and limited conversation to collaborators Apr 30, 2026