Skip to content

Migrate SCA Reachability to method-level symbol database (sca-reachability-symbols)#11614

Open
jandro996 wants to merge 6 commits into
masterfrom
alejandro.gonzalez/sca-reachability-add-symbols
Open

Migrate SCA Reachability to method-level symbol database (sca-reachability-symbols)#11614
jandro996 wants to merge 6 commits into
masterfrom
alejandro.gonzalez/sca-reachability-add-symbols

Conversation

@jandro996

@jandro996 jandro996 commented Jun 10, 2026

Copy link
Copy Markdown
Member

What Does This Do

Database format migration

  • Updates GhsaEnrichmentParser to consume the new sca-reachability-symbols/jvm repository format: each entry has lang:"maven", dependency_name, package_versions, and a targets array of "package:ClassName.method" strings; the GHSA ID is embedded in vulnerability.id (no longer passed as a separate parameter)
  • Updates ScaEnrichmentsPlugin API URL from sca-reachability-database/enrichments to sca-reachability-symbols/jvm
  • Updates all four test fixtures to the new format; adds GHSA-cross-package.json fixture for entries with targets from multiple packages under one artifact (models GHSA-cwq5 / Zserio)
  • Regenerates sca_cves.json from the new database format; the file now contains 15 entries across 9 artifacts (including com.github.junrar:junrar, io.github.ndsev:zserio-runtime, and org.apache.tomcat.embed:tomcat-embed-core used as test fixtures)

Removal of class-level detection

  • ScaSymbol no longer has a nullable method; isClassLevel() removed -- all symbols in the new database are method-level
  • ScaCveDatabase skips symbols with method=null at parse time (debug log)
  • ScaReachabilityTransformer removes: reportedHits set, PendingClass inner class, pendingClassEvents queue, processPendingClassEvents(), reportClassLevelHits(), reportClassLevelHitIfPresent(), hasMethodLevelSymbolForClass/Entry()
  • ScaReachabilityHit removes CLASS_LEVEL_SYMBOL constant and dual-mode Javadoc
  • ScaReachabilitySystem drops the processPendingClassEvents() call from the periodic callback; the callback is now a direct method reference to transformer::performPendingRetransforms
  • The two-queue design (pendingClassEvents -> pendingRetransformNames) is simplified: transform() adds directly to pendingRetransformNames on first load

Version resolution fix for JARs without pom.properties

  • DependencyResolver.guessFallbackNoPom() parses the version from the filename when pom.properties is absent (e.g. junrar-7.5.5.jar -> name="junrar", version="7.5.5"). The previous exact-match logic could not match this against "com.github.junrar:junrar".
  • Extracts ScaReachabilityTransformer.matchVersion() as a shared helper: first tries exact dep.name match, then falls back to artifact-ID-only match when dep.name has no ":" (i.e. produced by guessFallbackNoPom). findArtifactInUrl() now delegates to the same helper.
  • Covers standard -cp deployments and Spring Boot nested-JAR deployments where each library's JAR URL is resolvable independently.

Tests

  • ScaRealDatabaseTest: verifies that the three new library classes are indexed correctly in the real sca_cves.json, and that all tracked symbols are method-level
  • ScaRealLibraryBytecodeTest: end-to-end tests on real library bytecode -- three injection tests (ASM produces larger bytecode, proving the named method exists) plus one callback-firing test for LocalFolderExtractor.createDirectory (calls method via reflection, verifies hit reported even when the method body throws NPE due to a null argument)
  • GhsaEnrichmentParserTest: extended with cases for malformed targets (no colon, no dot after colon, all malformed -> empty output), cross-package targets, and GHSA ID read from vulnerability.id
  • ScaReachabilityTransformerJava9Test: adds five unit tests for matchVersion() covering exact match, artifact-ID-only fallback, group-ID-present guard, empty list, and precedence of exact over fallback

Smoke test

  • Adds com.github.junrar:junrar:7.5.5 to the springboot smoke app (vulnerable under GHSA-hf5p-q87m-crj7, range >=0,<7.5.10)
  • ScaReachabilityInit migrated from new Yaml() to Class.forName("com.github.junrar.LocalFolderExtractor") (constructor is package-private; Class.forName triggers the load without instantiation)
  • Adds synthetic META-INF/maven/com.github.junrar/junrar/pom.properties in the smoke test app resources so DependencyResolver can identify junrar in the merged shadow JAR (junrar-7.5.5.jar itself has no pom.properties)
  • ScaReachabilitySmokeTest updated to assert junrar appears with type:"reachability" metadata, version 7.5.5, and reached:[]

Motivation

Follows up on #11352. The sca-reachability-database repository has been superseded by sca-reachability-symbols, which ships method-level symbols directly (no class-level detection). This PR aligns the Gradle build plugin and the runtime transformer with the new schema.

Jira ticket: APPSEC-62260

Additional Notes

The new target string format is "package:ClassName.method". Parsing uses lastIndexOf(':') to split package from class+method, then lastIndexOf('.') on the remainder to split class name from method. Inner-class targets (e.g. pkg:Outer.Inner.method) are not yet in the database; a TODO marks the point where replace('.', '/') would need to be updated to handle $ separators.

The smoke test previously used snakeyaml, which is no longer in the new database. junrar 7.5.5 covers the same detection scenario: the class is loaded at startup, the transformer retransforms it on the first heartbeat, registers the CVE with reached:[], and no vulnerable method is called, so reached stays empty.

The matchVersion() artifact-ID fallback covers real Spring Boot fat JARs where the class URL resolves to the individual inner JAR (jar:file:/app.jar!/BOOT-INF/lib/junrar-7.5.5.jar). The synthetic pom.properties in the smoke test resources covers the shadow-JAR build (all JARs merged, so no per-library JAR URL is available to extract the version from the filename).

Contributor Checklist

Jira ticket: APPSEC-62260

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

…ormat

Updates the Gradle plugin and runtime transformer to consume the new
sca-reachability-symbols/jvm database format, which ships method-level
symbols (class + method name) instead of class-level symbols.

Key changes:
- GhsaEnrichmentParser: new target string format "package:ClassName.method";
  GHSA ID read from vulnerability.id; one entry per dependency_name
- ScaEnrichmentsPlugin: updated API URL to sca-reachability-symbols/jvm
- ScaReachabilityTransformer: removed class-level detection (reportedHits,
  PendingClass, pendingClassEvents, processPendingClassEvents); simplified
  two-phase design to direct pendingRetransformNames enqueue on first load
- ScaSymbol: method is now always non-null; isClassLevel() removed
- ScaReachabilityHit: removed CLASS_LEVEL_SYMBOL constant
- sca_cves.json: regenerated with junrar, zserio-runtime, tomcat-embed-core
- New tests: ScaRealDatabaseTest, ScaRealLibraryBytecodeTest
- Smoke test: migrated from snakeyaml to junrar 7.5.5 (GHSA-hf5p-q87m-crj7)

tag: no release note
@datadog-datadog-prod-us1

This comment has been minimized.

@jandro996 jandro996 added type: enhancement Enhancements and improvements comp: telemetry Telemetry labels Jun 10, 2026
@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88fad29d9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@dd-octo-sts

dd-octo-sts Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.99 s 13.94 s [-0.3%; +1.1%] (no difference)
startup:insecure-bank:tracing:Agent 12.90 s 12.95 s [-1.3%; +0.5%] (no difference)
startup:petclinic:appsec:Agent 16.87 s 16.65 s [+0.4%; +2.2%] (maybe worse)
startup:petclinic:iast:Agent 16.90 s 16.95 s [-1.2%; +0.6%] (no difference)
startup:petclinic:profiling:Agent 16.81 s 16.74 s [-0.5%; +1.4%] (no difference)
startup:petclinic:sca:Agent 16.87 s 16.71 s [-0.0%; +1.9%] (no difference)
startup:petclinic:tracing:Agent 16.07 s 15.92 s [-0.2%; +2.0%] (no difference)

Commit: a4fbf95f · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@jandro996 jandro996 marked this pull request as ready for review June 10, 2026 12:26
@jandro996 jandro996 requested review from a team as code owners June 10, 2026 12:26

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5741a6e181

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

DependencyResolver.guessFallbackNoPom() parses the version from the
filename when pom.properties is absent (e.g. junrar-7.5.5.jar).
The resulting Dependency has name="junrar" (no group ID), which
the previous exact-match logic could not match against the
"com.github.junrar:junrar" artifact name.

Add ScaReachabilityTransformer.matchVersion() as a shared helper
that first tries an exact name match, then falls back to an
artifact-ID-only match (dep.name without ":") for deps produced by
guessFallbackNoPom.  findArtifactInUrl() is updated to use the
same helper so both resolution paths benefit.

For the smoke test (shadow JAR): add a synthetic pom.properties for
junrar under src/main/resources/META-INF/maven so DependencyResolver
can find the version when all JARs are merged into one flat file.
findArtifactInUrl was a private one-liner used in exactly one place.
Inline it into findArtifactVersionInClasspath to remove unnecessary
indirection.
guessFallbackNoPom can return a Dependency with name=null for JARs whose
filename does not match the artifact-version pattern. The artifact-ID
fallback loop called dep.name.contains(":") without a null check, which
would throw NPE in that edge case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: telemetry Telemetry type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant