Migrate SCA Reachability to method-level symbol database (sca-reachability-symbols)#11614
Migrate SCA Reachability to method-level symbol database (sca-reachability-symbols)#11614jandro996 wants to merge 6 commits into
Conversation
…ormat Updates the Gradle plugin and runtime transformer to consume the new sca-reachability-symbols/jvm database format, which ships method-level symbols (class + method name) instead of class-level symbols. Key changes: - GhsaEnrichmentParser: new target string format "package:ClassName.method"; GHSA ID read from vulnerability.id; one entry per dependency_name - ScaEnrichmentsPlugin: updated API URL to sca-reachability-symbols/jvm - ScaReachabilityTransformer: removed class-level detection (reportedHits, PendingClass, pendingClassEvents, processPendingClassEvents); simplified two-phase design to direct pendingRetransformNames enqueue on first load - ScaSymbol: method is now always non-null; isClassLevel() removed - ScaReachabilityHit: removed CLASS_LEVEL_SYMBOL constant - sca_cves.json: regenerated with junrar, zserio-runtime, tomcat-embed-core - New tests: ScaRealDatabaseTest, ScaRealLibraryBytecodeTest - Smoke test: migrated from snakeyaml to junrar 7.5.5 (GHSA-hf5p-q87m-crj7) tag: no release note
This comment has been minimized.
This comment has been minimized.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88fad29d9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5741a6e181
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
DependencyResolver.guessFallbackNoPom() parses the version from the filename when pom.properties is absent (e.g. junrar-7.5.5.jar). The resulting Dependency has name="junrar" (no group ID), which the previous exact-match logic could not match against the "com.github.junrar:junrar" artifact name. Add ScaReachabilityTransformer.matchVersion() as a shared helper that first tries an exact name match, then falls back to an artifact-ID-only match (dep.name without ":") for deps produced by guessFallbackNoPom. findArtifactInUrl() is updated to use the same helper so both resolution paths benefit. For the smoke test (shadow JAR): add a synthetic pom.properties for junrar under src/main/resources/META-INF/maven so DependencyResolver can find the version when all JARs are merged into one flat file.
findArtifactInUrl was a private one-liner used in exactly one place. Inline it into findArtifactVersionInClasspath to remove unnecessary indirection.
guessFallbackNoPom can return a Dependency with name=null for JARs whose
filename does not match the artifact-version pattern. The artifact-ID
fallback loop called dep.name.contains(":") without a null check, which
would throw NPE in that edge case.
What Does This Do
Database format migration
GhsaEnrichmentParserto consume the newsca-reachability-symbols/jvmrepository format: each entry haslang:"maven",dependency_name,package_versions, and atargetsarray of"package:ClassName.method"strings; the GHSA ID is embedded invulnerability.id(no longer passed as a separate parameter)ScaEnrichmentsPluginAPI URL fromsca-reachability-database/enrichmentstosca-reachability-symbols/jvmGHSA-cross-package.jsonfixture for entries with targets from multiple packages under one artifact (models GHSA-cwq5 / Zserio)sca_cves.jsonfrom the new database format; the file now contains 15 entries across 9 artifacts (includingcom.github.junrar:junrar,io.github.ndsev:zserio-runtime, andorg.apache.tomcat.embed:tomcat-embed-coreused as test fixtures)Removal of class-level detection
ScaSymbolno longer has a nullablemethod;isClassLevel()removed -- all symbols in the new database are method-levelScaCveDatabaseskips symbols withmethod=nullat parse time (debug log)ScaReachabilityTransformerremoves:reportedHitsset,PendingClassinner class,pendingClassEventsqueue,processPendingClassEvents(),reportClassLevelHits(),reportClassLevelHitIfPresent(),hasMethodLevelSymbolForClass/Entry()ScaReachabilityHitremovesCLASS_LEVEL_SYMBOLconstant and dual-mode JavadocScaReachabilitySystemdrops theprocessPendingClassEvents()call from the periodic callback; the callback is now a direct method reference totransformer::performPendingRetransformspendingClassEvents->pendingRetransformNames) is simplified:transform()adds directly topendingRetransformNameson first loadVersion resolution fix for JARs without pom.properties
DependencyResolver.guessFallbackNoPom()parses the version from the filename whenpom.propertiesis absent (e.g.junrar-7.5.5.jar-> name="junrar", version="7.5.5"). The previous exact-match logic could not match this against"com.github.junrar:junrar".ScaReachabilityTransformer.matchVersion()as a shared helper: first tries exactdep.namematch, then falls back to artifact-ID-only match whendep.namehas no":"(i.e. produced byguessFallbackNoPom).findArtifactInUrl()now delegates to the same helper.-cpdeployments and Spring Boot nested-JAR deployments where each library's JAR URL is resolvable independently.Tests
ScaRealDatabaseTest: verifies that the three new library classes are indexed correctly in the realsca_cves.json, and that all tracked symbols are method-levelScaRealLibraryBytecodeTest: end-to-end tests on real library bytecode -- three injection tests (ASM produces larger bytecode, proving the named method exists) plus one callback-firing test forLocalFolderExtractor.createDirectory(calls method via reflection, verifies hit reported even when the method body throws NPE due to a null argument)GhsaEnrichmentParserTest: extended with cases for malformed targets (no colon, no dot after colon, all malformed -> empty output), cross-package targets, and GHSA ID read fromvulnerability.idScaReachabilityTransformerJava9Test: adds five unit tests formatchVersion()covering exact match, artifact-ID-only fallback, group-ID-present guard, empty list, and precedence of exact over fallbackSmoke test
com.github.junrar:junrar:7.5.5to the springboot smoke app (vulnerable under GHSA-hf5p-q87m-crj7, range>=0,<7.5.10)ScaReachabilityInitmigrated fromnew Yaml()toClass.forName("com.github.junrar.LocalFolderExtractor")(constructor is package-private;Class.forNametriggers the load without instantiation)META-INF/maven/com.github.junrar/junrar/pom.propertiesin the smoke test app resources soDependencyResolvercan identify junrar in the merged shadow JAR (junrar-7.5.5.jar itself has no pom.properties)ScaReachabilitySmokeTestupdated to assert junrar appears withtype:"reachability"metadata, version7.5.5, andreached:[]Motivation
Follows up on #11352. The
sca-reachability-databaserepository has been superseded bysca-reachability-symbols, which ships method-level symbols directly (no class-level detection). This PR aligns the Gradle build plugin and the runtime transformer with the new schema.Jira ticket: APPSEC-62260
Additional Notes
The new target string format is
"package:ClassName.method". Parsing useslastIndexOf(':')to split package from class+method, thenlastIndexOf('.')on the remainder to split class name from method. Inner-class targets (e.g.pkg:Outer.Inner.method) are not yet in the database; a TODO marks the point wherereplace('.', '/')would need to be updated to handle$separators.The smoke test previously used snakeyaml, which is no longer in the new database. junrar 7.5.5 covers the same detection scenario: the class is loaded at startup, the transformer retransforms it on the first heartbeat, registers the CVE with
reached:[], and no vulnerable method is called, soreachedstays empty.The
matchVersion()artifact-ID fallback covers real Spring Boot fat JARs where the class URL resolves to the individual inner JAR (jar:file:/app.jar!/BOOT-INF/lib/junrar-7.5.5.jar). The syntheticpom.propertiesin the smoke test resources covers the shadow-JAR build (all JARs merged, so no per-library JAR URL is available to extract the version from the filename).Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: APPSEC-62260
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.