ENH: Generate SPDX 2.3 Software Bill of Materials at configure time #5817
The VTK modular system uses arguments to the main module macro for the license information (https://docs.vtk.org/en/latest/api/cmake/vtkModule.html#vtk-module-file-contents). I think it is worth a human conversation to decide whether this information should have a public interface through the main module macro or should be kept separate. The main macro certainly could call a private function such as these. I was just looking into moving FFTW into a third-party library structure. With the varied back-ends it supports, it is a dynamic case to handle, and very relevant to license requirements. Having all the module information defined in one place, along with the license information, may be beneficial for keeping some information consistent and for applying certain logic or enforcement, such as defaulting internal modules to the ITK license while requiring remote modules to provide these attributes.

Copilot's first PR seems better to me (easier to maintain), with license information distributed in

@copilot Make license and other per-module information distributed in

More importantly, I am unfamiliar with SPDX and therefore will not be able to tell whether Copilot does a great or terrible job (or somewhere in between).

Done in 8692d2d and 50daeeb. SPDX metadata is now declared directly in each
Prepends the two-line SPDX notice
    // SPDX-FileCopyrightText: Copyright NumFOCUS
    // SPDX-License-Identifier: Apache-2.0
to 5,698 ITK-owned source files (.h/.hxx/.cxx/.txx/.py/.cmake/CMakeLists.txt) and updates the KWStyle template so that the header is enforced on new files going forward. Applied by Utilities/Maintenance/AddSPDXHeaders.py (companion PR InsightSoftwareConsortium#5817). Modules/ThirdParty/ is excluded.

Approving this revision. The Copilot-authored SBOM infrastructure has been reorganized into six topic-based commits, with the SPDX best-practice gaps I flagged during review addressed (or documented as follow-ups where the scope warranted).
Commit layout (force-pushed from 11 → 6 topic commits)
Each commit is independently reviewable relative to
Review decision 1: CMake → Python for SBOM JSON construction
The initial
Current state: CMake enumerates enabled modules and writes a line-based
Gating:
Review decision 2: SPDX 2.3 compliance audit results
Audited against the SPDX 2.3 spec, the NTIA minimum elements, and the CISA SBOM baseline. Easy wins landed now; larger items are documented for follow-up.
Landed in this PR:
Deferred with tracking in
Validation runs in the current tree:
Review decision 3:

hjmjohnson left a comment:
My detailed analysis is posted in a separate comment; the requested changes it identified are now in place.

Hi @hjmjohnson, unfortunately I have not worked with this recently and can't add more details on the initial request in #4302. I took a brief read-through, and the updates here do look pretty nice. I will leave it to the ITK maintainer team's discretion how to move forward. Thanks.

dzenanz left a comment:
Thank you for your work on this, Hans. I did not review it in any detail; I am trying a local build. Note that the targeted branch is not main.
    if(ITK_GENERATE_SBOM)
      itk_generate_sbom()
    endif()
    include(ITKSBOMValidation)
The ITKSBOMValidation file is only introduced in a later commit.

The locally generated file looks reasonable to me (I am clueless about SPDX). Interestingly, my review check mark is gray, not green, and I am listed in the additional reviewers section.
Introduces the ITK_GENERATE_SBOM option (ON by default) and the
ITKSBOMGeneration CMake module that emits an SPDX 2.3 JSON Software
Bill of Materials at ${CMAKE_BINARY_DIR}/sbom.spdx.json, installed
under share/spdx/ following the Linux-Foundation convention so that
downstream supply-chain scanners (REUSE, scancode-toolkit, Trivy,
Grype) discover it automatically.
Design:
* CMake collects per-module SPDX metadata at configure time and
writes a simple line-based manifest (sbom-inputs.manifest).
* A Python back-end (Utilities/SPDX/generate_sbom.py) reads the
manifest and writes the final SPDX JSON via json.dump, owning
all field wiring, escape handling, relationship assembly, and
LicenseRef-* format validation. Keeping JSON construction out
of CMake avoids hand-rolled string(APPEND) quoting bugs and
makes the code testable from pytest.
* Python 3 becomes a hard requirement when ITK_GENERATE_SBOM=ON;
the module stops with a FATAL_ERROR at configure time if no
interpreter is found, rather than failing later during generation.
* UUIDv5 seeded by ITK version + timestamp disambiguates
documentNamespace across parallel multi-config reconfigures
within the same second, satisfying SPDX 2.3 section 6.5.
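To make the CMake → Python split above concrete, here is a minimal sketch of the Python side. It is not the PR's generate_sbom.py: the tab-separated manifest layout and the helper names parse_manifest and build_document are hypothetical, since the real sbom-inputs.manifest format is not shown here. The namespace URL pattern follows the https://itk.org/spdx/ITK-<ver>-<SHA1-UUID-v5> form quoted in the PR description.

```python
import json
import re
import uuid
from datetime import datetime, timezone


def parse_manifest(text):
    """Parse a HYPOTHETICAL manifest: one package per line, tab-separated
    name, versionInfo, licenseDeclared, downloadLocation; '#' lines are comments."""
    packages = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, version, license_id, download = line.split("\t")
        packages.append({"name": name, "versionInfo": version,
                         "licenseDeclared": license_id,
                         "downloadLocation": download})
    return packages


def build_document(packages, itk_version, created):
    # uuid5 is a SHA-1 name-based UUID: the same seed always yields the same
    # UUID, while a different version or timestamp yields a different one.
    doc_uuid = uuid.uuid5(uuid.NAMESPACE_URL, f"ITK-{itk_version}-{created}")
    doc = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"ITK-{itk_version}",
        "documentNamespace": f"https://itk.org/spdx/ITK-{itk_version}-{doc_uuid}",
        # The PR also records the CMake version as a creator; omitted here.
        "creationInfo": {"created": created,
                         "creators": ["Tool: ITKSBOMGeneration",
                                      "Organization: NumFOCUS"]},
        "packages": [],
    }
    for pkg in packages:
        entry = dict(pkg)
        # SPDXIDs may only contain letters, digits, "." and "-".
        entry["SPDXID"] = "SPDXRef-" + re.sub(r"[^A-Za-z0-9.\-]", "-", pkg["name"])
        entry["filesAnalyzed"] = False
        doc["packages"].append(entry)
    return doc


if __name__ == "__main__":
    manifest = "ZLIB\t1.3.1\tZlib\tNOASSERTION\n"
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # json.dumps handles all string escaping, which is the point of the split.
    print(json.dumps(build_document(parse_manifest(manifest), "6.0.0", now), indent=2))
```

Letting json.dump/json.dumps own quoting is what removes the hand-rolled string(APPEND) escaping risk; the version string "6.0.0" above is only a placeholder.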
Emitted per package:
* name, versionInfo, downloadLocation, licenseConcluded,
licenseDeclared, copyrightText, filesAnalyzed=false.
* supplier and originator as distinct fields (NumFOCUS supplier
on both ITK and ThirdParty packages; NumFOCUS originator for
ITK-owned packages, NOASSERTION for ThirdParty packages whose
upstream author cannot be determined from CMake), as requested
by the NTIA SBOM minimum elements.
* primaryPackagePurpose=LIBRARY for every package.
* externalRefs[purl] when SPDX_PURL is declared, enabling CVE
lookup via Trivy / Grype / OSV-Scanner.
* hasExtractedLicensingInfos[] entries for LicenseRef-* identifiers,
with regex validation against the SPDX 5.x LicenseRef format.
creationInfo.creators records three entries: CMake version, the
ITKSBOMGeneration tool identifier, and the NumFOCUS organization.
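The per-package fields listed above can be sketched as follows. This is an illustrative assumption, not the PR's code: make_package, extracted_licensing_info, and unpaired_license_refs are hypothetical helpers, licenseConcluded is simply set equal to licenseDeclared, and the LicenseRef pattern encodes the SPDX rule that a LicenseRef identifier is "LicenseRef-" followed by letters, digits, "." or "-".

```python
import re

# "LicenseRef-" followed by one or more letters, digits, "." or "-".
LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")


def is_valid_license_ref(identifier):
    return LICENSE_REF.fullmatch(identifier) is not None


def make_package(name, version, license_expr, download, copyright_text,
                 third_party, purl=None):
    """Assemble one SPDX 2.3 package entry (hypothetical helper)."""
    pkg = {
        "SPDXID": "SPDXRef-" + re.sub(r"[^A-Za-z0-9.\-]", "-", name),
        "name": name,
        "versionInfo": version,
        "downloadLocation": download,
        "licenseConcluded": license_expr,  # assumption: same as declared
        "licenseDeclared": license_expr,
        "copyrightText": copyright_text,
        "filesAnalyzed": False,
        "supplier": "Organization: NumFOCUS",
        # Upstream authorship of vendored code is not known to CMake.
        "originator": "NOASSERTION" if third_party else "Organization: NumFOCUS",
        "primaryPackagePurpose": "LIBRARY",
    }
    if purl:
        # A purl external reference lets CVE scanners map the package.
        pkg["externalRefs"] = [{
            "referenceCategory": "PACKAGE-MANAGER",
            "referenceType": "purl",
            "referenceLocator": purl,
        }]
    return pkg


def extracted_licensing_info(license_id, name, text):
    """Build a hasExtractedLicensingInfos entry after validating the identifier."""
    if not is_valid_license_ref(license_id):
        raise ValueError(f"invalid LicenseRef identifier: {license_id!r}")
    return {"licenseId": license_id, "name": name, "extractedText": text}


def unpaired_license_refs(packages, extracted_infos):
    """Return LicenseRef-* identifiers used by packages but never defined."""
    defined = {info["licenseId"] for info in extracted_infos}
    used = set()
    for pkg in packages:
        for field in ("licenseConcluded", "licenseDeclared"):
            used.update(LICENSE_REF.findall(pkg.get(field, "")))
    return sorted(used - defined)
```

Scanning license expressions for LicenseRef-* tokens and diffing against the defined set is one simple way to guarantee that every custom identifier is paired with a hasExtractedLicensingInfos entry.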
Extends the itk_module() macro with SPDX-metadata named arguments so that each ThirdParty module declares its own supply-chain facts next to its source:
  SPDX_LICENSE              SPDX license identifier (Apache-2.0, ...)
  SPDX_VERSION              upstream version of the vendored code
  SPDX_DOWNLOAD_LOCATION    canonical upstream URL
  SPDX_COPYRIGHT            copyright notice text
  SPDX_CUSTOM_LICENSE_TEXT  extracted text for LicenseRef-* identifiers
  SPDX_CUSTOM_LICENSE_NAME  human-readable name for the LicenseRef
  SPDX_PURL                 Package URL for CVE-feed mapping
  SPDX_OPT_OUT              exclude the module from the generated SBOM
ITKSBOMGeneration reads the resulting ITK_MODULE_<name>_SPDX_* variables to populate its manifest. Co-locating the metadata with the module means stale-license bugs are caught by reviewers of the module-update commit rather than by downstream compliance scanners.
Populates the SPDX metadata for the existing vendored ThirdParty modules: DCMTK, DICOMParser, DoubleConversion, Eigen3, Expat, GDCM, GIFTI, GoogleTest, HDF5, JPEG, KWSys, MINC, MetaIO, NIFTI, Netlib, NrrdIO, OpenJPEG, PNG, TBB, TIFF, VNL, ZLIB, libLBFGS.
Registers four CTest tests (labeled "SBOM") against the SBOM emitted
by ITKSBOMGeneration:
ITKSBOMValidation Always-on lightweight in-tree validator
(validate_light.py): required SPDX 2.3
fields, license-reference integrity,
SPDXID uniqueness. No external deps.
ITKSBOMVersionConsistency Cross-checks SPDX_VERSION in each
Modules/ThirdParty/*/itk-module.cmake
against the tag declared in its
UpdateFromUpstream.sh
(verify_versions.py). Catches the
common bug of bumping a dependency
without updating SPDX metadata.
ITKSBOMSchemaValidation Optional full SPDX 2.3 schema check via
the spdx-tools pip package
(validate_with_spdx_tools.py). Returns
CTest skip code 77 when the package is
not installed, so the test is advisory
rather than a hard CI dependency.
ITKSBOMFingerprint Drift-detection test gated on
ITK_SBOM_FINGERPRINT_BASELINE pointing
at a committed baseline file
(compute_fingerprint.py). The fingerprint
is a SHA-256 over sorted package
(name, version, license, PURL) tuples
and deliberately excludes timestamps
and documentNamespace UUIDs so the value
is reproducible.
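The ITKSBOMVersionConsistency cross-check described above might look roughly like this. It is a sketch under stated assumptions, not verify_versions.py: it assumes itk-module.cmake spells the argument as SPDX_VERSION "x.y.z" and that UpdateFromUpstream.sh assigns the upstream tag with a line such as readonly tag="vx.y.z"; real files may use other forms, and the helper names are hypothetical.

```python
import re

# ASSUMED source forms (hypothetical; adjust to the real files):
#   itk-module.cmake:       SPDX_VERSION "1.3.1"
#   UpdateFromUpstream.sh:  readonly tag="v1.3.1"
SPDX_VERSION_RE = re.compile(r'SPDX_VERSION\s+"?([^"\s)]+)"?')
TAG_RE = re.compile(r'^\s*(?:readonly\s+)?tag\s*=\s*"?([^"\s]+)"?', re.MULTILINE)


def normalize(version):
    """Drop a leading 'v' only when a digit follows (v1.2 -> 1.2, vnl-2 unchanged)."""
    if version[:1] in ("v", "V") and version[1:2].isdigit():
        return version[1:]
    return version


def versions_consistent(module_cmake_text, update_script_text):
    """True/False when both versions are found, None when there is nothing to compare."""
    declared = SPDX_VERSION_RE.search(module_cmake_text)
    tagged = TAG_RE.search(update_script_text)
    if declared is None or tagged is None:
        return None
    return normalize(declared.group(1)) == normalize(tagged.group(1))
```

Returning None rather than False for modules without a tag keeps the check from failing on ThirdParty modules that are not updated through an UpdateFromUpstream.sh script.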
Shared constants, JSON loader, and exit codes live in
Utilities/SPDX/_common.py so the four validators stay consistent.
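The drift fingerprint used by ITKSBOMFingerprint can be sketched as below. This is an illustration, not compute_fingerprint.py: it assumes "license" means licenseDeclared and that the sorted tuples are serialized as compact JSON before hashing. Timestamps and documentNamespace are never read, so two SBOMs with the same package set hash identically.

```python
import hashlib
import json
import sys


def package_tuples(sbom):
    """Sorted (name, version, license, purl) tuples; purl is "" when absent."""
    rows = []
    for pkg in sbom.get("packages", []):
        purl = ""
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                purl = ref.get("referenceLocator", "")
                break
        rows.append((pkg.get("name", ""), pkg.get("versionInfo", ""),
                     pkg.get("licenseDeclared", ""), purl))
    return sorted(rows)


def fingerprint(sbom):
    # Compact, order-independent serialization so the hash is reproducible.
    canonical = json.dumps(package_tuples(sbom), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        print(fingerprint(json.load(f)))
```

Comparing the printed digest against the file named by ITK_SBOM_FINGERPRINT_BASELINE would then flag any change in a package's name, version, license, or PURL.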
Declares blanket SPDX license annotations via REUSE.toml for the ITK-owned files that do not carry per-file SPDX headers (build-system files, CMake modules, wrapping generator inputs), and provides the canonical SPDX license texts under LICENSES/ as required by the REUSE 3.x specification:
  LICENSES/Apache-2.0.txt
  LICENSES/BSD-3-Clause.txt
  LICENSES/CC-BY-4.0.txt
  LICENSES/LicenseRef-Josuttis-fdstream.txt
  LICENSES/LicenseRef-Netlib-SLATEC.txt
The annotations use precedence="aggregate", so information from a per-file SPDX header, where present, is combined with the blanket annotation rather than discarded. Files under Modules/ThirdParty/ are intentionally not covered: each vendored project keeps its upstream license notice and is tracked in the SBOM as a separate package. Running `reuse lint` against the tree now succeeds.
Adds Utilities/SPDX/add_headers.py, a utility that prepends the two SPDX lines
  SPDX-FileCopyrightText: Copyright NumFOCUS
  SPDX-License-Identifier: Apache-2.0
to ITK-owned C/C++, Python, and CMake source files. It is idempotent, skips files that already carry an SPDX header, and preserves shebangs, UTF-8 BOMs, and CRLF line endings.
Also wires in a local pre-commit hook (check-spdx-headers) that runs `add_headers.py --check --files <paths>` against staged files so that new commits cannot reintroduce files without SPDX headers. The walker-mode invocation (`python3 Utilities/SPDX/add_headers.py <source-tree>`) is available for one-off retroactive migrations.
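A minimal sketch of the idempotent header insertion described above, assuming the file has already been decoded to a str. The helper names has_spdx_header and add_header, and the 10-line scan window, are hypothetical and not add_headers.py's actual interface.

```python
SPDX_LINES = (
    "SPDX-FileCopyrightText: Copyright NumFOCUS",
    "SPDX-License-Identifier: Apache-2.0",
)
BOM = "\ufeff"
CPP_SUFFIXES = (".h", ".hxx", ".cxx", ".txx")


def has_spdx_header(text, scan_lines=10):
    """Idempotence check: look for an identifier in the first few lines only."""
    return any("SPDX-License-Identifier:" in line
               for line in text.splitlines()[:scan_lines])


def add_header(text, filename):
    if has_spdx_header(text):
        return text
    # Keep a UTF-8 BOM (decoded as U+FEFF) as the very first character.
    bom = BOM if text.startswith(BOM) else ""
    body = text[len(bom):]
    # Reuse the file's own line ending so CRLF files stay CRLF.
    newline = "\r\n" if "\r\n" in body else "\n"
    # C/C++ sources get "//"; .py, .cmake and CMakeLists.txt get "#".
    prefix = "//" if filename.endswith(CPP_SUFFIXES) else "#"
    header = "".join(f"{prefix} {line}{newline}" for line in SPDX_LINES)
    if body.startswith("#!"):
        # A shebang must stay on line 1, so insert the header after it.
        first, sep, rest = body.partition("\n")
        return bom + first + (sep or newline) + header + rest
    return bom + header + body
```

When reading and writing real files, opening them with encoding="utf-8" and newline="" keeps the BOM as U+FEFF and stops Python from translating CRLF, so the function sees the bytes' original structure.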
Adds Utilities/SPDX/README.md summarizing every script in the directory (generate_sbom, add_headers, verify_versions, validate_light, validate_with_spdx_tools, compute_fingerprint), their CTest integrations, and typical workflows for adding SPDX headers, verifying vendored-dependency versions, validating the generated SBOM, and tracking drift via the fingerprint baseline. Adds Utilities/SPDX/TODO.md enumerating the SPDX 2.3 best-practice enhancements intentionally scoped out of the initial landing so reviewers can pick them up as follow-ups (documentNamespace hosting, release-asset attachment, packageVerificationCode, CONTAINS / BUILD_TOOL_OF relationship enrichment, continuous reuse-lint in CI, PURL validation, pytest coverage for generate_sbom.py). Leaves a one-line pointer in Utilities/Maintenance/README.md to the new Utilities/SPDX/ location so existing links remain valid.
Lands the SPDX-2.3 SBOM infrastructure for ITK: per-module metadata in each itk-module.cmake, a CMake generator that emits sbom.spdx.json at configure time, four CTests validating the result (lightweight in-tree validator, UpdateFromUpstream cross-check, spdx-tools reference validator, drift fingerprint), REUSE 3.x metadata with per-license texts under LICENSES/, a migration script, and a pre-commit hook enforcing SPDX on new files.
Stacked on #6084 (bulk SPDX file header tagging + KWStyle template update). The two PRs are bound and must merge together or in strict order: #6084 merges first onto main, then this PR rebases onto the new main and merges. Completes the roadmap in #4302.
Clean commit history: 8 commits, each a whole working unit. No "add bug → fix bug" pairs: all P1/P2/P3 fixes from the validation phase are folded into their feature commits from day one.
Why: regulatory drivers (medical imaging + commercial use)
ITK's 2019 community survey found that 32% of users are commercial and 74% focus on medical imaging. Three converging regulatory events make SBOM + SPDX hard requirements for that audience:
SPDX_VERSION / SPDX_LICENSE / SPDX_DOWNLOAD_LOCATION are the verbatim SOUP fields.
Real commercial-audit incidents from ITK Discourse that this infrastructure addresses:
rpoly.f buried 19 years deep in VNL; per-file SPDX + reuse lint in CI would have caught this at import.
Commit history (8 clean commits on top of #6084)
Each commit represents a whole working unit. No "add bug → fix bug" pairs: the P1/P2/P3 fixes from the earlier iteration phase have been squashed into their corresponding feature commits. The net-zero Python 3.9 compat-shim revert pair is dropped entirely.
ENH: Add SPDX 2.3 SBOM generator at configure time: CMake/ITKSBOMGeneration.cmake (539 lines). JSON emitter with full escape coverage (\\, \", \n, \r, \t, \b, \f). Correct plural hasExtractedLicensingInfos field per the SPDX 2.3 schema. Deterministic SHA1-UUID documentNamespace on the itk.org domain. Semicolon-safe list(APPEND). Default SPDX license list version 3.28.
ENH: Distribute SBOM metadata into per-module itk-module.cmake files: extends itk_module() with SPDX_LICENSE, SPDX_VERSION, SPDX_DOWNLOAD_LOCATION, SPDX_COPYRIGHT, SPDX_CUSTOM_LICENSE_*, SPDX_PURL, SPDX_OPT_OUT. Path-based IS_THIRD_PARTY detection (parent-scope visible). Populates SPDX metadata for all 23 ThirdParty modules.
ENH: Add SBOM validation CTest suite: four tests under the SBOM CTest label, namely ITKSBOMValidation (lightweight), ITKSBOMVersionConsistency (UpdateFromUpstream cross-check), ITKSBOMSchemaValidation (optional spdx-tools; returns CTest skip code 77 when missing), and ITKSBOMFingerprint (drift detection, gated on ITK_SBOM_FINGERPRINT_BASELINE).
ENH: Install generated SBOM under share/spdx: follows the Linux Foundation SPDX convention so that downstream scanners (Trivy, Grype, OSV-Scanner, Anchore Syft) auto-discover it.
ENH: Add REUSE 3.x compliance metadata: REUSE.toml blanket annotations + LICENSES/ directory. reuse lint reports 0 non-ThirdParty compliance gaps.
ENH: Add SPDX file-header migration script: Utilities/Maintenance/AddSPDXHeaders.py with BOM/CRLF safety, shebang handling, and a first-N-line needs_spdx() scan.
ENH: Add pre-commit hook enforcing SPDX license headers: the local check-spdx-headers hook runs the migration script in --check mode on staged files.
DOC: Document SBOM and SPDX tooling in Utilities/Maintenance: README section covering all five SBOM scripts and typical workflows.
Generated SBOM (default module set)
dataLicense: CC0-1.0
LicenseRef-* references properly paired with hasExtractedLicensingInfos
documentNamespace: https://itk.org/spdx/ITK-<ver>-<SHA1-UUID-v5>
Downstream consumers can run:
Why these two PRs are bound
Neither PR is functional without the other:
Merge order must be: #6084 merges first onto main, then this PR rebases onto the new main and merges. Alternatively, merge the two as an atomic pair via a merge queue.
Test plan: verified locally on the combined stack
pip install spdx-tools)
pre-commit run --all-files passes every hook on 5,700+ files
reuse lint passes for non-ThirdParty files (0 compliance gaps)
spdx-tools validate_full_spdx_document() passes on the generated SBOM
cmake --install places the SBOM at share/spdx/sbom.spdx.json
ITK_GENERATE_SBOM=OFF configure works without SBOM generation
AddSPDXHeaders.py (6 cases)
string(UUID) determinism/uniqueness unit-tested
Review checklist for maintainers
Suggested review order:
CMake/ITKSBOMGeneration.cmake end-to-end: main generator
CMake/ITKSBOMValidation.cmake: CTest harness
Utilities/Maintenance/README.md for script overview
build/sbom.spdx.json)
pip install spdx-tools; ctest -R ITKSBOMSchemaValidation
Modules/ThirdParty/<mod>/itk-module.cmake to see the declaration pattern
Known lower-priority follow-ups (not blockers):
licenseConcluded hardcoded as GPL-2.0-or-later (ignores MKL/commercial FFTW variants)