ENH: Automatic GitHub Actions cache cleanup for closed/merged PRs #6112
ITK.Linux failed at 12m on a PR that only adds two /azp run ITK.Linux

@greptileai review this draft before I make it official

/azp run ITK.Linux
Add two complementary workflows that free the repository's 10 GB Actions-cache budget without adding overhead to regular CI:

- cleanup-pr-caches.yml: event-driven, fires on PR close
- cleanup-stale-caches-nightly.yml: scheduled sweep, 3-day grace

The event-driven workflow deletes every cache scoped to the closed PR's merge ref within seconds of close, using the minimal permission set (actions: write, contents: read). The nightly sweep is the safety net: it catches caches orphaned during a cleanup-workflow outage, or those pre-dating this workflow.

Motivation: ITK regularly hits the 10 GB per-repo cache cap because ccache entries (2-3 GB per platform, 3 platforms) accumulate across open and closed PRs. Once the cap is reached, GitHub silently rejects all subsequent cache saves with "Cache reservation failed: you have reached your configured budget, your cache is now read only". This manifested on PR InsightSoftwareConsortium#6109's Pixi-Cxx + ARMBUILD runs where both the ccache and the new externaldata-* saves failed, even though the workflows are correctly wired.

Pattern follows the GitHub documentation for force-deleting cache entries: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries

Robustness properties of the on-close workflow:

- ref-scoped delete (refs/pull/N/merge) cannot touch refs/heads/main or other PR refs
- idempotent: re-running finds 0 caches and exits 0
- works for PRs from forks (runs in upstream context with fork's PR number and upstream's GITHUB_TOKEN)
- closed state is terminal: a reopened PR gets fresh cache entries tied to new commits; deletions target the previous-closure era
@dzenanz A little more house cleaning. This is somewhat orthogonal to the other cache cleanups, but helps to make sense of the caches that are retained. After a merge is done, the cache for that PR can never be reused, so we should not keep it around.

Description sounds good. I did not take a close look at the code.

@dzenanz This is recommended to keep costs down (savings of a few GB at $.008/GB/day, so not much). I like it mostly to remove "dead weight" caches that can never be hit again.
Adds two complementary workflows that free the repository's 10 GB GitHub Actions cache budget automatically, so cache saves in long-running PRs (ccache, ExternalData, etc.) don't silently fail with "Cache reservation failed: You have reached your configured budget, your cache is now read only".
Pattern follows the GitHub documentation for force-deleting cache entries: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries.
Motivation
ITK's per-repo Actions cache is capped at 10 GB (GitHub default). A single Pixi+ARM cycle produces roughly:

- `ccache-v4-Linux-pixi-cxx-<sha>`
- `ccache-v4-Windows-pixi-cxx-<sha>`
- `ccache-v4-macOS-pixi-cxx-<sha>`
- `ccache-v4-Linux-Ubuntu-24.04-arm-<sha>`
- `ccache-v4-macOS-*-<sha>`
- `externaldata-v<ver>-<sha>` (from #6109)

Multiplied by open PRs, the 10 GB cap is reached quickly. On PR #6109's `dffb9bd1fd` run, every Pixi and ARM cache save failed for this reason, despite the caching being wired correctly.

Manually sweeping closed-PR caches frees ~6 GB at a time, but that's not sustainable. These two workflows automate it.
What each workflow does
`.github/workflows/cleanup-pr-caches.yml` (event-driven)

- Trigger: `pull_request: types: [closed]`
- Action: list the caches scoped to `refs/pull/<N>/merge` and DELETE each
- Permissions: `actions: write, contents: read`

`.github/workflows/cleanup-stale-caches-nightly.yml` (safety net)

- Trigger: `cron: '0 6 * * *'` (06:00 UTC) + `workflow_dispatch`
- Action: enumerate every `refs/pull/<N>/merge` cache; for each, check PR state; if `state != OPEN` and `closedAt < now - 3 days`, delete
- Permissions: `actions: write, pull-requests: read`

Robustness properties

- Ref-scoped delete: `DELETE /repos/{owner}/{repo}/actions/caches/{cache-id}` targets one cache entry at a time; the enumeration is filtered by `ref=refs/pull/N/merge`, so `refs/heads/main` or other PR refs can never be touched by the on-close path.
- Works for PRs from forks: the `pull_request` event runs in the upstream repo context with the upstream's `GITHUB_TOKEN`, which has cache delete permissions on the upstream's cache store (where the caches actually live).
- Minimal permissions: on-close uses `actions: write, contents: read`. Nightly: adds `pull-requests: read`. Neither can push code, comment, or modify anything outside Actions storage.

Verification
Manually dry-run the on-close logic against a recently-closed PR to confirm it finds and removes caches:
```bash
PR_REF=refs/pull/6093/merge # merged 2026-04-23
# List the caches scoped to the PR's merge ref, with sizes in MiB.
gh api "repos/InsightSoftwareConsortium/ITK/actions/caches?ref=${PR_REF}" \
  --jq '.actions_caches[] | {id, key, size_mb: (.size_in_bytes / 1048576 | floor)}'
```
(In this PR's context, #6093's caches had already been swept manually as part of the investigation that motivated this PR.)
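The listing above only enumerates caches. A minimal sketch of the full on-close logic (list, then delete) could look like the following. The function name `cleanup_pr_caches` and its `--dry-run` argument are hypothetical and are not taken from the workflow in this PR; the sketch only assumes the two REST endpoints already named in this description and an authenticated `gh` CLI.

```bash
# Hypothetical helper (name and arguments are illustrative, not from the PR).
# Usage: cleanup_pr_caches <owner/repo> <pr-number> [--dry-run]
cleanup_pr_caches() {
  local repo="$1" pr="$2" mode="${3:-}"
  local ref="refs/pull/${pr}/merge"
  local ids id count=0

  # Enumerate only the caches scoped to this PR's merge ref.
  ids=$(gh api --paginate "repos/${repo}/actions/caches?ref=${ref}" \
          --jq '.actions_caches[].id') || return 1

  # Delete one cache entry at a time; an empty list is a no-op,
  # which keeps re-runs idempotent.
  for id in $ids; do
    if [ "$mode" = "--dry-run" ]; then
      echo "would delete cache ${id}"
    else
      gh api --method DELETE "repos/${repo}/actions/caches/${id}" >/dev/null || return 1
      echo "deleted cache ${id}"
    fi
    count=$((count + 1))
  done
  echo "${count} cache(s) processed for ${ref}"
}
```

Under these assumptions, `cleanup_pr_caches InsightSoftwareConsortium/ITK 6093 --dry-run` would print the ids it would remove without issuing any DELETE request.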
After merge, an end-to-end verification is to close a test PR and confirm its caches disappear within seconds.
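The nightly sweep's eligibility rule (`state != OPEN` and `closedAt < now - 3 days`) can also be exercised in isolation. The sketch below is an illustration, not the code in the workflow: the names `is_stale` and `GRACE_DAYS` are hypothetical, and it assumes GNU `date` (as found on Linux runners) for parsing ISO 8601 timestamps.

```bash
# Hypothetical helper (names are illustrative, not from the PR).
# Assumes GNU date; BSD/macOS date does not accept -d in this form.
GRACE_DAYS=3

# is_stale STATE CLOSED_AT [NOW_EPOCH]
# Exits 0 when the PR is not open and was closed more than GRACE_DAYS ago.
is_stale() {
  local state="$1" closed_at="$2" now="${3:-$(date -u +%s)}"
  local closed_epoch

  # Open PRs are never eligible, whatever their timestamps say.
  [ "$state" != "OPEN" ] || return 1
  # A missing close timestamp is treated as "not eligible".
  [ -n "$closed_at" ] || return 1

  closed_epoch=$(date -u -d "$closed_at" +%s) || return 1
  [ "$closed_epoch" -lt $(( now - GRACE_DAYS * 86400 )) ]
}
```

One possible way to feed it, assuming the PR state is read with the `gh` CLI, is `gh pr view <N> --json state,closedAt`, whose `state` field takes the values `OPEN`, `CLOSED`, or `MERGED`. The optional third argument pins "now" so the rule can be checked against fixed timestamps.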
Ecosystem impact (follow-up)
The same pattern could be added to the shared `InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction` workflows so all 20+ remote-module repos inherit cache cleanup in one change. Deferred to a separate PR per user request.