diff --git a/docs/adr/26482-peel-annotated-tags-to-commit-sha.md b/docs/adr/26482-peel-annotated-tags-to-commit-sha.md new file mode 100644 index 00000000000..f7cafa803f0 --- /dev/null +++ b/docs/adr/26482-peel-annotated-tags-to-commit-sha.md @@ -0,0 +1,80 @@ +# ADR-26482: Peel Annotated Git Tags to Commit SHA When Resolving Action Pins + +**Date**: 2026-04-15 +**Status**: Draft +**Deciders**: pelikhan, Copilot + +--- + +## Part 1 — Narrative (Human-Friendly) + +### Context + +The `gh aw compile` command pins GitHub Actions references by SHA to produce reproducible, Renovate-compatible workflow files. The GitHub REST API endpoint `GET /repos/{owner}/{repo}/git/ref/tags/{tag}` returns two semantically different SHAs depending on whether the tag is *lightweight* (SHA points directly to a commit) or *annotated* (SHA points to a tag object, which in turn points to a commit). Before this decision, the compiler always emitted the SHA returned by this endpoint, which for annotated tags is the tag-object SHA rather than the underlying commit SHA. Renovate, and tooling that compares lock files against real commits, expects the commit SHA — so annotated tags caused Renovate to repeatedly rewrite the same `uses:` line, churning the lock file without actually changing anything meaningful. + +### Decision + +We will resolve annotated tags by making a second API call to `GET /repos/{owner}/{repo}/git/tags/{tag-object-sha}` to retrieve the underlying commit SHA (peeling). For lightweight tags the first API call already returns the commit SHA and no second call is needed. We distinguish the two cases by fetching both the object SHA and its type (`[.object.sha, .object.type] | @tsv`) in a single `gh api` invocation, parsing via a new exported helper `ParseTagRefTSV`. The same two-call strategy is applied consistently in both `action_resolver.go` (used by the compiler) and `getActionSHAForTag` in `update_actions.go` (used by `gh aw update-actions`), with code reuse through the shared `ParseTagRefTSV` helper. + +### Alternatives Considered + +#### Alternative 1: Use the `^{}` Dereferencing Suffix via `git ls-remote` + +`git ls-remote` emits a peeled entry suffixed with `^{}` for annotated tags (e.g., `refs/tags/v1.0.0^{}`), which directly resolves to the commit SHA without a second API call. This was not chosen because the primary SHA resolution path already uses the GitHub REST API via `gh api`, not `git ls-remote`. Mixing two different resolution mechanisms would increase code complexity, introduce inconsistent behavior between the primary and fallback paths, and would require shelling out to `git` where the codebase currently relies on `gh`. + +#### Alternative 2: Use the GitHub GraphQL `object { oid }` Query + +The GitHub GraphQL API can resolve a tag ref to its target commit OID in a single query, handling peeling transparently. This approach was not chosen because the rest of the action-resolution code uses the REST API and replacing it with GraphQL would require a larger refactor. GraphQL access also adds a dependency on a different authentication scope and endpoint, whereas the existing `gh api` calls already have the necessary permissions. + +#### Alternative 3: Accept the Tag Object SHA and Normalise at Diff Time + +An alternative is to accept the tag-object SHA as the pin value and teach the diff/update logic to treat tag-object SHAs and commit SHAs as equivalent. This was not chosen because it would require invasive changes to multiple downstream consumers (Renovate config, lock-file diff logic) and doesn't fix the root cause — Renovate correctly expects a commit SHA. + +### Consequences + +#### Positive +- Renovate no longer rewrites the same `uses:` line for annotated-tag-based actions; lock files are stable. +- The new `ParseTagRefTSV` helper is unit-testable in isolation, improving test coverage of the critical parsing step. +- The fix is applied symmetrically in both the compiler path and the update-actions path, so both tools emit the same stable commit SHA. + +#### Negative +- Annotated tags now require two sequential GitHub API calls instead of one, doubling network latency for those tags. In practice most GitHub Actions repositories use annotated tags, so this will be the common case. +- The two-call strategy is sensitive to rate-limiting: a rate-limit hit on the second (peeling) call results in a hard failure, with no partial-result fallback. + +#### Neutral +- The `ParseTagRefTSV` function is exported from `pkg/workflow`, creating a new public symbol that callers outside the package can depend on. Changing its signature in future is a breaking change for any consumers. +- Integration tests require network access to validate the full two-call flow; unit tests cover only the TSV parsing. + +--- + +## Part 2 — Normative Specification (RFC 2119) + +> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). + +### Tag Resolution + +1. Implementations **MUST** emit the underlying commit SHA when pinning a GitHub Actions reference, regardless of whether the tag is lightweight or annotated. +2. Implementations **MUST NOT** emit a tag-object SHA as the pin value for an action reference. +3. When querying `GET /repos/{owner}/{repo}/git/ref/tags/{tag}`, implementations **MUST** fetch both the object SHA and the object type in a single request. +4. Implementations **MUST** treat a response with `object.type == "tag"` as an annotated tag and **MUST** make a second API call to `GET /repos/{owner}/{repo}/git/tags/{tag-object-sha}` to retrieve the commit SHA. +5. Implementations **MUST** treat a response with `object.type == "commit"` as a lightweight tag and **MUST NOT** make a second API call for tag peeling. +6. Implementations **MUST** validate that any resolved SHA is exactly 40 hexadecimal characters; if not, resolution **MUST** fail with a descriptive error. + +### Shared Parsing Helper + +1. The tab-separated parsing of GitHub API tag-ref responses **MUST** be performed via the `workflow.ParseTagRefTSV` function (or a functional equivalent in the same package). +2. Implementations **MUST NOT** duplicate the tab-separated parsing logic inline in multiple call sites; use the shared helper. +3. The `ParseTagRefTSV` function **MUST** return an error for any of the following malformed inputs: empty string, missing tab separator, empty SHA field, empty type field, or SHA that is not exactly 40 characters. + +### Consistency Across Resolution Paths + +1. Both the compiler path (`ActionResolver.resolveFromGitHub`) and the update path (`getActionSHAForTag`) **MUST** apply the same two-call annotated-tag peeling strategy. +2. Implementations **SHOULD** share the peeling logic via the `ParseTagRefTSV` helper to avoid divergence between the two paths. + +### Conformance + +An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Specifically: it always emits a commit SHA (never a tag-object SHA), it validates SHA length, it does not duplicate TSV parsing logic, and it applies the two-call strategy consistently across both resolution paths. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance. + +--- + +*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24474969210) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* diff --git a/pkg/cli/update_actions.go b/pkg/cli/update_actions.go index dc76621352e..e899b4427d3 100644 --- a/pkg/cli/update_actions.go +++ b/pkg/cli/update_actions.go @@ -416,25 +416,45 @@ func getLatestActionReleaseViaGit(ctx context.Context, repo, currentVersion stri return latestCompatible, sha, nil } -// getActionSHAForTag gets the commit SHA for a given tag in an action repository +// getActionSHAForTag gets the commit SHA for a given tag in an action repository. +// For annotated tags (and chained tag objects), it iteratively peels until it +// reaches the underlying non-tag object SHA, matching what tools like Renovate expect. func getActionSHAForTag(ctx context.Context, repo, tag string) (string, error) { updateLog.Printf("Getting SHA for %s@%s", repo, tag) - // Use gh CLI to get the git ref for the tag - output, err := workflow.RunGHContext(ctx, "Fetching tag info...", "api", fmt.Sprintf("/repos/%s/git/ref/tags/%s", repo, tag), "--jq", ".object.sha") + // Fetch both SHA and object type to detect annotated tags. + // Annotated tags have type "tag" and their SHA points to the tag object, + // not the underlying commit. We must peel to get the commit SHA. + output, err := workflow.RunGHContext(ctx, "Fetching tag info...", "api", fmt.Sprintf("/repos/%s/git/ref/tags/%s", repo, tag), "--jq", "[.object.sha, .object.type] | @tsv") if err != nil { return "", fmt.Errorf("failed to resolve tag: %w", err) } - sha := strings.TrimSpace(string(output)) - if sha == "" { - return "", errors.New("empty SHA returned for tag") + sha, objType, err := workflow.ParseTagRefTSV(string(output)) + if err != nil { + return "", fmt.Errorf("failed to parse API response for %s@%s: %w", repo, tag, err) } - // Validate SHA format (should be 40 hex characters) - if len(sha) != 40 { - return "", fmt.Errorf("invalid SHA format: %s", sha) + // Annotated tags (and chained tag objects) point to a tag object rather than + // directly to a commit. Iteratively peel until we reach a non-tag object so + // that emitted action pins use the stable underlying commit SHA rather than a + // mutable tag object SHA (which changes when the tag is re-created). + const maxTagPeelDepth = 10 + for depth := 0; objType == "tag"; depth++ { + if depth >= maxTagPeelDepth { + return "", fmt.Errorf("failed to peel annotated tag: exceeded max depth %d for %s@%s", maxTagPeelDepth, repo, tag) + } + updateLog.Printf("Detected annotated tag for %s@%s (depth %d, tag object SHA: %s), peeling to underlying object", repo, tag, depth, sha) + output2, err := workflow.RunGHContext(ctx, "Peeling annotated tag...", "api", fmt.Sprintf("/repos/%s/git/tags/%s", repo, sha), "--jq", "[.object.sha, .object.type] | @tsv") + if err != nil { + return "", fmt.Errorf("failed to peel annotated tag: %w", err) + } + sha, objType, err = workflow.ParseTagRefTSV(string(output2)) + if err != nil { + return "", fmt.Errorf("failed to parse peeled tag API response for %s@%s: %w", repo, tag, err) + } } + updateLog.Printf("Resolved %s@%s to %s SHA: %s", repo, tag, objType, sha) return sha, nil } diff --git a/pkg/workflow/action_resolver.go b/pkg/workflow/action_resolver.go index 37b6f3114d5..c92816a592f 100644 --- a/pkg/workflow/action_resolver.go +++ b/pkg/workflow/action_resolver.go @@ -73,6 +73,25 @@ func (r *ActionResolver) ResolveSHA(repo, version string) (string, error) { return sha, nil } +// ParseTagRefTSV parses the tab-separated output from the GitHub API +// `[.object.sha, .object.type] | @tsv` jq expression. +// It returns the object SHA and type, or an error if the output is malformed. +// This is a standalone helper so that the parsing logic can be unit-tested +// independently of network calls. +func ParseTagRefTSV(line string) (sha, objType string, err error) { + line = strings.TrimSpace(line) + parts := strings.SplitN(line, "\t", 2) + if len(parts) != 2 || parts[0] == "" || parts[1] == "" { + return "", "", fmt.Errorf("unexpected format: %q", line) + } + sha = parts[0] + objType = parts[1] + if len(sha) != 40 || !gitutil.IsHexString(sha) { + return "", "", fmt.Errorf("invalid SHA format: expected 40 hex characters, got %d (%s)", len(sha), sha) + } + return sha, objType, nil +} + // resolveFromGitHub uses gh CLI to resolve the SHA for an action@version func (r *ActionResolver) resolveFromGitHub(repo, version string) (string, error) { // Extract base repository (for actions like "github/codeql-action/upload-sarif") @@ -84,25 +103,47 @@ func (r *ActionResolver) resolveFromGitHub(repo, version string) (string, error) apiPath := fmt.Sprintf("/repos/%s/git/ref/tags/%s", baseRepo, version) resolverLog.Printf("Querying GitHub API: %s", apiPath) - ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second) + ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) defer cancel() - cmd := ExecGHContext(ctx, "api", apiPath, "--jq", ".object.sha") + // Fetch both SHA and object type to detect annotated tags. + // Annotated tags have type "tag" and their SHA points to the tag object, + // not the underlying commit. We must peel to get the commit SHA. + cmd := ExecGHContext(ctx, "api", apiPath, "--jq", "[.object.sha, .object.type] | @tsv") output, err := cmd.Output() if err != nil { - // Try without "refs/tags/" prefix in case version is already a ref return "", fmt.Errorf("failed to resolve %s@%s: %w", repo, version, err) } - sha := strings.TrimSpace(string(output)) - if sha == "" { - return "", fmt.Errorf("empty SHA returned for %s@%s", repo, version) + sha, objType, err := ParseTagRefTSV(string(output)) + if err != nil { + return "", fmt.Errorf("failed to parse API response for %s@%s: %w", repo, version, err) } - // Validate SHA format (should be 40 hex characters) - if len(sha) != 40 { - return "", fmt.Errorf("invalid SHA format for %s@%s: %s", repo, version, sha) + // Annotated tags (and chained tag objects) point to a tag object rather than + // directly to a commit. Iteratively peel until we reach a non-tag object so + // that emitted action pins use the stable underlying commit SHA rather than a + // mutable tag object SHA (which changes when the tag is re-created). + const maxTagPeelDepth = 10 + for depth := 0; objType == "tag"; depth++ { + if depth >= maxTagPeelDepth { + return "", fmt.Errorf("failed to resolve %s@%s: exceeded max tag peel depth %d", repo, version, maxTagPeelDepth) + } + resolverLog.Printf("Detected annotated tag for %s@%s (depth %d, tag object SHA: %s), peeling to underlying object", repo, version, depth, sha) + tagPath := fmt.Sprintf("/repos/%s/git/tags/%s", baseRepo, sha) + peelCtx, peelCancel := context.WithTimeout(context.Background(), 30*time.Second) + cmd2 := ExecGHContext(peelCtx, "api", tagPath, "--jq", "[.object.sha, .object.type] | @tsv") + output2, peelErr := cmd2.Output() + peelCancel() + if peelErr != nil { + return "", fmt.Errorf("failed to peel annotated tag %s@%s: %w", repo, version, peelErr) + } + sha, objType, err = ParseTagRefTSV(string(output2)) + if err != nil { + return "", fmt.Errorf("failed to parse peeled tag API response for %s@%s: %w", repo, version, err) + } } + resolverLog.Printf("Resolved %s@%s to %s SHA: %s", repo, version, objType, sha) return sha, nil } diff --git a/pkg/workflow/action_resolver_test.go b/pkg/workflow/action_resolver_test.go index 1b72564ebdb..6ccf90df5f4 100644 --- a/pkg/workflow/action_resolver_test.go +++ b/pkg/workflow/action_resolver_test.go @@ -105,3 +105,99 @@ func TestActionResolverFailedResolutionCache(t *testing.T) { // Note: Testing the actual GitHub API resolution requires network access // and is tested in integration tests or with network-dependent test tags + +// TestParseTagRefTSV verifies that ParseTagRefTSV correctly parses the tab-separated +// output produced by the GitHub API jq expression `[.object.sha, .object.type] | @tsv`. +// This is the core parsing step used when resolving action tags to SHAs; it must +// distinguish lightweight tags (type "commit") from annotated tags (type "tag") so +// that annotated tags can be peeled to their underlying commit SHA. +func TestParseTagRefTSV(t *testing.T) { + const ( + commitSHA = "ea222e359276c0702a5f5203547ff9d88d0ddd76" + tagObjectSHA = "2fe53acc038ba01c3bbdc767d4b25df31ca5bdfc" + ) + + tests := []struct { + name string + input string + wantSHA string + wantType string + wantErr bool + errContains string + }{ + { + name: "lightweight tag returns commit type", + input: commitSHA + "\tcommit\n", + wantSHA: commitSHA, + wantType: "commit", + }, + { + name: "annotated tag returns tag type", + input: tagObjectSHA + "\ttag\n", + wantSHA: tagObjectSHA, + wantType: "tag", + }, + { + name: "input without trailing newline", + input: commitSHA + "\tcommit", + wantSHA: commitSHA, + wantType: "commit", + }, + { + name: "empty input is rejected", + input: "", + wantErr: true, + errContains: "unexpected format", + }, + { + name: "missing tab separator is rejected", + input: commitSHA, + wantErr: true, + errContains: "unexpected format", + }, + { + name: "empty type field is rejected", + input: commitSHA + "\t", + wantErr: true, + errContains: "unexpected format", + }, + { + name: "short SHA is rejected", + input: "abc123\tcommit", + wantErr: true, + errContains: "invalid SHA format", + }, + { + name: "non-hex SHA is rejected", + input: "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz\tcommit", + wantErr: true, + errContains: "invalid SHA format", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + sha, objType, err := ParseTagRefTSV(tt.input) + if tt.wantErr { + if err == nil { + t.Errorf("ParseTagRefTSV(%q): expected error, got nil", tt.input) + return + } + if !strings.Contains(err.Error(), tt.errContains) { + t.Errorf("ParseTagRefTSV(%q): error = %q, want it to contain %q", tt.input, err.Error(), tt.errContains) + } + return + } + if err != nil { + t.Errorf("ParseTagRefTSV(%q): unexpected error: %v", tt.input, err) + return + } + if sha != tt.wantSHA { + t.Errorf("ParseTagRefTSV(%q): sha = %q, want %q", tt.input, sha, tt.wantSHA) + } + if objType != tt.wantType { + t.Errorf("ParseTagRefTSV(%q): type = %q, want %q", tt.input, objType, tt.wantType) + } + }) + } +}