Eliminate looped time.After timer leaks, propagate cancellation correctly, and enforce timeafterleak in CI#39188
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
time.After timer leaks and enforce timeafterleak in CI
|
Hey
If you would like a hand with those, you can assign this prompt to your coding agent:
|
|
✅ Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. The changes include |
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate loop-scoped time.After usage inside select statements (to avoid per-iteration timer leaks) and to enforce the timeafterleak custom analyzer in CI to prevent regressions. In addition, it updates the embedded GitHub Actions workflow JSON schema by wrapping several $ref-based definitions in allOf (which can change validation semantics by ensuring sibling keywords are applied).
Changes:
- Replaced looped
time.After(...)calls withtime.NewTimer(...)andtimer.Stop()onctx.Done()paths in three CLI loops. - Enabled the
-timeafterleakcustom analyzer in.github/workflows/cgo.yml. - Updated
pkg/workflow/schemas/github-workflow.jsonto wrap certain$refusages inallOf(not described in the PR metadata, but included in the diff).
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/schemas/github-workflow.json | Wrapes various $ref sites in allOf (notably around event objects/types), potentially changing schema validation behavior. |
| pkg/cli/mcp_inspect_mcp_scripts_server.go | Replaces looped time.After with an explicit timer in server readiness polling. |
| pkg/cli/docker_images.go | Replaces looped time.After with an explicit timer in retry backoff handling. |
| pkg/cli/add_interactive_workflow.go | Replaces looped time.After with an explicit timer in status polling. |
| .github/workflows/cgo.yml | Adds -timeafterleak to the custom linter flags in CI. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 2
| "branch_protection_rule": { | ||
| "$comment": "https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows#branch_protection_rule", | ||
| "$ref": "#/definitions/eventObject", | ||
| "allOf": [ | ||
| { | ||
| "$ref": "#/definitions/eventObject" | ||
| } |
| "types": { | ||
| "$ref": "#/definitions/types", | ||
| "allOf": [ | ||
| { | ||
| "$ref": "#/definitions/types" | ||
| } |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
✅ Design Decision Gate — ADR VerifiedADR reviewed: ADR-39133: Ship a Custom This PR is the direct follow-up to ADR-39133 (which introduced the 📋 Verification Summary
No divergences found; the ADR and implementation are in sync.
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose, /tdd, and /improve-codebase-architecture — requesting changes on one correctness gap and flagging several improvement opportunities.
📋 Key Themes & Highlights
Key Themes
- Silent error drop (
mcp_inspect_mcp_scripts_server.go):waitForServerReadyreturnsbool, soctx.Err()is silently dropped — callers cannot distinguish cancellation from timeout. This is a latent correctness issue exposed by the refactor. - Missing regression tests: All three fix sites lack context-cancellation tests. The linter enforces the pattern at the syntax level, but a misbehaving refactor (e.g. removing
timer.Stop()) would pass CI since test files are excluded from linting (-test=false). - Duplicated idiom: All three files implement the same
NewTimer/select/Stopblock inline. AcontextSleep(ctx, d)helper would consolidate the pattern and make future improvements (e.g. drain after Stop) a one-line change. - Unrelated schema change:
pkg/workflow/schemas/github-workflow.json(+245/-49) has no connection to timer leaks. Bundling it makes bisection harder and the$ref → allOfwrapping across dozens of event definitions warrants independent review.
Positive Highlights
- ✅ Correct fix pattern at all three call sites —
time.NewTimerwithtimer.Stop()on thectx.Done()path - ✅ CI enforcement via
-timeafterleakis the right systemic control; once merged, the linter will block future regressions in production code - ✅ Linter has thorough testdata coverage including edge cases (FuncLit boundary, single-case select, nolint directives)
- ✅ Exponential-backoff timer duration in
docker_images.gois naturally handled by the new-per-iteration approach
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 404.7 AIC · ⌖ 26 AIC · ⊞ 27.9K
| timer := time.NewTimer(mcpScriptsServerStartupDelay) | ||
| select { | ||
| case <-ctx.Done(): | ||
| timer.Stop() |
There was a problem hiding this comment.
[/diagnose] return false silently drops the cancellation reason — callers cannot distinguish "context was cancelled" from "server never became ready".
💡 Suggested fix: return error instead of bool
Changing the function signature to func waitForServerReady(...) error lets the cancellation propagate:
case <-ctx.Done():
timer.Stop()
return ctx.Err() // propagates cancellation
case <-timer.C:
}
// ...
mcpInspectLog.Printf("Server did not become ready within timeout")
return fmt.Errorf("server did not become ready within timeout")With bool, a cancelled context and a genuine timeout produce the same false result. Callers log "server not ready" even when the real cause is an upstream shutdown, making incidents harder to diagnose.
| var workflowFound bool | ||
| for i := range 5 { | ||
| // Wait 2 seconds before each check (including the first) | ||
| timer := time.NewTimer(2 * time.Second) |
There was a problem hiding this comment.
[/tdd] No regression test covers context cancellation in this polling loop — the most common real-world code path for the timer fix.
💡 Suggested test skeleton
func TestCheckStatusAndOfferRun_ContextCancelled(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
cancel() // pre-cancel
cfg := &AddInteractiveConfig{Verbose: true}
err := cfg.checkStatusAndOfferRun(ctx)
if !errors.Is(err, context.Canceled) {
t.Errorf("expected context.Canceled, got %v", err)
}
}The current timer fix is correct, but without a test a future refactor could reintroduce time.After (or forget timer.Stop()) and silently pass CI since the linter only covers production files (-test=false).
| dockerImagesLog.Printf("Failed to download image %s (attempt %d/%d). Retrying in %ds...", image, attempt, maxAttempts, waitTime) | ||
|
|
||
| // Use context-aware sleep | ||
| timer := time.NewTimer(time.Duration(waitTime) * time.Second) |
There was a problem hiding this comment.
[/improve-codebase-architecture] This is the third file in this PR that implements the same NewTimer → select → Stop-on-cancel idiom. A shared helper would prevent future drift and consolidate the canonical pattern.
💡 Suggested helper
// contextSleep pauses for d or until ctx is cancelled.
// Returns ctx.Err() if the context fires first, nil otherwise.
func contextSleep(ctx context.Context, d time.Duration) error {
timer := time.NewTimer(d)
select {
case <-ctx.Done():
timer.Stop()
return ctx.Err()
case <-timer.C:
return nil
}
}All three call sites (add_interactive_workflow.go, docker_images.go, mcp_inspect_mcp_scripts_server.go) would reduce to a single if err := contextSleep(ctx, d); err != nil { return err }. Any future fix or improvement (e.g., adding drain after Stop()) would only need to be made once.
| # legacy custom analyzer findings in tests or other analyzer families. | ||
| - name: Run custom linters | ||
| run: make golint-custom LINTER_FLAGS="-errstringmatch -panicinlibrarycode -manualmutexunlock -osexitinlibrary -rawloginlib -regexpcompileinfunction -fprintlnsprintf -strconvparseignorederror -jsonmarshalignoredeerror -uncheckedtypeassertion -fmterrorfnoverbs -tolowerequalfold -httpnoctx -test=false" | ||
| run: make golint-custom LINTER_FLAGS="-errstringmatch -panicinlibrarycode -manualmutexunlock -osexitinlibrary -rawloginlib -regexpcompileinfunction -fprintlnsprintf -strconvparseignorederror -jsonmarshalignoredeerror -uncheckedtypeassertion -fmterrorfnoverbs -tolowerequalfold -httpnoctx -timeafterleak -test=false" |
There was a problem hiding this comment.
[/diagnose] -test=false means the timeafterleak linter skips all test files. Test helpers, integration test fixtures, and fake clocks that use time.After in a loop will not be caught by this gate. Consider adding a separate lint step (or a comment in the CI YAML) noting this intentional scope limit so future contributors know test files are unprotected.
|
@copilot run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Implemented PR-finisher pass in commit |
time.After timer leaks and enforce timeafterleak in CItime.After timer leaks, propagate cancellation correctly, and enforce timeafterleak in CI
timeafterleakidentified three production retry/poll loops usingtime.Afterinselectwithin loop bodies, which allocates one timer per iteration until it fires. This PR converts those sites to explicit timers, tightens cancellation behavior in MCP server readiness polling, and adds CI enforcement so the pattern does not regress.Timer lifecycle fix in looped select paths
time.After(...)receives withtime.NewTimer(...).ctx.Done()paths, explicitly calltimer.Stop()before returning.pkg/cli/docker_images.go(retry backoff loop)pkg/cli/add_interactive_workflow.go(status polling loop)pkg/cli/mcp_inspect_mcp_scripts_server.go(server readiness polling loop)Cancellation/error propagation improvement
falseresult.ctx.Err()from readiness polling.CI enforcement
-timeafterleakto custom linter flags in.github/workflows/cgo.ymlso loopedtime.Afterusage is blocked going forward.-test=falseintentionally scopes this check to production code.Regression tests
checkStatusAndOfferRuninpkg/cli/add_interactive_workflow_test.go.pkg/cli/mcp_inspect_mcp_scripts_server_test.go.Example pattern now used at these call sites: