release: merge audit fixes for v1.3.22#75

Merged: ysyneu merged 10 commits into main from codex/release-2026-07-02-main on Jul 2, 2026

@ysyneu ysyneu commented Jul 2, 2026

## Summary
- Merge audit fixes from PRs #66, #67, #68, #73, #74 onto main without bringing unrelated feat/ai-sre commits.
- Fix incident list compact structured output and MCP create table/OAuth notice.
- Refresh Flashduty skill guidance for person_id resolution, rule enumeration limits, incident summary output, and incident export discovery.

## Verification
- make
- go test ./internal/cli -run 'TestIncidentShortID|TestIncident|TestRenderGenericTable_McpServer|TestPrintGenericResult' -v
- built binary black-box HTTP fixture for incident list JSON compact projection and safari mcp-server-create OAuth notice

ysyneu added 10 commits July 1, 2026 22:19
…ination

Schedule/oncall output returns `person_id`s — a different id namespace from
`member_id`. The schedule.md card and the `oncall who` help told agents to
resolve names by joining `fduty member list` on `member_id`: wrong namespace,
and it forces a full-roster scan that silently drops people on later pages.
In prod this produced a confidently-incomplete on-call answer (a responder on
page 21 of 22 was missed).

The batch resolver already exists: `fduty person infos <person_id> ...`
(POST /person/infos) returns person_id + person_name in one call. Point the
cards and the oncall help at it, and warn that person_id != member_id.

- schedule.md: replace the member_id-join flow with `person infos`
- member.md: cross-reference `person infos`; add a person_id gotcha
- oncall.go: correct the `oncall who` Long help (table already enriches names;
  use `person infos` for raw ids elsewhere)
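
The failure mode above (ids silently dropped when the lookup source is incomplete) can be sketched as follows. This is a minimal illustration, not the CLI's code: `PersonInfo` and `resolveNames` are hypothetical names, and the real `POST /person/infos` response shape may differ. The point is only that ids without a match are returned explicitly rather than discarded.

```go
package main

import (
	"fmt"
	"sort"
)

// PersonInfo is a hypothetical shape for one entry of a batch person lookup.
// The real response fields may differ.
type PersonInfo struct {
	PersonID   int64
	PersonName string
}

// resolveNames maps on-call person_ids to names using a batch lookup result.
// Ids with no match are returned separately so the caller can report them
// instead of silently dropping them.
func resolveNames(personIDs []int64, infos []PersonInfo) (map[int64]string, []int64) {
	byID := make(map[int64]string, len(infos))
	for _, p := range infos {
		byID[p.PersonID] = p.PersonName
	}
	names := make(map[int64]string, len(personIDs))
	var unresolved []int64
	for _, id := range personIDs {
		if name, ok := byID[id]; ok {
			names[id] = name
		} else {
			unresolved = append(unresolved, id)
		}
	}
	sort.Slice(unresolved, func(i, j int) bool { return unresolved[i] < unresolved[j] })
	return names, unresolved
}

func main() {
	onCall := []int64{101, 202, 303}
	// Assumed lookup result that is missing id 202.
	infos := []PersonInfo{{101, "alice"}, {303, "carol"}}
	names, unresolved := resolveNames(onCall, infos)
	fmt.Println("resolved:", len(names), "unresolved:", unresolved)
}
```

A caller that sees a non-empty `unresolved` slice should say so in its answer rather than present the resolved subset as the full on-call roster.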

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26):
sess_8fKpUejeyKfHt2hx5XNSsc, sess_UtK6eDMFY4TVZCSAj64gKF.
…id 0`, never via fired alerts

`rule-counter-status` 400s "too many rules" on large accounts, and a guessed
`--folder-id N` 400s "Folder not found". With both paths blocked, a prod agent
fell back to FIRED alerts (`insight top-alerts 90d`) as a proxy for CONFIGURED
rules and produced a confidently-wrong coverage report — marking P0 checks as
"missing" when it had only verified them as not-fired-in-90d.

The enumerate-all path already exists: `rule-list-basic --folder-id 0` returns
every configured rule with no folder id needed. Make it the documented path,
add the two-error fallback, and add a hard CONFIGURED != FIRED warning.
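
The CONFIGURED != FIRED warning amounts to a three-valued answer rather than a boolean. A minimal sketch, with hypothetical names (`Coverage`, `classify`) that do not come from the skill card: a check counts as "missing" only when the configured rules were fully enumerated, and fired alerts are deliberately not an input.

```go
package main

import "fmt"

// Coverage is a hypothetical three-valued result for one expected check.
type Coverage string

const (
	Configured Coverage = "configured"
	Missing    Coverage = "missing"
	Unknown    Coverage = "unknown"
)

// classify reports whether a check is configured. configured is the set of
// rule names that were actually enumerated; complete says whether that
// enumeration covered the whole account. Fired alerts are not consulted:
// "did not fire in 90d" says nothing about whether a rule exists.
func classify(check string, configured map[string]bool, complete bool) Coverage {
	if configured[check] {
		return Configured
	}
	if complete {
		return Missing
	}
	return Unknown
}

func main() {
	configured := map[string]bool{"disk-full": true}
	fmt.Println(classify("disk-full", configured, false))
	fmt.Println(classify("cpu-high", configured, false))
	fmt.Println(classify("cpu-high", configured, true))
}
```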

monit.md only (skill card; edits are outside the GENERATED fence).

Note: `rule-counter-status` itself cannot be scoped/paginated from the CLI —
the SDK `ReadCounterStatus(ctx)` and `POST /monit/rule/counter/status` take no
params; making it not 400 on large accounts is backend work, out of scope here.

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
… list

The first commit on this branch claimed `rule-list-basic --folder-id 0` lists
all rules. That is FALSE — verified against the live API (400 "Folder not
found") and confirmed in monit-webapi: ListRuleBasic -> SafeFolder ->
GetByID(0) -> nil -> 400. `rule-list-basic` also returns only a folder's
DIRECT rules, not its descendants, and there is no account-wide rule list;
`rule-counter-status`/`rule-status` abort with "too many rules" past a server
cap (default 100).

Corrected guidance: enumerate by walking the folder tree (rule-counter-status
-> rule-status -> rule-list-basic per node); past the cap, report the limit
honestly ("cannot fully enumerate configured rules on this account") instead of
fabricating a completeness %. Kept the CONFIGURED != FIRED guardrail. The
generated `--folder-id` help ("0 to list all accessible rules") is a known
SDK/OpenAPI bug, flagged for a backend round.
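
The corrected guidance (walk the tree, collect each folder's direct rules, and admit incompleteness when a node cannot be listed) can be sketched as below. Everything here is illustrative: `Folder`, `enumerate`, and the `list` callback are hypothetical stand-ins for the CLI calls, and the cap of 100 is taken from the "default 100" noted above and modelled per folder only for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

// Folder is a hypothetical in-memory stand-in for one node of the rule folder tree.
type Folder struct {
	ID       int64
	Rules    []string // direct rules only, not descendants
	Children []*Folder
}

// Result carries the rules that were listed and whether the walk was complete.
type Result struct {
	Rules    []string
	Complete bool
	Skipped  []int64 // folder ids whose rules could not be listed
}

var errTooManyRules = errors.New("too many rules")

// enumerate walks the tree from a non-nil root and lists each folder's direct
// rules via list. Any failure marks the result incomplete instead of being
// ignored, so the caller cannot mistake a partial list for full coverage.
func enumerate(root *Folder, list func(*Folder) ([]string, error)) Result {
	res := Result{Complete: true}
	stack := []*Folder{root}
	for len(stack) > 0 {
		f := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		rules, err := list(f)
		if err != nil {
			res.Complete = false
			res.Skipped = append(res.Skipped, f.ID)
		} else {
			res.Rules = append(res.Rules, rules...)
		}
		stack = append(stack, f.Children...)
	}
	return res
}

func main() {
	const ruleCap = 100 // assumed cap, for illustration only
	big := &Folder{ID: 3, Rules: make([]string, ruleCap+1)}
	root := &Folder{ID: 1, Rules: []string{"a"}, Children: []*Folder{{ID: 2, Rules: []string{"b"}}, big}}
	list := func(f *Folder) ([]string, error) {
		if len(f.Rules) > ruleCap {
			return nil, errTooManyRules
		}
		return f.Rules, nil
	}
	res := enumerate(root, list)
	if !res.Complete {
		fmt.Println("cannot fully enumerate configured rules on this account; skipped folders:", res.Skipped)
	}
	fmt.Println("rules listed:", len(res.Rules))
}
```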

Surfaced by /audit-ai-sre-sessions (run audit-2026-06-26): sess_DiRDzjygi4NuYyAcsgzB6o.
…it-agent probes

P6 — incident-summary.sh dumped ~80K chars of toon per incident (81042/79075
in the audited sessions), most of it empty boilerplate, overflowing the inline
output cap and forcing 3-4 sequential reads to page it back in. Root cause: the
script appended `--output-format toon` to every command, which takes each verb's
machine-readable branch and marshals the full raw response object (every empty
field on `incident detail`, plus heavy blobs like a change's `labels.steps`).
The DEFAULT (table/summary) renderer of each of these read verbs is a curated
projection of exactly the summary-relevant fields (id/severity/status/title/
channel/timestamps/ai_summary/root_cause/…). Fix: drop `--output-format toon`
from run() so each command uses its lean default renderer; that default IS the
field projection a fault summary needs. Kept all six commands, set -uo pipefail,
and the read-only "print real output" intent.

P8 — monit-agent.md: parallelizing multiple probes against the SAME host hit the
per-target concurrency limit and returned `code=overloaded`, forcing an identical
retry that re-sent the growing context. Added a Gotchas line steering fan-out to
serialize probes per target (batch a host's tools into one `invoke`) and
parallelize only across distinct targets.
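
The fan-out shape described for P8 can be sketched as: group probes by target, run each group sequentially, and run distinct targets concurrently. `Probe`, `runFanOut`, and the `run` callback are hypothetical names used only for illustration. The card itself recommends batching a host's tools into one `invoke`, which this sketch approximates by serializing within a target.

```go
package main

import (
	"fmt"
	"sync"
)

// Probe is a hypothetical description of one tool call against one target host.
type Probe struct {
	Target string
	Tool   string
}

// runFanOut groups probes by target, runs each target's probes one at a time,
// and runs distinct targets concurrently, so no target ever sees more than
// one in-flight probe from this caller.
func runFanOut(probes []Probe, run func(Probe) string) map[string][]string {
	byTarget := make(map[string][]Probe)
	for _, p := range probes {
		byTarget[p.Target] = append(byTarget[p.Target], p)
	}
	results := make(map[string][]string, len(byTarget))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for target, group := range byTarget {
		wg.Add(1)
		go func(target string, group []Probe) {
			defer wg.Done()
			out := make([]string, 0, len(group))
			for _, p := range group { // serial within one target
				out = append(out, run(p))
			}
			mu.Lock()
			results[target] = out
			mu.Unlock()
		}(target, group)
	}
	wg.Wait()
	return results
}

func main() {
	probes := []Probe{{"host-a", "cpu"}, {"host-b", "cpu"}, {"host-a", "disk"}}
	res := runFanOut(probes, func(p Probe) string { return p.Target + ":" + p.Tool })
	fmt.Println(res["host-a"], res["host-b"])
}
```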

Evidence: audit run audit-2026-06-26, sessions sess_TRdrnq5oA345qF66GgbYrY (P6)
and sess_QmoruPkRjrwA2tcHvEbNvT (P8). Surfaced by /audit-ai-sre-sessions.

These two files are kept byte-identical with their embedded copies in fc-safari
(logic/runtime/bootstrap/skills/flashduty/); mirrored there in a linked PR.