From d206bb33f8a2fc35164678f2899a95b0085e70a5 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 27 Apr 2026 10:12:50 -0600 Subject: [PATCH 1/5] feat: add bug-triage skill Adds a Claude skill that reads the project bug triage guide, classifies open requires-triage issues, and files a dated summary issue with recommended priority and area labels for a human to review and apply. --- .claude/skills/bug-triage/SKILL.md | 136 +++++++++++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 .claude/skills/bug-triage/SKILL.md diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md new file mode 100644 index 0000000000..20bc7f08a8 --- /dev/null +++ b/.claude/skills/bug-triage/SKILL.md @@ -0,0 +1,136 @@ +--- +name: bug-triage +description: Triage open Comet issues marked `requires-triage` per the project bug triage guide and file a GitHub issue summarizing recommended priority, area labels, and `good first issue` candidates. A human reviews the summary issue and applies the labels. +--- + +Run a bug triage pass for the `apache/datafusion-comet` repository. + +## Overview + +This skill produces a triage report as a new GitHub issue. It does NOT apply +labels, close issues, or comment on the triaged issues. A human reviewer reads +the summary issue, applies the recommended labels, and closes the summary issue +when satisfied. + +The triage criteria come from the project's own guide. Read it before doing any +classification work; do not rely on memory. + +## Step 1: Read the Triage Guide + +Read the canonical guide in this repository: + +``` +docs/source/contributor-guide/bug_triage.md +``` + +Use the priority decision tree, escalation triggers, area labels, and +prioritization principles from that guide. If the guide and this skill ever +disagree, the guide wins. Do not paraphrase the guide; quote the labels and +criteria verbatim when classifying. + +## Step 2: Gather Issues That Need Triage + +Fetch all open issues labeled `requires-triage`: + +```bash +gh issue list \ + --repo apache/datafusion-comet \ + --label requires-triage \ + --state open \ + --limit 200 \ + --json number,title,author,createdAt,labels,body,url +``` + +If the list is empty, stop and tell the user there is nothing to triage. Do not +file an empty summary issue. + +## Step 3: Classify Each Issue + +For each issue, review the title and body and determine: + +1. **Priority label** (exactly one): apply the decision tree from the guide. + - `priority:critical` if it could produce silent wrong results + - `priority:high` for crashes, panics, segfaults, NPEs on supported paths + - `priority:medium` for functional bugs / perf regressions with workarounds + - `priority:low` for test-only, CI flakes, tooling, cosmetic +2. **Area labels** (zero or more): from the area table in the guide + (`area:writer`, `area:shuffle`, `area:aggregation`, `area:scan`, + `area:expressions`, `area:ffi`, `area:ci`) plus the pre-existing area + indicators (`native_datafusion`, `native_iceberg_compat`, `spark 4`, + `spark sql tests`). +3. **`good first issue`**: only if the fix is likely straightforward AND + well-scoped. When in doubt, leave it off. +4. **Escalation note**: if the issue matches an escalation trigger from the + guide (e.g., a `priority:high` crash that may also produce wrong results), + call it out. +5. **Needs more info**: if the report has no reproduction steps, mark it as + needing a follow-up question to the reporter rather than guessing. + +If you cannot confidently classify an issue from its title and body, say so +explicitly in the summary instead of guessing. + +Do NOT edit, label, or comment on the triaged issues. Recommendations only. + +## Step 4: File the Summary Issue + +Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): + +```bash +TRIAGE_DATE=$(date -u +%Y-%m-%d) +``` + +Title: `Bug triage results: ${TRIAGE_DATE}` + +Body: a markdown table or per-issue section listing, for each triaged issue: + +- Issue number and title (linked) +- Recommended priority label +- Recommended area labels +- `good first issue`? (yes/no) +- Escalation / needs-more-info notes +- One-sentence rationale tying the recommendation to the guide + +Include a short header that: + +- States the date, the number of issues triaged, and the count per priority +- Reminds reviewers that nothing has been labeled yet, and asks them to apply + the recommended labels (and remove `requires-triage`) before closing this + summary issue +- Links to `docs/source/contributor-guide/bug_triage.md` + +File the issue with `gh`: + +```bash +gh issue create \ + --repo apache/datafusion-comet \ + --title "Bug triage results: ${TRIAGE_DATE}" \ + --body-file <(cat <<'EOF' +... summary markdown here ... +EOF +) +``` + +Do not add labels to the summary issue itself. The human reviewer decides +whether to label it (and will close it when done). + +After creating the issue, print its URL so the reviewer can find it. + +## Output to the User + +Report back: + +1. Number of `requires-triage` issues processed +2. Count per recommended priority +3. URL of the new summary issue + +Do not paste the full triage table back into the chat; it is in the issue. + +## What This Skill Must Not Do + +- Do not apply labels to the triaged issues +- Do not remove `requires-triage` from any issue +- Do not comment on the triaged issues +- Do not close any issue +- Do not file the summary issue if there were zero `requires-triage` issues +- Do not invent priority or area labels that are not in the guide +- Do not include AI/Claude attribution in the summary issue From 2b536edf529b0d0f4d71ef1065376596c7ec1496 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 27 Apr 2026 10:14:45 -0600 Subject: [PATCH 2/5] feat: have bug-triage skill apply labels directly Reverses the earlier read-only behavior. The skill now applies the recommended priority and area labels and removes requires-triage; the dated summary issue records what was done so a human reviewer can spot-check and close it. --- .claude/skills/bug-triage/SKILL.md | 123 +++++++++++++++++++---------- 1 file changed, 80 insertions(+), 43 deletions(-) diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md index 20bc7f08a8..eefe1934a4 100644 --- a/.claude/skills/bug-triage/SKILL.md +++ b/.claude/skills/bug-triage/SKILL.md @@ -1,16 +1,23 @@ --- name: bug-triage -description: Triage open Comet issues marked `requires-triage` per the project bug triage guide and file a GitHub issue summarizing recommended priority, area labels, and `good first issue` candidates. A human reviews the summary issue and applies the labels. +description: Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels (and `good first issue` where appropriate), removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied. --- Run a bug triage pass for the `apache/datafusion-comet` repository. ## Overview -This skill produces a triage report as a new GitHub issue. It does NOT apply -labels, close issues, or comment on the triaged issues. A human reviewer reads -the summary issue, applies the recommended labels, and closes the summary issue -when satisfied. +This skill triages every open issue carrying the `requires-triage` label. For +each one it: + +1. Decides a priority and area labels using the project's triage guide. +2. Applies those labels via `gh`. +3. Removes the `requires-triage` label. +4. Records the decision (with rationale) in a single dated summary issue. + +A human reviewer reads the summary issue, sanity-checks the calls, and closes +it when satisfied. Any label correction is done by the reviewer directly on the +affected issue. The triage criteria come from the project's own guide. Read it before doing any classification work; do not rely on memory. @@ -42,7 +49,7 @@ gh issue list \ ``` If the list is empty, stop and tell the user there is nothing to triage. Do not -file an empty summary issue. +file an empty summary issue and do not modify any labels. ## Step 3: Classify Each Issue @@ -62,75 +69,105 @@ For each issue, review the title and body and determine: well-scoped. When in doubt, leave it off. 4. **Escalation note**: if the issue matches an escalation trigger from the guide (e.g., a `priority:high` crash that may also produce wrong results), - call it out. -5. **Needs more info**: if the report has no reproduction steps, mark it as - needing a follow-up question to the reporter rather than guessing. + note it in the summary. -If you cannot confidently classify an issue from its title and body, say so -explicitly in the summary instead of guessing. +## Step 4: Skip Issues You Cannot Confidently Classify -Do NOT edit, label, or comment on the triaged issues. Recommendations only. +If an issue lacks reproduction steps or is otherwise too ambiguous to classify +with confidence: -## Step 4: File the Summary Issue +- **Do not** apply a priority label. +- **Do not** remove `requires-triage`. +- **Do not** comment on the issue or ask the reporter for more info from this + skill (that is the human reviewer's call). +- Record it in the summary under a "Skipped — needs more info" section so the + reviewer can follow up. -Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): +Guessing is worse than skipping. + +## Step 5: Apply Labels + +For each issue you classified in Step 3, apply the labels and remove +`requires-triage` in a single `gh` call: ```bash -TRIAGE_DATE=$(date -u +%Y-%m-%d) +gh issue edit \ + --repo apache/datafusion-comet \ + --add-label "priority:high,area:expressions" \ + --remove-label "requires-triage" ``` -Title: `Bug triage results: ${TRIAGE_DATE}` +Notes: -Body: a markdown table or per-issue section listing, for each triaged issue: +- Pass the labels as a single comma-separated string (no spaces around commas). +- Quote labels that contain spaces (e.g., `"spark 4"`). +- Only add labels that already exist in the repo. If a label from the guide is + missing in the repo, skip it for that issue and record a note in the summary + rather than creating new labels. +- Do not comment on the issue. -- Issue number and title (linked) -- Recommended priority label -- Recommended area labels -- `good first issue`? (yes/no) -- Escalation / needs-more-info notes -- One-sentence rationale tying the recommendation to the guide +If `gh issue edit` fails for any issue, leave that issue's `requires-triage` +label intact and record the failure in the summary under a "Failed to label" +section. -Include a short header that: +## Step 6: File the Summary Issue -- States the date, the number of issues triaged, and the count per priority -- Reminds reviewers that nothing has been labeled yet, and asks them to apply - the recommended labels (and remove `requires-triage`) before closing this - summary issue -- Links to `docs/source/contributor-guide/bug_triage.md` +Compute today's date in `YYYY-MM-DD` form (use the system date, not memory): -File the issue with `gh`: +```bash +TRIAGE_DATE=$(date -u +%Y-%m-%d) +``` + +Title: `Bug triage results: ${TRIAGE_DATE}` + +Body: a markdown report with these sections, in this order: + +1. **Header** + - Date, total issues processed, and counts per priority + - Link to `docs/source/contributor-guide/bug_triage.md` + - Note that labels have already been applied; the reviewer should spot-check + and close this issue when satisfied +2. **Triaged** — table with one row per issue: + - Issue (linked, e.g., `#1234`) + - Priority applied + - Area labels applied + - `good first issue`? (yes/no) + - One-sentence rationale tying the call to the guide +3. **Escalations to consider** (omit section if empty) +4. **Skipped — needs more info** (omit if empty) +5. **Failed to label** (omit if empty) — issue number plus the `gh` error + +File the issue with `gh`. Use a temp file for the body to keep quoting sane: ```bash gh issue create \ --repo apache/datafusion-comet \ --title "Bug triage results: ${TRIAGE_DATE}" \ - --body-file <(cat <<'EOF' -... summary markdown here ... -EOF -) + --body-file /tmp/triage-summary-${TRIAGE_DATE}.md ``` -Do not add labels to the summary issue itself. The human reviewer decides -whether to label it (and will close it when done). +Do not add labels to the summary issue itself. -After creating the issue, print its URL so the reviewer can find it. +After creating the issue, print its URL. ## Output to the User Report back: 1. Number of `requires-triage` issues processed -2. Count per recommended priority -3. URL of the new summary issue +2. Counts per priority that were applied +3. Number skipped (needs more info) and number failed +4. URL of the new summary issue Do not paste the full triage table back into the chat; it is in the issue. ## What This Skill Must Not Do -- Do not apply labels to the triaged issues -- Do not remove `requires-triage` from any issue +- Do not invent priority or area labels that are not in the guide +- Do not create new labels in the repo - Do not comment on the triaged issues -- Do not close any issue +- Do not close any triaged issue - Do not file the summary issue if there were zero `requires-triage` issues -- Do not invent priority or area labels that are not in the guide +- Do not re-label issues that were skipped or failed (leave `requires-triage` + in place so they show up in the next pass) - Do not include AI/Claude attribution in the summary issue From 5c291c7cdf70081db1775b4768a220314020b19e Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 4 May 2026 07:13:46 -0600 Subject: [PATCH 3/5] feat: group triage summary by priority and address review feedback Summary issue body now uses one subsection per priority, ordered highest first, with one bullet per issue (title plus linked number) and sub-bullets for area labels, good-first-issue flag, and rationale. Drops the markdown table form. Also folds in PR review feedback: priority:critical now explicitly covers correctness issues and security vulnerabilities, and "perf regressions" is spelled out as "performance regressions". --- .claude/skills/bug-triage/SKILL.md | 44 ++++++++++++++++++++++-------- 1 file changed, 32 insertions(+), 12 deletions(-) diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md index eefe1934a4..ae7ab56962 100644 --- a/.claude/skills/bug-triage/SKILL.md +++ b/.claude/skills/bug-triage/SKILL.md @@ -56,9 +56,11 @@ file an empty summary issue and do not modify any labels. For each issue, review the title and body and determine: 1. **Priority label** (exactly one): apply the decision tree from the guide. - - `priority:critical` if it could produce silent wrong results + - `priority:critical` for correctness issues (silent wrong results, data + corruption) and security vulnerabilities - `priority:high` for crashes, panics, segfaults, NPEs on supported paths - - `priority:medium` for functional bugs / perf regressions with workarounds + - `priority:medium` for functional bugs / performance regressions with + workarounds - `priority:low` for test-only, CI flakes, tooling, cosmetic 2. **Area labels** (zero or more): from the area table in the guide (`area:writer`, `area:shuffle`, `area:aggregation`, `area:scan`, @@ -127,15 +129,32 @@ Body: a markdown report with these sections, in this order: - Link to `docs/source/contributor-guide/bug_triage.md` - Note that labels have already been applied; the reviewer should spot-check and close this issue when satisfied -2. **Triaged** — table with one row per issue: - - Issue (linked, e.g., `#1234`) - - Priority applied - - Area labels applied - - `good first issue`? (yes/no) - - One-sentence rationale tying the call to the guide -3. **Escalations to consider** (omit section if empty) -4. **Skipped — needs more info** (omit if empty) -5. **Failed to label** (omit if empty) — issue number plus the `gh` error +2. **Triaged** — one subsection per priority, ordered highest priority first + (`priority:critical`, then `priority:high`, then `priority:medium`, then + `priority:low`). Omit any subsection whose count is zero. Do **not** use a + markdown table anywhere in this section; use nested bullet lists only. + + Within each subsection, one top-level bullet per issue: + + ``` + ### priority:critical + + - ([#1234](https://github.com/apache/datafusion-comet/issues/1234)) + - Area labels: `area:expressions`, `area:scan` + - good first issue: no + - Rationale: one sentence tying the call to the guide + ``` + + The issue number (not the title) is the link target. The title is plain + text. If there are no area labels, write `Area labels: none`. +3. **Escalations to consider** (omit section if empty) — bullet per issue with + the same ` ([#N](url))` form, plus a sub-bullet explaining the + trigger from the guide. +4. **Skipped — needs more info** (omit if empty) — bullet per issue with the + same `<title> ([#N](url))` form, plus a sub-bullet explaining what is + missing. +5. **Failed to label** (omit if empty) — bullet per issue with the same + `<title> ([#N](url))` form, plus a sub-bullet quoting the `gh` error. File the issue with `gh`. Use a temp file for the body to keep quoting sane: @@ -159,7 +178,8 @@ Report back: 3. Number skipped (needs more info) and number failed 4. URL of the new summary issue -Do not paste the full triage table back into the chat; it is in the issue. +Do not paste the full per-issue listing back into the chat; it is in the +summary issue. ## What This Skill Must Not Do From a32770abb0a449f88d5c1b8b8d3bfae3fa832f73 Mon Sep 17 00:00:00 2001 From: Andy Grove <agrove@apache.org> Date: Mon, 4 May 2026 07:14:24 -0600 Subject: [PATCH 4/5] feat: drop good-first-issue handling from bug-triage skill --- .claude/skills/bug-triage/SKILL.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md index ae7ab56962..f13f18ebd9 100644 --- a/.claude/skills/bug-triage/SKILL.md +++ b/.claude/skills/bug-triage/SKILL.md @@ -1,6 +1,6 @@ --- name: bug-triage -description: Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels (and `good first issue` where appropriate), removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied. +description: Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels, removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied. --- Run a bug triage pass for the `apache/datafusion-comet` repository. @@ -67,9 +67,7 @@ For each issue, review the title and body and determine: `area:expressions`, `area:ffi`, `area:ci`) plus the pre-existing area indicators (`native_datafusion`, `native_iceberg_compat`, `spark 4`, `spark sql tests`). -3. **`good first issue`**: only if the fix is likely straightforward AND - well-scoped. When in doubt, leave it off. -4. **Escalation note**: if the issue matches an escalation trigger from the +3. **Escalation note**: if the issue matches an escalation trigger from the guide (e.g., a `priority:high` crash that may also produce wrong results), note it in the summary. @@ -141,7 +139,6 @@ Body: a markdown report with these sections, in this order: - <issue title> ([#1234](https://github.com/apache/datafusion-comet/issues/1234)) - Area labels: `area:expressions`, `area:scan` - - good first issue: no - Rationale: one sentence tying the call to the guide ``` From 03d72845dd0d52a1888cdc949b3197134f2049e5 Mon Sep 17 00:00:00 2001 From: Andy Grove <agrove@apache.org> Date: Mon, 4 May 2026 07:24:10 -0600 Subject: [PATCH 5/5] style: run prettier on bug-triage SKILL.md --- .claude/skills/bug-triage/SKILL.md | 1 + 1 file changed, 1 insertion(+) diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md index f13f18ebd9..3194aab5fe 100644 --- a/.claude/skills/bug-triage/SKILL.md +++ b/.claude/skills/bug-triage/SKILL.md @@ -144,6 +144,7 @@ Body: a markdown report with these sections, in this order: The issue number (not the title) is the link target. The title is plain text. If there are no area labels, write `Area labels: none`. + 3. **Escalations to consider** (omit section if empty) — bullet per issue with the same `<title> ([#N](url))` form, plus a sub-bullet explaining the trigger from the guide.