Skip to content

feat: echo STT transcripts to thread before agent reply#571

Open
dogzzdogzz wants to merge 1 commit intoopenabdev:mainfrom
dogzzdogzz:feat/stt-transcript-echo
Open

feat: echo STT transcripts to thread before agent reply#571
dogzzdogzz wants to merge 1 commit intoopenabdev:mainfrom
dogzzdogzz:feat/stt-transcript-echo

Conversation

@dogzzdogzz
Copy link
Copy Markdown
Contributor

Summary

When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.

  • One thread message per user message: > 🎤 <transcript> per clip.
  • Failure → > 🎤 (transcription failed) line + ⚠️ reaction on the user's original message.
  • Opt-out via [stt] echo_transcript = false (default true, mirrored as stt.echoTranscript in Helm values).

Closes #570.

Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560

Architecture

stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig) is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) collect a `Vec` while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) intentionally not wired today — their protocol carries text only. The helper signature is unchanged when audio plumbing lands there later.

Files changed

  • `src/config.rs` — `SttConfig.echo_transcript: bool` (default `true`).
  • `src/stt.rs` — `EchoEntry` enum, `format_echo_message`, `post_echo` with `MockAdapter`-driven tests.
  • `src/discord.rs`, `src/slack.rs` — wire echo into the audio attachment loop, call `post_echo` before `router.handle_message`.
  • `charts/openab/values.yaml`, `charts/openab/templates/configmap.yaml` — expose `echoTranscript` (default `true`, `hasKey` guard preserves the default while distinguishing unset vs. explicit `false`).
  • `docs/stt.md`, `docs/config-reference.md` — document `echo_transcript`.
  • `docs/superpowers/specs/` and `docs/superpowers/plans/` — design spec + TDD-style implementation plan that drove this work.

Test plan

  • `cargo test --bin openab` — 133/133 pass (10 in `stt::tests` cover format, post_echo success, failure, mixed, disabled config, empty entries).
  • `cargo clippy --all-targets -- -D warnings` — clean.
  • `helm lint charts/openab` — clean.
  • `helm template ...` with default values renders `echo_transcript = true`; with `--set agents.kiro.stt.echoTranscript=false` renders `echo_transcript = false`.
  • Manual smoke test: send a voice message in Discord — verify the bot posts `> 🎤 ` before the agent's reply.
  • Manual smoke test: same in Slack.
  • Manual smoke test: simulate STT failure (e.g. revoke API key briefly or attach an unsupported file) — verify the failure line + ⚠️ reaction.

Out of scope / follow-ups

  • LINE / Telegram / Teams via gateway — those need audio plumbing in the gateway protocol first. The helper signature accommodates them when that work lands.
  • Multi-clip ordering: `extra_blocks.insert(0, …)` reverses transcript order in the agent prompt while `echo_entries.push(…)` preserves upload order. Pre-existing in the agent-prompt path; out of scope for this PR.

🤖 Generated with Claude Code

@dogzzdogzz dogzzdogzz requested a review from thepagent as a code owner April 26, 2026 03:18
@github-actions github-actions Bot added the pending-screening PR awaiting automated screening label Apr 26, 2026
@dogzzdogzz dogzzdogzz force-pushed the feat/stt-transcript-echo branch 2 times, most recently from ee10184 to 7f74166 Compare April 26, 2026 03:34
When STT transcribes a voice message, optionally post the transcript back
to the thread (no mentions) before the agent reply so users can verify what
was heard. Default is OFF — opt in via [stt] echo_transcript = true.

- New config: [stt] echo_transcript (default false, opt-in)
- New helper: stt::post_echo with platform-agnostic ChatAdapter handle —
  future LINE/Telegram/Teams adapters get echo for free
- Format: > 🎤 <transcript> per clip, all in one thread message
- Failure: > 🎤 (transcription failed) line + ⚠️ reaction on the user msg
- Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap
- Docs: docs/stt.md and docs/config-reference.md updated

Rebased on top of openabdev#567 (gateway config rendering).

Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.
Copy link
Copy Markdown
Collaborator

@chaodu-agent chaodu-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

超渡法師 Review — PR #571

Verdict: 🔴 NEEDS CHANGES — 1 blocker + 2 suggested changes

四問框架

1. What problem?
When STT transcribes a voice message, users have no way to verify what was heard before the agent replies. Misheard transcripts lead to confusing agent responses.

2. How?
Platform-agnostic stt::post_echo() helper posts > 🎤 <transcript> to the thread before agent dispatch. Discord and Slack wired. Failures show (transcription failed) + ⚠️ reaction. Opt-in via echo_transcript = true. Clean EchoEntry enum + format_echo_message with thorough tests.

3. Alternatives?
Could have been per-adapter, but the shared helper in stt.rs is the right call — future adapters get it for free. Gateway platforms deferred (need audio plumbing first).

4. Best approach?
Yes. Clean, well-tested, minimal footprint. A few issues below.

🔴 BLOCKER

1. Merge conflictmergeable: CONFLICTING, mergeStateStatus: DIRTY. Needs rebase on main.

🔴 SUGGESTED CHANGES

2. PR body says default is true but code says false

PR body: "Opt-out via [stt] echo_transcript = false (default true)"
Code: fn default_echo_transcript() -> bool { false }
Helm: echoTranscript: false
Docs: "Default: false"

The code, Helm, and docs all agree on false (opt-in). The PR body is wrong. Please update the PR description to match the actual behavior.

3. Helm template default inconsistency with Rust default

Helm template else-branch renders false:

{{ if hasKey ($cfg.stt | default dict) "echoTranscript" }}{{ ($cfg.stt).echoTranscript }}{{ else }}false{{ end }}

This is correct for the current false default, but if the Rust default ever changes, the Helm template would diverge. Consider using the same pattern as model and base_url which use | default:

echo_transcript = {{ ($cfg.stt).echoTranscript | default false }}

This is a minor point — the current approach works, but the hasKey guard is more complex than needed for a simple boolean.

🟡 NIT (3 items)
  1. Multi-clip ordering note — PR body mentions extra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This means the echo shows clips in upload order but the agent sees them reversed. Worth a code comment.

  2. SttConfig needs Clone — Discord handler clones self.stt_config into the spawned task (let stt_cfg = self.stt_config.clone()). Verify SttConfig derives Clone on main — if not, this will fail to compile after rebase.

  3. MockAdapter in tests — Clean mock implementation. Consider extracting to a shared test utility if more modules need it (dispatch tests, gateway tests).

🟢 INFO (5 items)
  • Excellent test coverage: 10 tests covering format, post_echo success/failure/mixed/disabled/empty
  • format_echo_message correctly flattens internal newlines to preserve > quoting
  • Best-effort pattern (log + swallow errors) is correct — echo must never block agent reply
  • Single ⚠️ reaction for multiple failures is the right UX choice
  • Platform-agnostic helper signature accommodates future gateway audio plumbing

Reviewed by: 超渡法師

@chaodu-agent
Copy link
Copy Markdown
Collaborator

🔃 Review — PR #571: feat: echo STT transcripts to thread before agent reply

Verdict: 🟡 Minor changes requested — well-structured PR with one blocking doc issue.

Four-Question Framework

# Question Answer
1 What problem? Voice messages are transcribed and sent to the agent, but users never see the transcript. If STT mishears, the agent replies to garbage and nobody knows why.
2 How? After STT completes, posts > 🎤 <transcript> to the thread before dispatching the prompt to the agent. Platform-agnostic helper stt::post_echo() works for both Discord and Slack.
3 Alternatives? One-message-per-clip (rejected — too spammy). Inline in agent reply (rejected — mixes concerns). Current approach: one batched echo message per user message.
4 Best approach? Yes. Clean separation, opt-in config, no race conditions (echo is awaited before agent dispatch).

🔴 SUGGESTED CHANGES

1. PR description contradicts the code
The PR body says default=true / opt-out in multiple places. But every piece of code says default=false / opt-in:

  • fn default_echo_transcript() -> bool { false }
  • values.yaml: echoTranscript: false
  • configmap.yaml else-branch: false
  • docs/config-reference.md: default column false
  • Tests assert !cfg.echo_transcript

The code is correct (opt-in is the right call). Please update the PR description to match — reviewers and future readers rely on it as source of truth.

🟡 NITs (non-blocking)
  1. No transcript length capformat_echo_message flattens newlines but doesn't truncate. A 5-minute voice memo could exceed Discord's 2000-char limit and cause send_message to fail silently. Consider truncating at ~1900 chars with ellipsis.
  2. Multi-clip orderingextra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This is pre-existing behavior (out of scope), but a one-line code comment at the insert(0, …) site would help future readers.
  3. stt_cfg cloneSttConfig contains String fields, so .clone() allocates. Low priority since it runs once per message, but wrapping in Arc (like adapter) would make it cheap.
  4. docs/stt.md example — inline comment could be clearer: # opt-in; omit or set false to disable instead of just # default: false (opt-in).
🟢 INFO (good stuff)
  1. Clean architecturestt::post_echo() takes &Arc<dyn ChatAdapter>, making it adapter-agnostic. Future adapters wire in with zero changes to stt.rs.
  2. Excellent test coverage — 10 new unit tests with MockAdapter covering success, failure, mixed, empty, disabled, newline flattening, and full post_echo flow.
  3. Thoughtful failure UX — failed transcriptions show > 🎤 (transcription failed) in echo AND add ⚠️ reaction to the user's message. Two signals.
  4. Correct serde defaults#[serde(default = "default_echo_transcript")] follows existing SttConfig pattern.
  5. Helm hasKey guard — correctly distinguishes "not set" from "explicitly false" for backwards compatibility.
  6. Both Discord and Slack fully wired — not a stub. Same pattern in both adapters.
  7. No race conditionpost_echo is awaited before router.handle_message, guaranteeing echo appears before agent reply.

Review by 超渡法師 🪬

@masami-agent
Copy link
Copy Markdown
Contributor

PR Review: #571

Reviewed commit: 6bc70f6

Summary

  • Problem: Users cannot verify STT accuracy — transcripts go directly to the agent without user visibility.
  • Approach: Platform-agnostic post_echo helper posts a quoted transcript to the thread before the agent reply; opt-in via echo_transcript config.
  • Risk level: Low

Core Assessment

  1. Problem clearly stated: ✅ (linked issue STT: echo transcript to thread before agent reply (Discord + Slack) #570, clear motivation)
  2. Approach appropriate: ✅ (platform-agnostic helper, adapters collect entries and call one function)
  3. Alternatives considered: N/A (straightforward feature, approach is natural)
  4. Best approach for now: ✅ (minimal, opt-in, non-blocking best-effort design)

Findings


🔴 Critical: Helm template default contradicts PR description and code default

What: The PR description says "Opt-out via [stt] echo_transcript = false (default true)" — implying the feature defaults to on. However:

  • src/config.rs: default_echo_transcript() -> bool { false } → defaults to false
  • charts/openab/values.yaml: echoTranscript: false
  • charts/openab/templates/configmap.yaml: the else branch renders false
  • docs/config-reference.md and docs/stt.md: both say default is false

The code and docs are internally consistent (default = false, opt-in), but the PR description is wrong — it says "default true, mirrored as stt.echoTranscript" and "Opt-out via [stt] echo_transcript = false". This will confuse reviewers and future readers.

Where: PR description body, first bullet point.
Why it matters: Misleading PR description can cause incorrect deployment assumptions.
Fix: Update the PR description to say: "Opt-in via [stt] echo_transcript = true (default false)".


🟡 Minor: warn! import added but tracing::warn was already used inline in discord.rs

What: In src/discord.rs, the diff adds warn to the import list (use tracing::{debug, error, info, warn};). The existing code at line 598 already uses tracing::warn!(...) with the full path. After this change, both styles coexist in the same file — the new code uses the short warn! form (via crate::stt::post_echo which uses tracing::warn internally), but the existing tracing::warn! call at line 598 remains with the full path.

Where: src/discord.rs line 19
Why it matters: Style inconsistency within the same file. Not a bug, but slightly untidy.
Fix: Either keep using tracing::warn! everywhere (remove warn from imports) or update the existing tracing::warn! call to use the short form. Minor — not blocking.


🟡 Minor: extra_blocks.insert(0, ...) ordering note from PR description is worth a code comment

What: The PR description mentions a known quirk: extra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This is explicitly out of scope, but a brief // NOTE: comment in the code would help future contributors understand the asymmetry.

Where: src/discord.rs ~line 524, src/slack.rs ~line 1008
Why it matters: Without the comment, someone might "fix" the ordering inconsistency without understanding it was intentional.
Fix: Add a one-line comment like // NOTE: insert(0) reverses order for agent prompt; echo_entries preserves upload order (intentional).


🟢 Info: Excellent test coverage

The 10 new tests in src/stt.rs cover all meaningful paths: success, failure, mixed, disabled config, empty entries, and newline flattening. The MockAdapter pattern is clean and reusable. The config tests in src/config.rs verify the default and explicit-false cases. Well done.

🟢 Info: Good defensive design choices

  • Best-effort echo (errors logged and swallowed) — never blocks agent reply
  • Single ⚠️ reaction even with multiple failures (idempotent)
  • Newline flattening prevents broken blockquotes
  • hasKey guard in Helm template correctly distinguishes "unset" from "explicit false"

🟢 Info: Clean separation of concerns

The EchoEntry enum + format_echo_message + post_echo trio is well-factored. Platform adapters only need to collect entries and call one function. Future adapters (LINE/Telegram) get it for free once they have audio plumbing.


Review Summary (Traffic Light)

🟢 INFO

  • CI all green (10/10 checks pass)
  • Test coverage is thorough (10 new unit tests + 2 config tests)
  • Platform-agnostic design is clean and extensible
  • Helm template logic is correct

🟡 NIT

  • PR description says default is true but code defaults to false — description needs update
  • Minor import style inconsistency in discord.rs
  • A code comment about the insert(0) ordering asymmetry would help future contributors

🔴 SUGGESTED CHANGES

  • Fix the PR description to accurately reflect the default (false, opt-in) — this is the only blocking item since it creates confusion about the feature's behavior

Verdict

APPROVE (with request to fix PR description)

The code itself is solid, well-tested, and follows existing patterns. The only real issue is the misleading PR description which should be corrected before merge to avoid deployment confusion.

@masami-agent
Copy link
Copy Markdown
Contributor

Re-review: #571 (deeper pass)

Reviewed commit: 6bc70f6

After a more thorough second pass, here are additional findings I missed in the first review:


🔴 Critical: Multi-clip ordering inconsistency is now user-visible

What: When a user uploads multiple audio clips (A, B, C), the ordering diverges:

  • echo_entries.push(...) → echo displays in upload order: A, B, C
  • extra_blocks.insert(0, ...) → agent receives in reverse order: C, B, A

Before this PR, the reversal was invisible to users (only the agent saw it). Now that echo shows the transcripts, users will see > 🎤 A / > 🎤 B / > 🎤 C but the agent may respond as if it heard them in reverse order. This creates a confusing UX when users try to verify STT accuracy against the agent's reply.

Where: src/discord.rs ~line 524, src/slack.rs ~line 1008
Why it matters: The echo's purpose is to let users verify what the agent heard. If the ordering doesn't match what the agent received, the verification is misleading.
Fix options:

  1. (Minimal) Add a comment acknowledging this and document it as a known limitation
  2. (Better) Change echo_entries to also use insert(0, ...) so echo order matches agent order
  3. (Best, but out of scope) Fix the underlying extra_blocks.insert(0) to use push instead, making both match upload order

The PR description already acknowledges this as out-of-scope, which is fair. But I'd suggest at minimum option 2 — make echo match what the agent actually sees, even if both are reversed from upload order. Otherwise the echo defeats its own purpose for multi-clip messages.

Severity reassessment: Downgrading to 🟡 since multi-clip voice messages are rare in practice, and the PR description explicitly calls this out. But it's worth discussing.


🟡 Minor: hasKey guard in Helm template is dead code

What: The configmap template uses:

echo_transcript = {{ if hasKey ($cfg.stt | default dict) "echoTranscript" }}{{ ($cfg.stt).echoTranscript }}{{ else }}false{{ end }}

But values.yaml already defines echoTranscript: false as a default. After Helm's values merge, hasKey will always be true (the key exists in the merged values). The else branch is unreachable.

Where: charts/openab/templates/configmap.yaml
Why it matters: Dead code suggests the author intended different semantics (perhaps distinguishing "unset" from "explicit false"). In practice it works correctly — just unnecessarily complex.
Fix: Simplify to echo_transcript = {{ ($cfg.stt).echoTranscript | default false }} — or keep as-is if the intent is to support users who --set only specific stt fields without inheriting the full values.yaml defaults (edge case with --values override files that don't include echoTranscript).

Actually, on reflection: if a user provides a custom values file with only stt.enabled: true and stt.apiKey: xxx (no echoTranscript key), Helm deep-merge with the chart's values.yaml will still include echoTranscript: false. So the guard is truly dead code. Not blocking though.


🟢 Info: Ownership semantics are correct

Verified that format!("[Voice message transcript]: {transcript}") borrows transcript (via Display), then echo_entries.push(EchoEntry::Success(transcript)) moves it. No use-after-move. The CI pass confirms this, but worth noting the pattern is intentional and correct.

🟢 Info: Platform consistency is good

Discord's post_echo runs inside tokio::spawn (before handle_message), Slack's runs directly in the async function (before router.handle_message). Both achieve the same semantic: echo appears before agent reply. The architectural difference is due to serenity's sync EventHandler trait requiring a spawn.


Updated Verdict

APPROVE — the code is solid. The multi-clip ordering issue is pre-existing and explicitly out-of-scope. The only true blocker from my first review remains: fix the PR description (says default is true, but code defaults to false).

Summary of all findings across both reviews:

  • 🔴 PR description says default true but code is falsemust fix (misleading)
  • 🟡 Multi-clip echo order doesn't match agent order → acknowledged, out of scope
  • 🟡 hasKey guard is dead code → cosmetic, not blocking
  • 🟡 Import style inconsistency → cosmetic
  • 🟢 Tests, design, ownership, platform consistency all good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature needs-rebase p2 Medium — planned work pending-contributor pending-screening PR awaiting automated screening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

STT: echo transcript to thread before agent reply (Discord + Slack)

5 participants