feat: echo STT transcripts to thread before agent reply by dogzzdogzz · Pull Request #571 · openabdev/openab

dogzzdogzz · 2026-04-26T03:17:59Z

Summary

When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.

One thread message per user message: > 🎤 <transcript> per clip.
Failure → > 🎤 (transcription failed) line + ⚠️ reaction on the user's original message.
Opt-out via [stt] echo_transcript = false (default true, mirrored as stt.echoTranscript in Helm values).

Closes #570.

Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560

Architecture

stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig) is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) collect a `Vec` while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) intentionally not wired today — their protocol carries text only. The helper signature is unchanged when audio plumbing lands there later.

Files changed

`src/config.rs` — `SttConfig.echo_transcript: bool` (default `true`).
`src/stt.rs` — `EchoEntry` enum, `format_echo_message`, `post_echo` with `MockAdapter`-driven tests.
`src/discord.rs`, `src/slack.rs` — wire echo into the audio attachment loop, call `post_echo` before `router.handle_message`.
`charts/openab/values.yaml`, `charts/openab/templates/configmap.yaml` — expose `echoTranscript` (default `true`, `hasKey` guard preserves the default while distinguishing unset vs. explicit `false`).
`docs/stt.md`, `docs/config-reference.md` — document `echo_transcript`.
`docs/superpowers/specs/` and `docs/superpowers/plans/` — design spec + TDD-style implementation plan that drove this work.

Test plan

`cargo test --bin openab` — 133/133 pass (10 in `stt::tests` cover format, post_echo success, failure, mixed, disabled config, empty entries).
`cargo clippy --all-targets -- -D warnings` — clean.
`helm lint charts/openab` — clean.
`helm template ...` with default values renders `echo_transcript = true`; with `--set agents.kiro.stt.echoTranscript=false` renders `echo_transcript = false`.
Manual smoke test: send a voice message in Discord — verify the bot posts `> 🎤 ` before the agent's reply.
Manual smoke test: same in Slack.
Manual smoke test: simulate STT failure (e.g. revoke API key briefly or attach an unsupported file) — verify the failure line + ⚠️ reaction.

Out of scope / follow-ups

LINE / Telegram / Teams via gateway — those need audio plumbing in the gateway protocol first. The helper signature accommodates them when that work lands.
Multi-clip ordering: `extra_blocks.insert(0, …)` reverses transcript order in the agent prompt while `echo_entries.push(…)` preserves upload order. Pre-existing in the agent-prompt path; out of scope for this PR.

🤖 Generated with Claude Code

When STT transcribes a voice message, optionally post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Default is OFF — opt in via [stt] echo_transcript = true. - New config: [stt] echo_transcript (default false, opt-in) - New helper: stt::post_echo with platform-agnostic ChatAdapter handle — future LINE/Telegram/Teams adapters get echo for free - Format: > 🎤 <transcript> per clip, all in one thread message - Failure: > 🎤 (transcription failed) line + ⚠️ reaction on the user msg - Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap - Docs: docs/stt.md and docs/config-reference.md updated Rebased on top of openabdev#567 (gateway config rendering). Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.

chaodu-agent

超渡法師 Review — PR #571

Verdict: 🔴 NEEDS CHANGES — 1 blocker + 2 suggested changes

四問框架

1. What problem?
When STT transcribes a voice message, users have no way to verify what was heard before the agent replies. Misheard transcripts lead to confusing agent responses.

2. How?
Platform-agnostic stt::post_echo() helper posts > 🎤 <transcript> to the thread before agent dispatch. Discord and Slack wired. Failures show (transcription failed) + ⚠️ reaction. Opt-in via echo_transcript = true. Clean EchoEntry enum + format_echo_message with thorough tests.

3. Alternatives?
Could have been per-adapter, but the shared helper in stt.rs is the right call — future adapters get it for free. Gateway platforms deferred (need audio plumbing first).

4. Best approach?
Yes. Clean, well-tested, minimal footprint. A few issues below.

🔴 BLOCKER

1. Merge conflict — mergeable: CONFLICTING, mergeStateStatus: DIRTY. Needs rebase on main.

🔴 SUGGESTED CHANGES

2. PR body says default is true but code says false

PR body: "Opt-out via [stt] echo_transcript = false (default true)"
Code: fn default_echo_transcript() -> bool { false }
Helm: echoTranscript: false
Docs: "Default: false"

The code, Helm, and docs all agree on false (opt-in). The PR body is wrong. Please update the PR description to match the actual behavior.

3. Helm template default inconsistency with Rust default

Helm template else-branch renders false:

{{ if hasKey ($cfg.stt | default dict) "echoTranscript" }}{{ ($cfg.stt).echoTranscript }}{{ else }}false{{ end }}

This is correct for the current false default, but if the Rust default ever changes, the Helm template would diverge. Consider using the same pattern as model and base_url which use | default:

echo_transcript = {{ ($cfg.stt).echoTranscript | default false }}

This is a minor point — the current approach works, but the hasKey guard is more complex than needed for a simple boolean.

🟡 NIT (3 items)

Multi-clip ordering note — PR body mentions extra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This means the echo shows clips in upload order but the agent sees them reversed. Worth a code comment.
SttConfig needs Clone — Discord handler clones self.stt_config into the spawned task (let stt_cfg = self.stt_config.clone()). Verify SttConfig derives Clone on main — if not, this will fail to compile after rebase.
MockAdapter in tests — Clean mock implementation. Consider extracting to a shared test utility if more modules need it (dispatch tests, gateway tests).

🟢 INFO (5 items)

Excellent test coverage: 10 tests covering format, post_echo success/failure/mixed/disabled/empty
format_echo_message correctly flattens internal newlines to preserve > quoting
Best-effort pattern (log + swallow errors) is correct — echo must never block agent reply
Single ⚠️ reaction for multiple failures is the right UX choice
Platform-agnostic helper signature accommodates future gateway audio plumbing

Reviewed by: 超渡法師

chaodu-agent · 2026-05-03T12:08:11Z

🔃 Review — PR #571: feat: echo STT transcripts to thread before agent reply

Verdict: 🟡 Minor changes requested — well-structured PR with one blocking doc issue.

Four-Question Framework

#	Question	Answer
1	What problem?	Voice messages are transcribed and sent to the agent, but users never see the transcript. If STT mishears, the agent replies to garbage and nobody knows why.
2	How?	After STT completes, posts `> 🎤 <transcript>` to the thread before dispatching the prompt to the agent. Platform-agnostic helper `stt::post_echo()` works for both Discord and Slack.
3	Alternatives?	One-message-per-clip (rejected — too spammy). Inline in agent reply (rejected — mixes concerns). Current approach: one batched echo message per user message.
4	Best approach?	Yes. Clean separation, opt-in config, no race conditions (echo is `await`ed before agent dispatch).

🔴 SUGGESTED CHANGES

1. PR description contradicts the code
The PR body says default=true / opt-out in multiple places. But every piece of code says default=false / opt-in:

fn default_echo_transcript() -> bool { false }
values.yaml: echoTranscript: false
configmap.yaml else-branch: false
docs/config-reference.md: default column false
Tests assert !cfg.echo_transcript

The code is correct (opt-in is the right call). Please update the PR description to match — reviewers and future readers rely on it as source of truth.

🟡 NITs (non-blocking)

No transcript length cap — format_echo_message flattens newlines but doesn't truncate. A 5-minute voice memo could exceed Discord's 2000-char limit and cause send_message to fail silently. Consider truncating at ~1900 chars with ellipsis.
Multi-clip ordering — extra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This is pre-existing behavior (out of scope), but a one-line code comment at the insert(0, …) site would help future readers.
stt_cfg clone — SttConfig contains String fields, so .clone() allocates. Low priority since it runs once per message, but wrapping in Arc (like adapter) would make it cheap.
docs/stt.md example — inline comment could be clearer: # opt-in; omit or set false to disable instead of just # default: false (opt-in).

🟢 INFO (good stuff)

Clean architecture — stt::post_echo() takes &Arc<dyn ChatAdapter>, making it adapter-agnostic. Future adapters wire in with zero changes to stt.rs.
Excellent test coverage — 10 new unit tests with MockAdapter covering success, failure, mixed, empty, disabled, newline flattening, and full post_echo flow.
Thoughtful failure UX — failed transcriptions show > 🎤 (transcription failed) in echo AND add ⚠️ reaction to the user's message. Two signals.
Correct serde defaults — #[serde(default = "default_echo_transcript")] follows existing SttConfig pattern.
Helm hasKey guard — correctly distinguishes "not set" from "explicitly false" for backwards compatibility.
Both Discord and Slack fully wired — not a stub. Same pattern in both adapters.
No race condition — post_echo is awaited before router.handle_message, guaranteeing echo appears before agent reply.

Review by 超渡法師 🪬

masami-agent · 2026-05-05T13:31:45Z

PR Review: #571

Reviewed commit: 6bc70f6

Summary

Problem: Users cannot verify STT accuracy — transcripts go directly to the agent without user visibility.
Approach: Platform-agnostic post_echo helper posts a quoted transcript to the thread before the agent reply; opt-in via echo_transcript config.
Risk level: Low

Core Assessment

Problem clearly stated: ✅ (linked issue STT: echo transcript to thread before agent reply (Discord + Slack) #570, clear motivation)
Approach appropriate: ✅ (platform-agnostic helper, adapters collect entries and call one function)
Alternatives considered: N/A (straightforward feature, approach is natural)
Best approach for now: ✅ (minimal, opt-in, non-blocking best-effort design)

Findings

🔴 Critical: Helm template default contradicts PR description and code default

What: The PR description says "Opt-out via [stt] echo_transcript = false (default true)" — implying the feature defaults to on. However:

src/config.rs: default_echo_transcript() -> bool { false } → defaults to false
charts/openab/values.yaml: echoTranscript: false
charts/openab/templates/configmap.yaml: the else branch renders false
docs/config-reference.md and docs/stt.md: both say default is false

The code and docs are internally consistent (default = false, opt-in), but the PR description is wrong — it says "default true, mirrored as stt.echoTranscript" and "Opt-out via [stt] echo_transcript = false". This will confuse reviewers and future readers.

Where: PR description body, first bullet point.
Why it matters: Misleading PR description can cause incorrect deployment assumptions.
Fix: Update the PR description to say: "Opt-in via [stt] echo_transcript = true (default false)".

🟡 Minor: `warn!` import added but `tracing::warn` was already used inline in discord.rs

What: In src/discord.rs, the diff adds warn to the import list (use tracing::{debug, error, info, warn};). The existing code at line 598 already uses tracing::warn!(...) with the full path. After this change, both styles coexist in the same file — the new code uses the short warn! form (via crate::stt::post_echo which uses tracing::warn internally), but the existing tracing::warn! call at line 598 remains with the full path.

Where: src/discord.rs line 19
Why it matters: Style inconsistency within the same file. Not a bug, but slightly untidy.
Fix: Either keep using tracing::warn! everywhere (remove warn from imports) or update the existing tracing::warn! call to use the short form. Minor — not blocking.

🟡 Minor: `extra_blocks.insert(0, ...)` ordering note from PR description is worth a code comment

What: The PR description mentions a known quirk: extra_blocks.insert(0, …) reverses transcript order in the agent prompt while echo_entries.push(…) preserves upload order. This is explicitly out of scope, but a brief // NOTE: comment in the code would help future contributors understand the asymmetry.

Where: src/discord.rs ~line 524, src/slack.rs ~line 1008
Why it matters: Without the comment, someone might "fix" the ordering inconsistency without understanding it was intentional.
Fix: Add a one-line comment like // NOTE: insert(0) reverses order for agent prompt; echo_entries preserves upload order (intentional).

🟢 Info: Excellent test coverage

The 10 new tests in src/stt.rs cover all meaningful paths: success, failure, mixed, disabled config, empty entries, and newline flattening. The MockAdapter pattern is clean and reusable. The config tests in src/config.rs verify the default and explicit-false cases. Well done.

🟢 Info: Good defensive design choices

Best-effort echo (errors logged and swallowed) — never blocks agent reply
Single ⚠️ reaction even with multiple failures (idempotent)
Newline flattening prevents broken blockquotes
hasKey guard in Helm template correctly distinguishes "unset" from "explicit false"

🟢 Info: Clean separation of concerns

The EchoEntry enum + format_echo_message + post_echo trio is well-factored. Platform adapters only need to collect entries and call one function. Future adapters (LINE/Telegram) get it for free once they have audio plumbing.

Review Summary (Traffic Light)

🟢 INFO

CI all green (10/10 checks pass)
Test coverage is thorough (10 new unit tests + 2 config tests)
Platform-agnostic design is clean and extensible
Helm template logic is correct

🟡 NIT

PR description says default is true but code defaults to false — description needs update
Minor import style inconsistency in discord.rs
A code comment about the insert(0) ordering asymmetry would help future contributors

🔴 SUGGESTED CHANGES

Fix the PR description to accurately reflect the default (false, opt-in) — this is the only blocking item since it creates confusion about the feature's behavior

Verdict

APPROVE (with request to fix PR description)

The code itself is solid, well-tested, and follows existing patterns. The only real issue is the misleading PR description which should be corrected before merge to avoid deployment confusion.

masami-agent · 2026-05-05T13:39:02Z

Re-review: #571 (deeper pass)

Reviewed commit: 6bc70f6

After a more thorough second pass, here are additional findings I missed in the first review:

🔴 Critical: Multi-clip ordering inconsistency is now user-visible

What: When a user uploads multiple audio clips (A, B, C), the ordering diverges:

echo_entries.push(...) → echo displays in upload order: A, B, C
extra_blocks.insert(0, ...) → agent receives in reverse order: C, B, A

Before this PR, the reversal was invisible to users (only the agent saw it). Now that echo shows the transcripts, users will see > 🎤 A / > 🎤 B / > 🎤 C but the agent may respond as if it heard them in reverse order. This creates a confusing UX when users try to verify STT accuracy against the agent's reply.

Where: src/discord.rs ~line 524, src/slack.rs ~line 1008
Why it matters: The echo's purpose is to let users verify what the agent heard. If the ordering doesn't match what the agent received, the verification is misleading.
Fix options:

(Minimal) Add a comment acknowledging this and document it as a known limitation
(Better) Change echo_entries to also use insert(0, ...) so echo order matches agent order
(Best, but out of scope) Fix the underlying extra_blocks.insert(0) to use push instead, making both match upload order

The PR description already acknowledges this as out-of-scope, which is fair. But I'd suggest at minimum option 2 — make echo match what the agent actually sees, even if both are reversed from upload order. Otherwise the echo defeats its own purpose for multi-clip messages.

Severity reassessment: Downgrading to 🟡 since multi-clip voice messages are rare in practice, and the PR description explicitly calls this out. But it's worth discussing.

🟡 Minor: `hasKey` guard in Helm template is dead code

What: The configmap template uses:

echo_transcript = {{ if hasKey ($cfg.stt | default dict) "echoTranscript" }}{{ ($cfg.stt).echoTranscript }}{{ else }}false{{ end }}

But values.yaml already defines echoTranscript: false as a default. After Helm's values merge, hasKey will always be true (the key exists in the merged values). The else branch is unreachable.

Where: charts/openab/templates/configmap.yaml
Why it matters: Dead code suggests the author intended different semantics (perhaps distinguishing "unset" from "explicit false"). In practice it works correctly — just unnecessarily complex.
Fix: Simplify to echo_transcript = {{ ($cfg.stt).echoTranscript | default false }} — or keep as-is if the intent is to support users who --set only specific stt fields without inheriting the full values.yaml defaults (edge case with --values override files that don't include echoTranscript).

Actually, on reflection: if a user provides a custom values file with only stt.enabled: true and stt.apiKey: xxx (no echoTranscript key), Helm deep-merge with the chart's values.yaml will still include echoTranscript: false. So the guard is truly dead code. Not blocking though.

🟢 Info: Ownership semantics are correct

Verified that format!("[Voice message transcript]: {transcript}") borrows transcript (via Display), then echo_entries.push(EchoEntry::Success(transcript)) moves it. No use-after-move. The CI pass confirms this, but worth noting the pattern is intentional and correct.

🟢 Info: Platform consistency is good

Discord's post_echo runs inside tokio::spawn (before handle_message), Slack's runs directly in the async function (before router.handle_message). Both achieve the same semantic: echo appears before agent reply. The architectural difference is due to serenity's sync EventHandler trait requiring a spawn.

Updated Verdict

APPROVE — the code is solid. The multi-clip ordering issue is pre-existing and explicitly out-of-scope. The only true blocker from my first review remains: fix the PR description (says default is true, but code defaults to false).

Summary of all findings across both reviews:

🔴 PR description says default true but code is false → must fix (misleading)
🟡 Multi-clip echo order doesn't match agent order → acknowledged, out of scope
🟡 hasKey guard is dead code → cosmetic, not blocking
🟡 Import style inconsistency → cosmetic
🟢 Tests, design, ownership, platform consistency all good

dogzzdogzz requested a review from thepagent as a code owner April 26, 2026 03:18

github-actions Bot added the pending-screening PR awaiting automated screening label Apr 26, 2026

dogzzdogzz force-pushed the feat/stt-transcript-echo branch 2 times, most recently from ee10184 to 7f74166 Compare April 26, 2026 03:34

dogzzdogzz force-pushed the feat/stt-transcript-echo branch from 7f74166 to 6bc70f6 Compare April 26, 2026 03:42

CHC-Agent mentioned this pull request Apr 27, 2026

STT: echo transcript to thread before agent reply (Discord + Slack) #570

Open

thepagent added p2 Medium — planned work feature labels Apr 28, 2026

github-actions Bot added needs-rebase pending-contributor and removed needs-rebase pending-contributor labels Apr 30, 2026

github-actions Bot added pending-maintainer needs-rebase pending-contributor and removed needs-rebase pending-contributor labels May 3, 2026

chaodu-agent requested changes May 3, 2026

View reviewed changes

chaodu-agent removed the pending-maintainer label May 3, 2026

github-actions Bot added pending-maintainer and removed needs-rebase pending-contributor labels May 3, 2026

chaodu-agent added pending-contributor and removed pending-maintainer labels May 3, 2026

github-actions Bot added needs-rebase pending-contributor and removed needs-rebase pending-contributor labels May 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: echo STT transcripts to thread before agent reply#571

feat: echo STT transcripts to thread before agent reply#571
dogzzdogzz wants to merge 1 commit intoopenabdev:mainfrom
dogzzdogzz:feat/stt-transcript-echo

dogzzdogzz commented Apr 26, 2026

Uh oh!

chaodu-agent left a comment

Uh oh!

chaodu-agent commented May 3, 2026

Uh oh!

masami-agent commented May 5, 2026

Uh oh!

masami-agent commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

dogzzdogzz commented Apr 26, 2026

Summary

Architecture

Files changed

Test plan

Out of scope / follow-ups

Uh oh!

chaodu-agent left a comment

Choose a reason for hiding this comment

超渡法師 Review — PR #571

四問框架

🔴 BLOCKER

🔴 SUGGESTED CHANGES

Uh oh!

chaodu-agent commented May 3, 2026

🔃 Review — PR #571: feat: echo STT transcripts to thread before agent reply

Four-Question Framework

🔴 SUGGESTED CHANGES

Uh oh!

masami-agent commented May 5, 2026

PR Review: #571

Summary

Core Assessment

Findings

🔴 Critical: Helm template default contradicts PR description and code default

🟡 Minor: warn! import added but tracing::warn was already used inline in discord.rs

🟡 Minor: extra_blocks.insert(0, ...) ordering note from PR description is worth a code comment

🟢 Info: Excellent test coverage

🟢 Info: Good defensive design choices

🟢 Info: Clean separation of concerns

Review Summary (Traffic Light)

Verdict

Uh oh!

masami-agent commented May 5, 2026

Re-review: #571 (deeper pass)

🔴 Critical: Multi-clip ordering inconsistency is now user-visible

🟡 Minor: hasKey guard in Helm template is dead code

🟢 Info: Ownership semantics are correct

🟢 Info: Platform consistency is good

Updated Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

🟡 Minor: `warn!` import added but `tracing::warn` was already used inline in discord.rs

🟡 Minor: `extra_blocks.insert(0, ...)` ordering note from PR description is worth a code comment

🟡 Minor: `hasKey` guard in Helm template is dead code