feat: echo STT transcripts to thread before agent reply#571
feat: echo STT transcripts to thread before agent reply#571dogzzdogzz wants to merge 1 commit intoopenabdev:mainfrom
Conversation
ee10184 to
7f74166
Compare
When STT transcribes a voice message, optionally post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Default is OFF — opt in via [stt] echo_transcript = true. - New config: [stt] echo_transcript (default false, opt-in) - New helper: stt::post_echo with platform-agnostic ChatAdapter handle — future LINE/Telegram/Teams adapters get echo for free - Format: > 🎤 <transcript> per clip, all in one thread message - Failure: > 🎤 (transcription failed) line +⚠️ reaction on the user msg - Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap - Docs: docs/stt.md and docs/config-reference.md updated Rebased on top of openabdev#567 (gateway config rendering). Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.
7f74166 to
6bc70f6
Compare
chaodu-agent
left a comment
There was a problem hiding this comment.
超渡法師 Review — PR #571
Verdict: 🔴 NEEDS CHANGES — 1 blocker + 2 suggested changes
四問框架
1. What problem?
When STT transcribes a voice message, users have no way to verify what was heard before the agent replies. Misheard transcripts lead to confusing agent responses.
2. How?
Platform-agnostic stt::post_echo() helper posts > 🎤 <transcript> to the thread before agent dispatch. Discord and Slack wired. Failures show (transcription failed) + echo_transcript = true. Clean EchoEntry enum + format_echo_message with thorough tests.
3. Alternatives?
Could have been per-adapter, but the shared helper in stt.rs is the right call — future adapters get it for free. Gateway platforms deferred (need audio plumbing first).
4. Best approach?
Yes. Clean, well-tested, minimal footprint. A few issues below.
🔴 BLOCKER
1. Merge conflict — mergeable: CONFLICTING, mergeStateStatus: DIRTY. Needs rebase on main.
🔴 SUGGESTED CHANGES
2. PR body says default is true but code says false
PR body: "Opt-out via [stt] echo_transcript = false (default true)"
Code: fn default_echo_transcript() -> bool { false }
Helm: echoTranscript: false
Docs: "Default: false"
The code, Helm, and docs all agree on false (opt-in). The PR body is wrong. Please update the PR description to match the actual behavior.
3. Helm template default inconsistency with Rust default
Helm template else-branch renders false:
{{ if hasKey ($cfg.stt | default dict) "echoTranscript" }}{{ ($cfg.stt).echoTranscript }}{{ else }}false{{ end }}
This is correct for the current false default, but if the Rust default ever changes, the Helm template would diverge. Consider using the same pattern as model and base_url which use | default:
echo_transcript = {{ ($cfg.stt).echoTranscript | default false }}
This is a minor point — the current approach works, but the hasKey guard is more complex than needed for a simple boolean.
🟡 NIT (3 items)
-
Multi-clip ordering note — PR body mentions
extra_blocks.insert(0, …)reverses transcript order in the agent prompt whileecho_entries.push(…)preserves upload order. This means the echo shows clips in upload order but the agent sees them reversed. Worth a code comment. -
SttConfigneedsClone— Discord handler clonesself.stt_configinto the spawned task (let stt_cfg = self.stt_config.clone()). VerifySttConfigderivesCloneon main — if not, this will fail to compile after rebase. -
MockAdapter in tests — Clean mock implementation. Consider extracting to a shared test utility if more modules need it (dispatch tests, gateway tests).
🟢 INFO (5 items)
- Excellent test coverage: 10 tests covering format, post_echo success/failure/mixed/disabled/empty
format_echo_messagecorrectly flattens internal newlines to preserve>quoting- Best-effort pattern (log + swallow errors) is correct — echo must never block agent reply
- Single
⚠️ reaction for multiple failures is the right UX choice - Platform-agnostic helper signature accommodates future gateway audio plumbing
Reviewed by: 超渡法師
🔃 Review — PR #571: feat: echo STT transcripts to thread before agent replyVerdict: 🟡 Minor changes requested — well-structured PR with one blocking doc issue. Four-Question Framework
🔴 SUGGESTED CHANGES1. PR description contradicts the code
The code is correct (opt-in is the right call). Please update the PR description to match — reviewers and future readers rely on it as source of truth. 🟡 NITs (non-blocking)
🟢 INFO (good stuff)
Review by 超渡法師 🪬 |
PR Review: #571Reviewed commit: Summary
Core Assessment
Findings🔴 Critical: Helm template default contradicts PR description and code defaultWhat: The PR description says "Opt-out via
The code and docs are internally consistent (default = false, opt-in), but the PR description is wrong — it says "default Where: PR description body, first bullet point. 🟡 Minor:
|
Re-review: #571 (deeper pass)Reviewed commit: After a more thorough second pass, here are additional findings I missed in the first review: 🔴 Critical: Multi-clip ordering inconsistency is now user-visibleWhat: When a user uploads multiple audio clips (A, B, C), the ordering diverges:
Before this PR, the reversal was invisible to users (only the agent saw it). Now that echo shows the transcripts, users will see Where:
The PR description already acknowledges this as out-of-scope, which is fair. But I'd suggest at minimum option 2 — make echo match what the agent actually sees, even if both are reversed from upload order. Otherwise the echo defeats its own purpose for multi-clip messages. Severity reassessment: Downgrading to 🟡 since multi-clip voice messages are rare in practice, and the PR description explicitly calls this out. But it's worth discussing. 🟡 Minor:
|
Summary
When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.
> 🎤 <transcript>per clip.> 🎤 (transcription failed)line +[stt] echo_transcript = false(defaulttrue, mirrored asstt.echoTranscriptin Helm values).Closes #570.
Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560
Architecture
stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig)is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) collect a `Vec` while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) intentionally not wired today — their protocol carries text only. The helper signature is unchanged when audio plumbing lands there later.Files changed
Test plan
Out of scope / follow-ups
🤖 Generated with Claude Code