
feat telegram: add voice message support for telegram with pluggable #38

Open
soufianebouaddis wants to merge 2 commits into jobrunr:main from soufianebouaddis:feature/add-voice-message-support-for-telegram

@soufianebouaddis

Add support for voice messages in Telegram channel

This PR extends TelegramChannel to handle voice messages in addition to text.

Changes

  • Refactored TelegramChannel.consume() to process both text and voice messages
  • Added downloading of voice messages from the Telegram API
  • Introduced a pluggable SpeechToTextService abstraction
  • Routed transcribed text through the existing agent.respondTo(...) flow
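
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: IncomingMessage, VoiceAwareHandler, the downloader function, and the single-method shape of SpeechToTextService are all assumptions made for the example, and the agent is reduced to a plain function standing in for agent.respondTo(...).

```java
import java.util.function.Function;

// Hypothetical stand-in; the real interface in the PR may look different.
interface SpeechToTextService {
    String transcribe(byte[] audio);
}

// Assumed shape of an incoming message: either text or a voice file id is set.
record IncomingMessage(String text, String voiceFileId) {
    boolean isVoice() {
        return voiceFileId != null;
    }
}

class VoiceAwareHandler {
    private final Function<String, byte[]> voiceDownloader; // file id -> audio bytes
    private final SpeechToTextService speechToText;
    private final Function<String, String> agent;           // stands in for agent.respondTo(...)

    VoiceAwareHandler(Function<String, byte[]> voiceDownloader,
                      SpeechToTextService speechToText,
                      Function<String, String> agent) {
        this.voiceDownloader = voiceDownloader;
        this.speechToText = speechToText;
        this.agent = agent;
    }

    // Normalizes the message to text first, then uses the same agent path for both kinds.
    String consume(IncomingMessage message) {
        String prompt = message.isVoice()
                ? speechToText.transcribe(voiceDownloader.apply(message.voiceFileId()))
                : message.text();
        return agent.apply(prompt);
    }
}
```

Because the voice branch ends in plain text, the agent never needs to know which kind of message it received.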

Transcription

  • Default: MockSpeechToTextService (no external dependency, suitable for testing)
  • Optional: OpenAiSpeechToTextService (enabled via speech.provider=openai)
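
As a rough sketch of how the provider switch might work, the example below picks an implementation from the speech.provider property. Only the class names and the property come from the description above; the SpeechToTextServices factory, the transcribe signature, the mock's return text, and the fallback to the mock for unknown values are assumptions, and the OpenAI call itself is left out.

```java
import java.util.Map;

// Assumed single-method shape of the abstraction.
interface SpeechToTextService {
    String transcribe(byte[] audio);
}

// Returns a fixed, recognizable string so tests need no external service.
class MockSpeechToTextService implements SpeechToTextService {
    @Override
    public String transcribe(byte[] audio) {
        return "[mock transcription of " + audio.length + " bytes]";
    }
}

// Placeholder only: the remote call is deliberately omitted in this sketch.
class OpenAiSpeechToTextService implements SpeechToTextService {
    @Override
    public String transcribe(byte[] audio) {
        throw new UnsupportedOperationException("remote call omitted in this sketch");
    }
}

// Hypothetical factory: the mock is the default, "openai" opts in to the OpenAI service.
final class SpeechToTextServices {
    private SpeechToTextServices() {
    }

    static SpeechToTextService fromProperties(Map<String, String> properties) {
        String provider = properties.getOrDefault("speech.provider", "mock");
        return "openai".equalsIgnoreCase(provider)
                ? new OpenAiSpeechToTextService()
                : new MockSpeechToTextService();
    }
}
```

In a Spring application the same choice would more likely be made with conditional bean registration; the plain factory is used here only to keep the example self-contained.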

Notes

  • Existing text message behavior remains unchanged
  • Tests updated and all passing

Next Steps / Ideas

  • Integrate real transcription providers (e.g., OpenAI Whisper, Spring AI AudioTranscriptionModel, or local Whisper plugin)
  • Open to feedback on aligning the abstraction with Spring AI’s AudioTranscriptionModel if preferred

@cla-bot

cla-bot bot commented Mar 26, 2026

We require contributors to sign our Contributor License Agreement, and we don't have @soufianebouaddis on file. In order for us to review and merge your code, please create a PR where you add yourself to the contributors of JobRunr. This only needs to be done once. As soon as that is done, we can review your PR.

Thanks a lot!

@rdehuyss
Contributor

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Mar 29, 2026
@cla-bot

cla-bot bot commented Mar 29, 2026

The cla-bot has been summoned, and re-checked this pull request!

a-simeshin added a commit to a-simeshin/JavaClaw that referenced this pull request Apr 10, 2026
…tests for the audit/executions/deliveries REST API endpoints (T58, T60) with a real PostgreSQL via Testcontainers, all tests green, BUILD SUCCESS
a-simeshin added a commit to a-simeshin/JavaClaw that referenced this pull request Apr 11, 2026
…грация role_agent_config, RoleAgentConfig entity/repository/service with hierarchy fallback and a cache, model override integrated through the whole pipeline (ChatRestController→SseStreamingService→ChatService), REST API endpoints for management, 26 new tests, all 1147 tests green
@auloin
Contributor

auloin commented Apr 15, 2026

Hi @soufianebouaddis, thanks for submitting this PR. Sorry for the late review.

From what I can see the actual transcription is yet to be done. I think we should have at least one working implementation. Is this something you'd still like to work on?

My second concern is that we're mixing Telegram text and voice messages. Is it possible to find a nice abstraction?

@soufianebouaddis
Author

Hi @auloin, thanks for the review and sorry for the incomplete implementation.

I’ll continue working on this and add a concrete transcription provider so the feature works end-to-end. I’ll also revisit the current design to avoid mixing text and voice handling in TelegramChannel and introduce a cleaner abstraction for message types.

I’ll update the PR shortly with these changes. Thanks for the feedback!

@soufianebouaddis
Author

Hi @auloin, this update extends TelegramChannel.consume() to handle both text and voice inputs through a single flow. Voice messages are downloaded via TelegramVoiceDownloader, transcribed to text using a SpeechToTextService abstraction, and then passed to agent.respondTo() the same way as text messages.

I added working transcription implementations (local via whisper-cli + ffmpeg, and OpenAI), with a mock still available for testing. The flow normalizes everything to text before reaching the agent, so text and voice are no longer mixed beyond the input layer.
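
For readers unfamiliar with the local route, the sketch below shows one way the two external commands could be assembled. It is not taken from the PR: the LocalWhisperCommands class and the file names are hypothetical, and the flags assume a whisper.cpp-style whisper-cli that expects 16 kHz mono 16-bit WAV input, which is why the Telegram voice file (normally OGG/Opus) is converted with ffmpeg first.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper that only builds the command lines; it does not run them.
final class LocalWhisperCommands {
    private LocalWhisperCommands() {
    }

    // ffmpeg: overwrite output (-y), resample to 16 kHz (-ar), mono (-ac), 16-bit PCM WAV (-c:a).
    static List<String> toWav(Path input, Path wav) {
        return List.of("ffmpeg", "-y", "-i", input.toString(),
                "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav.toString());
    }

    // whisper-cli: model file (-m), audio file (-f), no timestamps in the output (-nt).
    static List<String> transcribe(Path model, Path wav) {
        return List.of("whisper-cli", "-m", model.toString(), "-f", wav.toString(), "-nt");
    }
}
```

Each list can be handed to `new ProcessBuilder(command)`; the transcript would then be read from the standard output of the second process.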

@auloin
Contributor

auloin commented Apr 17, 2026

Thanks @soufianebouaddis. I'll review it as soon as possible. In the meantime, could you pull the main branch into your branch and resolve the conflicts?

@soufianebouaddis
Author

Hi @auloin, thanks for the heads-up. I’ll pull the latest changes from main into my branch and resolve the conflicts shortly. I’ll push the updated version once everything is clean.
