feat telegram: add voice message support for telegram with pluggable #38
|
We require contributors to sign our Contributor License Agreement, and we don't have @soufianebouaddis on file. In order for us to review and merge your code, please create a PR where you add yourself to the contributors of JobRunr. This only needs to be done once. As soon as that is done, we can review your PR. Thanks a lot! |
|
@cla-bot check |
|
The cla-bot has been summoned, and re-checked this pull request! |
…tests for the audit/executions/deliveries REST API endpoints (T58, T60) against a real PostgreSQL via Testcontainers; all tests green, BUILD SUCCESS
…gration for role_agent_config, RoleAgentConfig entity/repository/service with hierarchy fallback and cache, model override integration through the whole pipeline (ChatRestController→SseStreamingService→ChatService), REST API endpoints for management, 26 new tests, all 1147 tests green
|
Hi @soufianebouaddis thanks for submitting this PR. Sorry for the late review. From what I can see the actual transcription is yet to be done. I think we should have at least one working implementation. Is this something you'd still like to work on? My second concern is that we're mixing telegram text and voice messages, is it possible to find a nice abstraction? |
|
Hi @auloin, thanks for the review and sorry for the incomplete implementation. I’ll continue working on this and add a concrete transcription provider so the feature works end-to-end. I’ll also revisit the current design to avoid mixing text and voice handling in TelegramChannel and introduce a cleaner abstraction for message types. I’ll update the PR shortly with these changes. Thanks for the feedback! |
|
Hi @auloin, this update extends TelegramChannel.consume() to handle both text and voice inputs through a single flow. Voice messages are downloaded via TelegramVoiceDownloader, transcribed to text using a SpeechToTextService abstraction, and then passed to agent.respondTo() the same way as text messages. I added working transcription implementations (local via whisper-cli + ffmpeg, and OpenAI), with a mock still available for testing. The flow normalizes everything to text before reaching the agent, so text and voice are no longer mixed beyond the input layer. |
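For readers following the thread, the flow described in this comment can be sketched roughly as below. This is a minimal illustration, not the PR's code: the interface shapes, `IncomingMessage`, `VoiceDownloader` (standing in for TelegramVoiceDownloader) and `VoiceAwareChannel` are hypothetical, and only the names SpeechToTextService and respondTo come from the discussion.

```java
import java.util.Optional;

// Hypothetical shapes; the PR's real interfaces may differ.
interface SpeechToTextService {
    String transcribe(byte[] audio);
}

// Stands in for TelegramVoiceDownloader (assumed signature).
interface VoiceDownloader {
    byte[] download(String fileId);
}

interface Agent {
    String respondTo(String text);
}

// Hypothetical message holder: either text or a voice file id is set.
final class IncomingMessage {
    final String text;
    final String voiceFileId;

    IncomingMessage(String text, String voiceFileId) {
        this.text = text;
        this.voiceFileId = voiceFileId;
    }

    static IncomingMessage ofText(String text) {
        return new IncomingMessage(text, null);
    }

    static IncomingMessage ofVoice(String fileId) {
        return new IncomingMessage(null, fileId);
    }
}

final class VoiceAwareChannel {
    private final VoiceDownloader downloader;
    private final SpeechToTextService speechToText;
    private final Agent agent;

    VoiceAwareChannel(VoiceDownloader downloader, SpeechToTextService speechToText, Agent agent) {
        this.downloader = downloader;
        this.speechToText = speechToText;
        this.agent = agent;
    }

    // Normalize the input to text first, then use one agent call for both kinds.
    Optional<String> consume(IncomingMessage message) {
        String text = message.text;
        if (text == null && message.voiceFileId != null) {
            byte[] audio = downloader.download(message.voiceFileId);
            text = speechToText.transcribe(audio);
        }
        if (text == null || text.isBlank()) {
            return Optional.empty(); // nothing usable to send to the agent
        }
        return Optional.of(agent.respondTo(text));
    }
}
```

In this sketch the agent only ever sees text, so the voice-specific work stays at the input layer, as the comment describes.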
|
Thanks @soufianebouaddis. I'll review it as soon as possible. In the meantime could you already pull the main branch into your branch and solve the conflicts? |
|
Hi @auloin, thanks for the heads-up. I’ll pull the latest changes from
Add support for voice messages in Telegram channel
This PR extends TelegramChannel to handle voice messages in addition to text.

Changes
- TelegramChannel.consume() to process both text and voice messages
- SpeechToTextService abstraction
- agent.respondTo(...) flow

Transcription
- MockSpeechToTextService (no external dependency, suitable for testing)
- OpenAiSpeechToTextService (enabled via speech.provider=openai)

Notes

Next Steps / Ideas
- … AudioTranscriptionModel, or local Whisper plugin)
- … AudioTranscriptionModel if preferred
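
The description mentions a mock provider and an OpenAI provider enabled via speech.provider=openai. One way such a switch might be wired is sketched below. This is an assumption-laden illustration: `SpeechProviders`, the fallback to the mock when the property is absent, and the stubbed OpenAI class body are all hypothetical, and the PR may well use framework configuration instead.

```java
import java.util.Properties;

interface SpeechToTextService {
    String transcribe(byte[] audio);
}

// Stand-in for the PR's mock: returns a fixed transcript, no external dependency.
final class MockSpeechToTextService implements SpeechToTextService {
    @Override
    public String transcribe(byte[] audio) {
        return "[mock transcript]";
    }
}

// Stand-in only: a real implementation would call the OpenAI transcription API.
final class OpenAiSpeechToTextService implements SpeechToTextService {
    @Override
    public String transcribe(byte[] audio) {
        throw new UnsupportedOperationException("remote call omitted in this sketch");
    }
}

final class SpeechProviders {
    private SpeechProviders() {
    }

    // Picks an implementation from the speech.provider property.
    // Falling back to "mock" when the property is absent is an assumption.
    static SpeechToTextService fromProperties(Properties props) {
        String provider = props.getProperty("speech.provider", "mock");
        switch (provider) {
            case "mock":
                return new MockSpeechToTextService();
            case "openai":
                return new OpenAiSpeechToTextService();
            default:
                throw new IllegalArgumentException("Unknown speech.provider: " + provider);
        }
    }
}
```

Failing fast on an unknown provider name keeps a typo in configuration from silently selecting the mock.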