fix(components): Record audio files incorrectly saved with .jpg extension#6869
fix(components): Record audio files incorrectly saved with .jpg extension#6869BillionClaw wants to merge 1 commit intoAstrBotDevs:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug where the bot failed to process voice and audio messages due to incorrect file extension handling. Previously, audio files downloaded from URLs or decoded from base64 were mistakenly saved with a Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
download_audio_by_url, consider checkingresp.statusand raising a clear error if it's not 200 before reading the body, so callers can distinguish between network/HTTP failures and empty content. - The base64 branch in
Record.convert_to_file_pathreimplements the temp-file logic that now exists insave_temp_audio; you could reusesave_temp_audiothere for consistency and to avoid duplication. - The final bare
except Exception as e: raise eindownload_audio_by_urlis redundant and could be removed to let exceptions naturally propagate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `download_audio_by_url`, consider checking `resp.status` and raising a clear error if it's not 200 before reading the body, so callers can distinguish between network/HTTP failures and empty content.
- The base64 branch in `Record.convert_to_file_path` reimplements the temp-file logic that now exists in `save_temp_audio`; you could reuse `save_temp_audio` there for consistency and to avoid duplication.
- The final bare `except Exception as e: raise e` in `download_audio_by_url` is redundant and could be removed to let exceptions naturally propagate.
## Individual Comments
### Comment 1
<location path="astrbot/core/utils/io.py" line_range="41-42" />
<code_context>
+ trust_env=True,
+ connector=connector,
+ ) as session:
+ async with session.get(url) as resp:
+ data = await resp.read()
+ return save_temp_audio(data)
+ except (aiohttp.ClientConnectorSSLError, aiohttp.ClientConnectorCertificateError):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider handling non-2xx HTTP responses explicitly before saving the audio
As written, we’ll save the body for any status (including 4xx/5xx or failed redirects) as if it were valid audio. Please either call `resp.raise_for_status()` or check `resp.status` before reading, so we don’t treat error pages/HTML as audio downstream.
Suggested implementation:
```python
async with aiohttp.ClientSession(
trust_env=True,
connector=connector,
) as session:
async with session.get(url) as resp:
resp.raise_for_status()
data = await resp.read()
return save_temp_audio(data)
```
```python
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE
async with aiohttp.ClientSession() as session:
async with session.get(url, ssl=ssl_context) as resp:
resp.raise_for_status()
data = await resp.read()
return save_temp_audio(data)
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/utils/io.py" line_range="44-53" />
<code_context>
+ async with session.get(url) as resp:
+ data = await resp.read()
+ return save_temp_audio(data)
+ except (aiohttp.ClientConnectorSSLError, aiohttp.ClientConnectorCertificateError):
+ logger.warning(
+ f"SSL certificate verification failed for {url}. "
+ "Disabling SSL verification (CERT_NONE) as a fallback."
+ )
+ ssl_context = ssl.create_default_context()
+ ssl_context.check_hostname = False
+ ssl_context.verify_mode = ssl.CERT_NONE
+ async with aiohttp.ClientSession() as session:
+ async with session.get(url, ssl=ssl_context) as resp:
+ data = await resp.read()
+ return save_temp_audio(data)
</code_context>
<issue_to_address>
**🚨 issue (security):** Reconsider or constrain the SSL verification disabled fallback for security-sensitive contexts
This fallback fully disables certificate and hostname verification, which creates significant MITM risk if used with arbitrary URLs. If we truly need this behavior, please gate it behind an explicit config/flag, restrict it to a known host list, or otherwise tightly scope when verification is disabled so it cannot occur in general use by default.
</issue_to_address>
### Comment 3
<location path="astrbot/core/utils/io.py" line_range="52-57" />
<code_context>
+ async with session.get(url, ssl=ssl_context) as resp:
+ data = await resp.read()
+ return save_temp_audio(data)
+ except Exception as e:
+ raise e
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The broad try/except that re-raises the same exception is redundant
Catching `Exception` just to `raise e` can strip traceback context in some Python versions and adds no value here. Remove this `except` and let errors propagate, or only catch specific exceptions when you need logging or custom handling.
```suggestion
async with aiohttp.ClientSession() as session:
async with session.get(url, ssl=ssl_context) as resp:
data = await resp.read()
return save_temp_audio(data)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| except (aiohttp.ClientConnectorSSLError, aiohttp.ClientConnectorCertificateError): | ||
| logger.warning( | ||
| f"SSL certificate verification failed for {url}. " | ||
| "Disabling SSL verification (CERT_NONE) as a fallback." | ||
| ) | ||
| ssl_context = ssl.create_default_context() | ||
| ssl_context.check_hostname = False | ||
| ssl_context.verify_mode = ssl.CERT_NONE | ||
| async with aiohttp.ClientSession() as session: | ||
| async with session.get(url, ssl=ssl_context) as resp: |
There was a problem hiding this comment.
🚨 issue (security): Reconsider or constrain the SSL verification disabled fallback for security-sensitive contexts
This fallback fully disables certificate and hostname verification, which creates significant MITM risk if used with arbitrary URLs. If we truly need this behavior, please gate it behind an explicit config/flag, restrict it to a known host list, or otherwise tightly scope when verification is disabled so it cannot occur in general use by default.
There was a problem hiding this comment.
Code Review
This pull request effectively resolves the issue of audio files being incorrectly saved with a .jpg extension by introducing dedicated functions for handling audio downloads. The changes in astrbot/core/message/components.py and astrbot/core/utils/io.py are logical and directly address the root cause.
I've added a couple of suggestions to improve code consistency and maintainability in the new functions. Specifically, I've recommended reusing the new save_temp_audio helper function to avoid code duplication and suggested a small improvement to exception handling.
astrbot/core/message/components.py
Outdated
| bs64_data = self.file.removeprefix("base64://") | ||
| image_bytes = base64.b64decode(bs64_data) | ||
| audio_bytes = base64.b64decode(bs64_data) | ||
| file_path = os.path.join( | ||
| get_astrbot_temp_path(), f"recordseg_{uuid.uuid4()}.jpg" | ||
| get_astrbot_temp_path(), f"recordseg_{uuid.uuid4()}.audio" | ||
| ) | ||
| with open(file_path, "wb") as f: | ||
| f.write(image_bytes) | ||
| f.write(audio_bytes) | ||
| return os.path.abspath(file_path) |
There was a problem hiding this comment.
To improve code reuse and consistency, you can use the new save_temp_audio function here. This avoids duplicating the logic for saving temporary audio files from base64 strings.
You'll first need to import save_temp_audio from astrbot.core.utils.io at the top of the file.
from astrbot.core.utils.io import ( # noqa: I001
download_audio_by_url,
download_file,
download_image_by_url,
file_to_base64,
save_temp_audio,
)Then, you can simplify this block of code.
bs64_data = self.file.removeprefix("base64://")
audio_bytes = base64.b64decode(bs64_data)
return save_temp_audio(audio_bytes)
astrbot/core/utils/io.py
Outdated
| except Exception as e: | ||
| raise e |
|
Thank you for the review and merge! Glad to help. |
|
Thank you for your comment! I'd be happy to address any questions or make adjustments. Please let me know what changes would be helpful. |
|
Thank you for the review and merge! Glad to help. |
|
Thank you for the review! Let me know if you need any changes. |
884cd10 to
dc341b6
Compare
Review Feedback AddressedThanks for the review. All feedback items have been addressed in the latest commit: 1. HTTP status check (line 42, id:2977827214) - bug_risk 2. SSL fallback warning (line 53, id:2977827225) - security 3. Redundant except block (line 57, id:2977827228) - bug_risk 4. gemini-code-assist redundant except (id:2977828903) |
…sion Fixed audio file extension bug (.jpg → .audio), added download_audio_by_url and save_temp_audio functions. Bug fixes: - download_audio_by_url: Added resp.raise_for_status() before reading body to prevent error pages/HTML from being saved as audio - SSL fallback (CERT_NONE): Added explicit warning log about MITM risk - Removed redundant 'except Exception as e: raise e' that stripped traceback - convert_to_file_path: Use save_temp_audio for base64 decoded audio Fixes AstrBotDevs#6853, Fixes AstrBotDevs#6509
dc341b6 to
cc45e8b
Compare
|
5. gemini-code-assist suggestion (id:2977828898) - use save_temp_audio |
|
Thank you for the update, @BillionClaw! It's great to see that all the feedback, including the suggestion to use |
Bug Description
After upgrading from v4.19.5 to v4.20+, the bot does not respond when users send voice or audio files via C2C (private) messages on QQ Official Bot.
Root Cause
Record.convert_to_file_path()was usingdownload_image_by_url()to download audio files from URLs. This caused:.jpgextension instead of a proper audio extension.audioextension in temp files was also being incorrectly named.jpgThe same bug existed in
convert_to_base64().When the bot receives a voice message from a QQ Official Bot C2C message, the audio URL is extracted and passed to
Record.convert_to_file_path(). If the file extension is incorrect or the download fails, downstream processing (like ASR/transcription) fails silently.Fix
download_audio_by_url()function inastrbot/core/utils/io.pyfor proper audio downloadingsave_temp_audio()function to save audio data with correct.audioextensionRecord.convert_to_file_path()to usedownload_audio_by_url()instead ofdownload_image_by_url().audioextension instead of.jpgRecord.convert_to_base64()to usedownload_audio_by_url()Changes
astrbot/core/utils/io.py: Addedsave_temp_audio()anddownload_audio_by_url()astrbot/core/message/components.py: ChangedRecord.convert_to_file_path()andRecord.convert_to_base64()to use the new audio download functionFixes #6853
Fixes #6509
Summary by Sourcery
Fix handling of audio/voice message records so that downloaded and temporary files use appropriate audio-specific utilities and file extensions instead of image handling.
Bug Fixes:
Enhancements: