feat: lark platform auto-create thread and thread-aware context isolation#6716
feat: lark platform auto-create thread and thread-aware context isolation#6716Nick2781 wants to merge 1 commit intoAstrBotDevs:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Lark/Feishu platform adapter by introducing comprehensive thread support. It enables the bot to automatically create new threads when replying to group messages, ensures that conversation contexts are correctly isolated per thread using unique session IDs, and allows streaming cards to function seamlessly within these new thread environments. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
is_in_threadflag onLarkMessageEventis effectively used as a "should_reply_in_thread" switch (it’s set tonot bool(message.thread_id)and then passed straight intoreply_in_thread), which is confusingly named and contradicts the comment; consider renaming it and/or correcting the logic to clearly distinguish "message is in thread" from "we need to set reply_in_thread to create a thread". - In
convert_msg, the_is_in_threadcalculation is based only onmessage.thread_id; if Feishu ever sendsroot_idwithoutthread_idfor certain threaded cases, this heuristic may misbehave—consider explicitly handling/clarifying the relationship betweenthread_idandroot_idhere to avoid edge-case routing issues.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `is_in_thread` flag on `LarkMessageEvent` is effectively used as a "should_reply_in_thread" switch (it’s set to `not bool(message.thread_id)` and then passed straight into `reply_in_thread`), which is confusingly named and contradicts the comment; consider renaming it and/or correcting the logic to clearly distinguish "message is in thread" from "we need to set reply_in_thread to create a thread".
- In `convert_msg`, the `_is_in_thread` calculation is based only on `message.thread_id`; if Feishu ever sends `root_id` without `thread_id` for certain threaded cases, this heuristic may misbehave—consider explicitly handling/clarifying the relationship between `thread_id` and `root_id` here to avoid edge-case routing issues.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/lark/lark_adapter.py" line_range="610-614" />
<code_context>
+ # 判断消息是否在话题/回复链中
+ # 没有已存在的 thread 时,需要 reply_in_thread 创建话题
+ # 已在话题中的消息回复自然在话题内,无需 reply_in_thread
+ _is_in_thread = not bool(message.thread_id)
+ await self.handle_msg(abm, is_in_thread=_is_in_thread)
- async def handle_msg(self, abm: AstrBotMessage) -> None:
</code_context>
<issue_to_address>
**suggestion:** The `is_in_thread` flag name and its semantics are inverted, which is likely to cause confusion and future misuse.
Here `_is_in_thread` is `not bool(message.thread_id)`, so it is `True` when there is *no* existing thread and `False` when the message is already in a thread, yet it is named `is_in_thread` and then used as `reply_in_thread`. The behavior matches the comment, but the name suggests the opposite state and is likely to be misunderstood later.
Consider either:
- Renaming the argument to something like `should_reply_in_thread` / `create_thread_on_reply`, or
- Flipping the boolean so `is_in_thread == bool(message.thread_id)` and using a separate local flag for `reply_in_thread`.
This will make the code easier to reason about and safer to reuse.
```suggestion
# 判断消息是否在话题/回复链中
# 存在 thread_id 表示消息已在话题/回复链内
is_in_thread = bool(message.thread_id)
await self.handle_msg(abm, is_in_thread=is_in_thread)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces thread-aware functionality for the Lark/Feishu platform adapter, enabling automatic thread creation for replies to non-thread group messages and isolating conversation context per thread. It also ensures streaming cards respect the reply_in_thread flag. The changes are well-structured and address the stated objectives. However, there are a few areas where clarity and consistency could be improved, particularly in session ID parsing and variable naming, to enhance maintainability and prevent potential misinterpretations.
|
@sourcery-ai review it |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The session ID encoding/decoding logic (
%thread%,%root%) is currently duplicated and slightly inconsistent betweenconvert_msgandsend_by_session; consider centralizing this into helper functions to avoid subtle split bugs and make future changes to the format safer. - The
_should_reply_in_thread = not bool(message.thread_id)condition applies to all messages, including DMs and non-group contexts; if thread auto-creation is only desired for group messages, it would be clearer to gate this onMessageType.GROUP_MESSAGE(and/orroot_id) to avoid unexpectedreply_in_thread=Truein other scenarios. - The log line in
send_streamingsays非话题消息,将通过 reply_in_thread=True 创建新话题but is keyed offself.should_reply_in_thread, which may also be true in non-group contexts; consider tightening the condition or adjusting the message so logs accurately describe the scenario.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The session ID encoding/decoding logic (`%thread%`, `%root%`) is currently duplicated and slightly inconsistent between `convert_msg` and `send_by_session`; consider centralizing this into helper functions to avoid subtle split bugs and make future changes to the format safer.
- The `_should_reply_in_thread = not bool(message.thread_id)` condition applies to all messages, including DMs and non-group contexts; if thread auto-creation is only desired for group messages, it would be clearer to gate this on `MessageType.GROUP_MESSAGE` (and/or `root_id`) to avoid unexpected `reply_in_thread=True` in other scenarios.
- The log line in `send_streaming` says `非话题消息,将通过 reply_in_thread=True 创建新话题` but is keyed off `self.should_reply_in_thread`, which may also be true in non-group contexts; consider tightening the condition or adjusting the message so logs accurately describe the scenario.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/lark/lark_event.py" line_range="781-783" />
<code_context>
使用解耦发送循环,LLM token 到达时只更新 buffer 并唤醒发送协程,
发送频率由网络 RTT 自然限流。
"""
+ # 话题内消息也尝试流式卡片 + reply_in_thread
+ if self.should_reply_in_thread:
+ logger.info("[Lark] 非话题消息,将通过 reply_in_thread=True 创建新话题")
+
# Step 1: 创建流式卡片实体
</code_context>
<issue_to_address>
**issue:** The new comment and log message conflict with the actual `should_reply_in_thread` semantics and may cause confusion.
Here `should_reply_in_thread` is defined as `not bool(message.thread_id)`, so it is `True` only for non-thread messages and `False` for messages already in a thread. In this method, the inline comment mentions "话题内消息" while the log line correctly refers to "非话题消息" in line with the condition. Please align the comment and log message with the actual semantics (e.g., clarify that we create a new topic for non-thread messages, while existing topic messages rely on the thread context and don’t need `reply_in_thread=True`).
</issue_to_address>
### Comment 2
<location path="astrbot/core/platform/sources/lark/lark_adapter.py" line_range="470-481" />
<code_context>
id_type = "chat_id"
receive_id = session.session_id
+ # 从 session_id 中提取真实的 chat_id(去除 %thread% / %root% 等后缀)
if "%" in receive_id:
- receive_id = receive_id.split("%")[1]
+ receive_id = receive_id.split("%")[0]
else:
id_type = "open_id"
</code_context>
<issue_to_address>
**suggestion:** Splitting on any '%' in `session_id` assumes Lark IDs never contain '%', which could be brittle; consider splitting on the explicit suffix markers instead.
Both here and in the single-chat branch, `session_id` is normalized by splitting on any `%` and taking index 0. This relies on `chat_id` / `open_id` never containing `%`, so if the upstream ID format or appended metadata ever changes, you risk truncating valid IDs. To make this resilient and clearer, strip only the known suffixes (e.g. `receive_id.split('%thread%')[0].split('%root%')[0]` or similar logic using exact markers) instead of splitting on a bare `%`.
```suggestion
if session.message_type == MessageType.GROUP_MESSAGE:
id_type = "chat_id"
receive_id = session.session_id
# 从 session_id 中提取真实的 chat_id,仅去除已知后缀(%thread% / %root%),避免误截断包含 '%' 的合法 ID
for suffix in ("%thread%", "%root%"):
if suffix in receive_id:
receive_id = receive_id.split(suffix)[0]
else:
id_type = "open_id"
receive_id = session.session_id
# 单聊也需要去除已知后缀(%thread% / %root%),但不对任意 '%' 做截断
for suffix in ("%thread%", "%root%"):
if suffix in receive_id:
receive_id = receive_id.split(suffix)[0]
```
</issue_to_address>
### Comment 3
<location path="astrbot/core/platform/sources/lark/lark_adapter.py" line_range="474" />
<code_context>
id_type = "chat_id"
receive_id = session.session_id
+ # 从 session_id 中提取真实的 chat_id(去除 %thread% / %root% 等后缀)
if "%" in receive_id:
- receive_id = receive_id.split("%")[1]
+ receive_id = receive_id.split("%")[0]
</code_context>
<issue_to_address>
**issue (complexity):** Consider introducing helper functions to build and parse `session_id` and to derive `should_reply_in_thread` so that callers are agnostic of the `%thread%` / `%root%` encoding details.
You can reduce the new complexity by centralizing the `%thread%` / `%root%` encoding/decoding and making callers agnostic of the exact string format.
### 1. Centralize session_id build/parse
Instead of open‑coding `f"{base_id}%thread%{...}"` and `split("%")[0]` in multiple places, add small helpers:
```python
SESSION_THREAD_SUFFIX = "%thread%"
SESSION_ROOT_SUFFIX = "%root%"
def build_session_id(base_id: str, *, thread_id: str | None, root_id: str | None) -> str:
if thread_id:
return f"{base_id}{SESSION_THREAD_SUFFIX}{thread_id}"
if root_id:
return f"{base_id}{SESSION_ROOT_SUFFIX}{root_id}"
return base_id
def extract_base_session_id(session_id: str) -> str:
# We only care about the "base" right now
for marker in (SESSION_THREAD_SUFFIX, SESSION_ROOT_SUFFIX):
idx = session_id.find(marker)
if idx != -1:
return session_id[:idx]
return session_id
```
Then update the call sites:
```python
# in convert_msg / webhook handling
if abm.type == MessageType.GROUP_MESSAGE:
base_id = abm.group_id or ""
else:
base_id = abm.sender.user_id
abm.session_id = build_session_id(
base_id=base_id,
thread_id=message.thread_id,
root_id=message.root_id,
)
```
```python
# in send_by_session
if session.message_type == MessageType.GROUP_MESSAGE:
id_type = "chat_id"
else:
id_type = "open_id"
receive_id = extract_base_session_id(session.session_id)
```
This removes the “magic” `%`‑based splitting from your business logic and keeps all existing behavior.
### 2. Keep the boolean, but avoid coupling it to the string format
The `should_reply_in_thread` flag is a separate concern from how you encode `session_id`. Keeping it as a boolean is fine, but the current code implicitly relies on `thread_id` and `%thread%` semantics at different layers.
You can at least make the intent explicit and local:
```python
_should_reply_in_thread = message.thread_id is None
await self.handle_msg(abm, should_reply_in_thread=_should_reply_in_thread)
```
Combined with the helper above, callers no longer need to know or care that threads/root are represented via `%thread%` / `%root%` in the `session_id` string, which makes the control flow easier to follow and change later.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
4ec821e to
b317b0e
Compare
|
@sourcery-ai review it |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
{chat_id}%thread%{thread_id}/{chat_id}%root%{root_id}session_id format and its parsing logic are duplicated in multiple places; consider centralizing construction and parsing into shared helpers to avoid future drift or subtle inconsistencies. - In
send_by_session, stripping%thread%/%root%fromreceive_idwithsplit(_suffix)[0]will remove everything after the first occurrence anywhere in the string; usingrsplitor checking that the suffix appears at the end would make the parsing more robust and less prone to edge cases.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `{chat_id}%thread%{thread_id}` / `{chat_id}%root%{root_id}` session_id format and its parsing logic are duplicated in multiple places; consider centralizing construction and parsing into shared helpers to avoid future drift or subtle inconsistencies.
- In `send_by_session`, stripping `%thread%` / `%root%` from `receive_id` with `split(_suffix)[0]` will remove everything after the first occurrence anywhere in the string; using `rsplit` or checking that the suffix appears at the end would make the parsing more robust and less prone to edge cases.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/lark/lark_adapter.py" line_range="473-477" />
<code_context>
receive_id = session.session_id
- if "%" in receive_id:
- receive_id = receive_id.split("%")[1]
+ # 从 session_id 中提取真实的 chat_id,仅去除已知后缀(%thread% / %root%)
+ for _suffix in ("%thread%", "%root%"):
+ if _suffix in receive_id:
+ receive_id = receive_id.split(_suffix)[0]
+ break
else:
id_type = "open_id"
</code_context>
<issue_to_address>
**suggestion:** Consider extracting the suffix-stripping logic into a shared helper to avoid duplication and keep session-id conventions centralized.
This suffix-stripping logic now appears in multiple places for both group and private messages. A small helper like `_strip_session_suffix(session_id: str) -> str` (or a static method on `LarkMessageEvent`/the adapter) would remove the duplication and keep the list of valid suffixes centralized for easier future extension.
Suggested implementation:
```python
def _strip_session_suffix(session_id: str) -> str:
"""从 session_id 中提取真实的会话 id,仅去除已知后缀(%thread% / %root%)"""
for _suffix in ("%thread%", "%root%"):
if _suffix in session_id:
return session_id.split(_suffix)[0]
return session_id
if session.message_type == MessageType.GROUP_MESSAGE:
id_type = "chat_id"
receive_id = _strip_session_suffix(session.session_id)
else:
id_type = "open_id"
receive_id = _strip_session_suffix(session.session_id)
# 复用 LarkMessageEvent 中的通用发送逻辑
await LarkMessageEvent.send_message_chain(
user_id=event.event.sender.sender_id.open_id,
```
If similar suffix-stripping loops exist elsewhere in `lark_adapter.py` or related modules (e.g., other handlers for group/private messages), they should be replaced with calls to a shared helper as well. At that point, consider moving `_strip_session_suffix` to a module-level function or a static method on `LarkMessageEvent`/the adapter class to reuse it across methods instead of redefining it locally.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review it |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
%thread%/%root%markers are hardcoded in several places (session construction and_strip_session_suffix); consider centralizing them as shared constants to avoid typos and keep encoding/decoding logic in sync. - In
_strip_session_suffix, you currently split on the first occurrence of%thread%/%root%anywhere in the string; if you ever pass through non-Lark or differently formatted session IDs this could behave unexpectedly, so you might want to only strip when the marker matches your exact encoding pattern (e.g., suffix check or a more explicit schema). - The info-level log
[Lark] 非话题消息,将通过 reply_in_thread=True 创建新话题insend_streamingwill fire on every auto-threaded streaming reply and may be noisy in large groups; consider downgrading this to debug or sampling it.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `%thread%` / `%root%` markers are hardcoded in several places (session construction and `_strip_session_suffix`); consider centralizing them as shared constants to avoid typos and keep encoding/decoding logic in sync.
- In `_strip_session_suffix`, you currently split on the first occurrence of `%thread%` / `%root%` anywhere in the string; if you ever pass through non-Lark or differently formatted session IDs this could behave unexpectedly, so you might want to only strip when the marker matches your exact encoding pattern (e.g., suffix check or a more explicit schema).
- The info-level log `[Lark] 非话题消息,将通过 reply_in_thread=True 创建新话题` in `send_streaming` will fire on every auto-threaded streaming reply and may be noisy in large groups; consider downgrading this to debug or sampling it.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/anthropic_source.py" line_range="422" />
<code_context>
# 解析完整的工具调用
tool_info = tool_use_buffer[event.index]
try:
- if "input_json" in tool_info:
+ if "input_json" in tool_info and tool_info["input_json"]:
tool_info["input"] = json.loads(tool_info["input_json"])
</code_context>
<issue_to_address>
**suggestion:** Guard against non-string `input_json` before calling `json.loads`.
To avoid unexpected TypeErrors if `input_json` is ever set to a non-string (e.g., a dict) by the SDK or upstream code, consider also checking its type before parsing, e.g. `if isinstance(tool_info.get("input_json"), str) and tool_info["input_json"]:`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Add `lark_auto_thread` config option (default: off) to control whether bot replies auto-create threads - Thread-aware session_id using thread_id/root_id for context isolation - Pass `reply_in_thread` through full send chain (text, file, audio, video, streaming card, interactive card, collapsible panel) - Extract `_strip_session_suffix` helper for session_id parsing - Add i18n translations for zh-CN, en-US, ru-RU - Fix empty/non-string input_json causing errors in anthropic streaming tool call parsing Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
e39d9a8 to
ab9fcbe
Compare
|
@sourcery-ai review it |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
lark_auto_threadbehavior currently applies to all messages (including single chats) via_should_reply_in_thread = auto_thread and not bool(message.thread_id); consider gating this byMessageType.GROUP_MESSAGEsoreply_in_threadis only used where Lark/Feishu threads are actually supported. - The session_id construction for group vs non-group messages in
convert_msgduplicates the%thread%/%root%suffix logic; extracting this into a small helper (similar to_strip_session_suffix) would reduce duplication and keep future changes to thread scoping consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `lark_auto_thread` behavior currently applies to all messages (including single chats) via `_should_reply_in_thread = auto_thread and not bool(message.thread_id)`; consider gating this by `MessageType.GROUP_MESSAGE` so `reply_in_thread` is only used where Lark/Feishu threads are actually supported.
- The session_id construction for group vs non-group messages in `convert_msg` duplicates the `%thread%` / `%root%` suffix logic; extracting this into a small helper (similar to `_strip_session_suffix`) would reduce duplication and keep future changes to thread scoping consistent.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary
Add thread (话题) support for the Lark/Feishu platform adapter:
Auto-create thread on reply: When replying to a non-thread group message, automatically set
reply_in_thread=Trueto create a new thread. Messages already inside a thread reply naturally within it.Thread-aware session isolation: Use
thread_id/root_idto build isolatedsession_ids (format:{chat_id}%thread%{thread_id}or{chat_id}%root%{root_id}), so conversation context is properly scoped per thread instead of per group.Streaming card in thread: Streaming cards (CardKit) also respect
reply_in_thread, enabling the typewriter effect within auto-created threads.Files Changed
astrbot/core/platform/sources/lark/lark_event.py— Passreply_in_threadthrough the full send chain (text, file, audio, video, streaming card) instead of hardcodingFalse.astrbot/core/platform/sources/lark/lark_adapter.py— Build thread-awaresession_id; parse it correctly insend_by_session; passis_in_threadflag toLarkMessageEvent.Modifications / 改动点
Screenshots or Test Results / 运行截图或测试结果
Tested on a real Feishu group:
Checklist / 检查清单
Summary by Sourcery
Add thread-aware messaging and context isolation to the Lark/Feishu adapter and refine tool call input parsing for the Anthropic provider.
New Features:
lark_auto_threadconfiguration flag.Bug Fixes:
input_jsonwhen it is a non-empty string, preventing invalid JSON errors.Enhancements:
Documentation:
lark_auto_threadsetting in localized dashboard configuration metadata for English, Russian, and Chinese locales.