fix(api): retry agent lookup on WebSocket connect to prevent 404 race by nicseltzer · Pull Request #910 · RightNow-AI/openfang

nicseltzer · 2026-03-29T16:33:58Z

Summary

When a client opens a WebSocket connection to an agent immediately after spawn, the WS upgrade request can arrive before the kernel finishes inserting the agent into the registry, causing a spurious 404.
Replaced the single-shot registry.get() check with a retry loop that polls every 100ms for up to 3 seconds before returning 404.
No new abstractions; change is a targeted 14-line diff in crates/openfang-api/src/ws.rs.
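
The retry described above can be sketched as follows. This is a minimal, std-only illustration and not the diff itself: `lookup_with_retry` and its closure parameter are hypothetical names, and it blocks the thread with `std::thread::sleep`, whereas an async handler would normally await a timer instead.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `lookup` until it yields a value or `timeout` elapses.
/// `lookup` stands in for the registry check (hypothetical closure).
fn lookup_with_retry<T>(
    mut lookup: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = lookup() {
            return Some(found);
        }
        if Instant::now() >= deadline {
            // Still missing after the grace period: the caller answers 404.
            return None;
        }
        sleep(interval);
    }
}
```

With the values from the summary, a caller would pass `Duration::from_millis(100)` as the interval and `Duration::from_secs(3)` as the timeout, and map `None` to the 404 response.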

Test plan

cargo build --workspace --lib — passes
cargo clippy --workspace --all-targets -- -D warnings — zero warnings
Manual: start daemon, open WS immediately after agent create, verify connection succeeds instead of 404

Fixes #804

🤖 Generated with Claude Code

(cherry picked from commit 7432086)

Parse the mentions JSON array from Mattermost event data and set was_mentioned metadata when the bot user ID is present, enabling agents to distinguish direct mentions from background group traffic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The agent config reload logic was missing skills and mcp_servers from the change detection, so edits to these fields in agent.toml weren't being picked up when loading agents from SQLite. Added both fields to the comparison to ensure proper hot-reload.

The skills and mcp_servers fields must be at the top level of the agent.toml, not after [capabilities], due to TOML implicit table ordering rules.
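
To illustrate the ordering rule: in TOML, every key written after a `[table]` header belongs to that table until the next header. The sketch below is a deliberately naive line scanner, not a TOML parser, and `owning_table` is a hypothetical helper; it only shows which table a key ends up in.

```rust
/// Returns the table that `key` is assigned to: "" for the top level,
/// or the most recent `[table]` header above it. Illustrative only:
/// ignores arrays of tables, dotted keys, and multi-line values.
fn owning_table(doc: &str, key: &str) -> Option<String> {
    let mut current = String::new();
    for line in doc.lines() {
        let line = line.trim();
        if line.starts_with('[') && line.ends_with(']') {
            // A table header changes the owner of all following keys.
            current = line[1..line.len() - 1].trim().to_string();
        } else if let Some((name, _)) = line.split_once('=') {
            if name.trim() == key {
                return Some(current);
            }
        }
    }
    None
}
```

For a document where `skills = [...]` follows `[capabilities]`, this returns `Some("capabilities")`, which is why the field would not be read as a top-level agent setting.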

`MessageContent::text_length()` returned 0 for `ToolUse` blocks, ignoring the tool name and JSON input arguments. This caused the compactor's `estimate_token_count()` (which uses `text_length()`) to massively undercount tokens when conversations contained tool calls with large arguments (e.g. web_search results, page content). The result: compaction never triggered despite the session exceeding the context window, leading to "Token limit exceeded" errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
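
A rough sketch of the undercounting fix, under stated assumptions: the enum below is a simplified, hypothetical stand-in for the project's `MessageContent`, and the four-bytes-per-token ratio is an assumption of this example rather than the compactor's actual heuristic.

```rust
/// Hypothetical, simplified stand-in for the project's content type.
enum MessageContent {
    Text(String),
    ToolUse { name: String, input_json: String },
}

impl MessageContent {
    /// Counts bytes that reach the model, including the tool name and
    /// serialized arguments for tool calls (previously counted as 0).
    fn text_length(&self) -> usize {
        match self {
            MessageContent::Text(text) => text.len(),
            MessageContent::ToolUse { name, input_json } => name.len() + input_json.len(),
        }
    }
}

/// Rough estimate assuming about four bytes per token (assumption).
fn estimate_token_count(messages: &[MessageContent]) -> usize {
    messages.iter().map(MessageContent::text_length).sum::<usize>() / 4
}
```

With tool calls contributing their real size, a session dominated by large tool arguments crosses the compaction threshold instead of appearing nearly empty.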

…lientTransportConfig The struct is marked #[non_exhaustive] in rmcp, so struct expression syntax is rejected by the compiler. Switch to Default + field assignment. Confidence: high Scope-risk: narrow
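
The pattern can be sketched like this. `TransportConfig` and its fields are hypothetical placeholders, not the rmcp type. Note that `#[non_exhaustive]` only restricts code outside the defining crate, so this single-file example compiles either way; across crates, both `Struct { .. }` literals and `..Default::default()` update syntax are rejected.

```rust
mod upstream {
    /// Hypothetical stand-in for a config struct owned by another crate.
    #[non_exhaustive]
    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct TransportConfig {
        pub uri: String,
        pub allow_stateless: bool,
    }
}

use upstream::TransportConfig;

/// Builds a config via Default plus field assignment, which keeps
/// compiling when the upstream crate adds new fields.
fn build_config(uri: &str) -> TransportConfig {
    let mut config = TransportConfig::default();
    config.uri = uri.to_string();
    config
}
```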

Extract is_silent_token() helper for case-insensitive [SILENT] detection. Revert unrelated Cargo.lock and formatting changes. Add unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
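
One plausible shape for such a helper is shown below. The function name comes from the commit message, but the body is a guess: in particular, trimming surrounding whitespace and requiring the whole reply to be the token are assumptions of this sketch.

```rust
/// Returns true when the reply is exactly the `[SILENT]` marker,
/// ignoring ASCII case and surrounding whitespace (assumed behavior).
fn is_silent_token(reply: &str) -> bool {
    reply.trim().eq_ignore_ascii_case("[SILENT]")
}
```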

When a Hand agent is respawned (on daemon restart or reactivation), activate_hand() rebuilds the manifest entirely from HAND.toml, silently discarding any tool_allowlist/tool_blocklist changes made via the API. The API returned {"status":"ok"} for these updates, creating a broken contract: changes appeared to succeed but were lost on the next restart. Fix: capture the existing agent's tool filters before killing it at respawn time, and reapply them to the freshly-built manifest. Empty filters are treated as "no override" (not "block all") so hands with no API-set filters continue to get the full tool list from HAND.toml. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
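
The capture-and-reapply step might look roughly like this. `Manifest` is a trimmed-down hypothetical struct and `reapply_tool_filters` is an illustrative name; only the "empty means no override" rule is taken from the description above.

```rust
/// Hypothetical, trimmed-down manifest for this sketch.
#[derive(Debug, Clone, Default, PartialEq)]
struct Manifest {
    tool_allowlist: Vec<String>,
    tool_blocklist: Vec<String>,
}

/// Reapplies filters captured from the live agent onto a manifest freshly
/// built from HAND.toml. An empty captured list means "no override",
/// so the rebuilt value is left untouched.
fn reapply_tool_filters(mut rebuilt: Manifest, live: &Manifest) -> Manifest {
    if !live.tool_allowlist.is_empty() {
        rebuilt.tool_allowlist = live.tool_allowlist.clone();
    }
    if !live.tool_blocklist.is_empty() {
        rebuilt.tool_blocklist = live.tool_blocklist.clone();
    }
    rebuilt
}
```

Treating empty lists as "no override" keeps hands that were never patched via the API on the full tool list from HAND.toml.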

Extends the API-patch preservation introduced in 81eb7cb to cover model config fields. Previously, provider, model, and temperature changes made via the agents API were lost every time a Hand was respawned (daemon restart, crash recovery, or manual reactivation) because activate_hand() rebuilt the AgentManifest entirely from the compile-time-embedded HAND.toml.

Changes:
- kernel.rs: single registry scan now captures both tool filters and model config before rebuild; existing_model_override carries provider/model/temperature and is reapplied after the manifest is built, only when the live values differ from what HAND.toml would produce. system_prompt is intentionally excluded: it is assembled dynamically from HAND.toml plus settings context and must stay live.
- registry.rs: add update_temperature() for hot-patching sampling temp.
- routes.rs: expose temperature in list/get responses; add temperature field to PatchAgentConfigRequest and implement it in patch_agent_config with 0.0–2.0 validation.
- index_body.html + agents.js: temperature input in the agent config tab.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
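
For the temperature bounds check mentioned for routes.rs, a minimal sketch could look like the following. The function name, the error type, and treating both ends of the range as inclusive are assumptions of this example.

```rust
/// Accepts a sampling temperature in the inclusive range 0.0..=2.0.
/// The error type and message are assumptions of this sketch.
fn validate_temperature(value: f32) -> Result<f32, String> {
    if value.is_finite() && (0.0..=2.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("temperature must be between 0.0 and 2.0, got {value}"))
    }
}
```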

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Dashboard passwords were hashed with plain SHA256 (no salt), vulnerable to rainbow tables and GPU brute force. Switch to Argon2id with random per-hash salts. Breaking change: existing SHA256 hashes in config.toml must be regenerated with `openfang auth hash-password`.

Addresses review feedback:
- Add `openfang auth hash-password` subcommand so users can generate Argon2id hashes after upgrading (the command referenced in docs).
- Emit a tracing::warn at daemon startup when auth is enabled but the password_hash is not in Argon2id format, so users know why login fails.
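
The startup check could be approximated as below. Argon2id hashes in PHC string format begin with the `$argon2id$` identifier; everything else here (function names, returning the message instead of logging it) is a dependency-free assumption, since the real daemon is described as logging through tracing::warn.

```rust
/// Returns true when `hash` looks like an Argon2id PHC string,
/// which begins with the `$argon2id$` identifier.
fn is_argon2id_hash(hash: &str) -> bool {
    hash.starts_with("$argon2id$")
}

/// Produces the startup warning, if one is needed. Returning the
/// message (rather than logging) keeps this sketch dependency-free.
fn startup_auth_warning(auth_enabled: bool, password_hash: &str) -> Option<String> {
    if auth_enabled && !is_argon2id_hash(password_hash) {
        Some(
            "password_hash is not in Argon2id format; regenerate it with \
             `openfang auth hash-password`"
                .to_string(),
        )
    } else {
        None
    }
}
```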

Regenerated lockfile after cherry-picking 9 upstream PRs. Fixed needless borrow in line.rs introduced by PR #877 formatting. Confidence: high Scope-risk: narrow

43 AGENTS.md files providing AI-readable documentation for every significant directory. Each includes purpose, key files, subdirectory links, agent working instructions, testing requirements, and dependency listings. Linked via  hierarchy. Confidence: high Scope-risk: narrow

When an agent is being spawned, the WS upgrade can arrive before registry insertion completes. Retry the lookup for up to 3 seconds before returning 404. Fixes: #804 Confidence: high Scope-risk: narrow

jaberjaber23 · 2026-03-31T00:02:16Z

Reviewed and approved. The real code change is solid. However this PR has merge conflicts and ~3,000 lines of auto-generated AGENTS.md files inflating the diff. Please: (1) rebase onto current main, (2) remove the AGENTS.md files from this PR (submit those separately if desired). We will merge immediately after.

nicseltzer · 2026-03-31T00:30:47Z

Ope -- I didn't even mean for these to push upstream. My bad!

tytsxai and others added 18 commits March 29, 2026 08:58

channels: add startup timeout for telegram control-plane calls

1cf55c4

(cherry picked from commit 7432086)

fix: restore file permissions from PR #793 cherry-pick

29e6d70

Fix DingTalk stream group replies and ack flow

372fc92

fix: stamp release version from tag

bc1bc18

test: add test for agent skills/mcp_servers TOML parsing

7beb31c

The skills and mcp_servers fields must be at the top level of the agent.toml, not after [capabilities], due to TOML implicit table ordering rules.

fix(runtime): use field assignment for non-exhaustive StreamableHttpC…

ff862a0

…lientTransportConfig The struct is marked #[non_exhaustive] in rmcp, so struct expression syntax is rejected by the compiler. Switch to Default + field assignment. Confidence: high Scope-risk: narrow

Fix silent reinforcement: recognize [SILENT] token in agent replies

59aa05f

Extract is_silent_token() helper for case-insensitive [SILENT] detection. Revert unrelated Cargo.lock and formatting changes. Add unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: cargo fmt

4fb38d5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: regenerate Cargo.lock and fix clippy warning from cherry-picks

ccbbb5c

Regenerated lockfile after cherry-picking 9 upstream PRs. Fixed needless borrow in line.rs introduced by PR #877 formatting. Confidence: high Scope-risk: narrow

fix(api): retry agent lookup on WebSocket connect to prevent 404 race

d519897

When an agent is being spawned, the WS upgrade can arrive before registry insertion completes. Retry the lookup for up to 3 seconds before returning 404. Fixes: #804 Confidence: high Scope-risk: narrow

nicseltzer closed this Mar 31, 2026

nicseltzer wants to merge 18 commits into RightNow-AI:main from nicseltzer:fix/804-ws-agent-race

7 participants
