Skip to content

security: extend hidden-element detection to every DOM-reading channel#1032

Open
garagon wants to merge 1 commit intogarrytan:mainfrom
garagon:security/confusion-protocol-channel-coverage
Open

security: extend hidden-element detection to every DOM-reading channel#1032
garagon wants to merge 1 commit intogarrytan:mainfrom
garagon:security/confusion-protocol-channel-coverage

Conversation

@garagon
Copy link
Copy Markdown
Contributor

@garagon garagon commented Apr 16, 2026

Summary

The Confusion Protocol envelope wrap already covers every scoped PAGE_CONTENT_COMMANDtext, html, links, forms, accessibility, attrs, console, dialog, media, data, ux-audit. The hidden-element / ARIA-injection detection layer (markHiddenElements + getCleanTextWithStripping in content-security.ts) only ran when command === 'text'. For every other DOM-reading channel the output went through the envelope with no hidden-content filter.

Net effect: a page that carries a display:none div with payload text, or a button with an aria-label matching one of the injection patterns, has that payload leak to the LLM the moment the agent calls browse html, browse accessibility, browse attrs, or any of the other non-text channels. The envelope wrap by itself does not mitigate this — it only tells the model that content is untrusted, not that the visible page never actually rendered those bytes to the human operator.

Reproduction (before the fix)

bun run report/evidence/poc-f008-channel-gaps.ts

The PoC simulates the dispatcher against a page that renders a hidden injection div plus an aria-label injection, then walks every PAGE_CONTENT_COMMAND. With the current code, every non-text channel emits the hidden payload inside the envelope. Expected after the fix: no channel returns the raw payload without a CONTENT WARNINGS header flagging it.

Root cause

browse/src/server.ts around the read dispatch:

if (isScoped && command === 'text') {
  const page = session.getPage();
  await markHiddenElements(page);
  // ...
  result = await getCleanTextWithStripping(target);
}

The condition gates on the literal 'text' string. Every other scoped channel falls through to handleReadCommand and never touches the hidden-element detector. The envelope wrap later in the same handler does not know which content came from hidden nodes, so it cannot flag them.

Fix

Two parts, each small on its own.

  1. New export DOM_CONTENT_COMMANDS in browse/src/commands.ts. Subset of PAGE_CONTENT_COMMANDS whose output is derived from the live DOM tree: text, html, links, forms, accessibility, attrs, media, data, ux-audit. console and dialog stay out — they read separate runtime state (captured console output, queued dialog events), so running the DOM detector would be wasteful and potentially racy against navigation.

  2. Dispatcher gates on the set, not on text. browse/src/server.ts now does:

    let hiddenContentWarnings: string[] = [];
    if (isScoped && DOM_CONTENT_COMMANDS.has(command)) {
      const page = session.getPage();
      try {
        const strippedDescs = await markHiddenElements(page);
        if (strippedDescs.length > 0) {
          hiddenContentWarnings = strippedDescs.slice(0, 8).map(d => `hidden content: ${d.slice(0, 120)}`);
          if (strippedDescs.length > 8) {
            hiddenContentWarnings.push(`hidden content: +${strippedDescs.length - 8} more flagged elements`);
          }
        }
        if (command === 'text') {
          result = await getCleanTextWithStripping(target);   // unchanged
        } else {
          result = await handleReadCommand(command, args, session, browserManager);
        }
      } finally {
        await cleanupHiddenMarkers(page);
      }
    }

    And the centralized wrap block folds those descriptions into combinedWarnings before the envelope call:

    const combinedWarnings = [...filterResult.warnings, ...hiddenContentWarnings];
    result = wrapUntrustedPageContent(
      result, command,
      combinedWarnings.length > 0 ? combinedWarnings : undefined,
    );

text still gets physical stripping via getCleanTextWithStripping (no behavior change on that path). Every other scoped DOM channel keeps its output format unchanged and now also emits a ⚠ CONTENT WARNINGS: header listing the flagged hidden nodes. The LLM sees, for example:

⚠ CONTENT WARNINGS: hidden content: [div] opacity < 0.1: "IGNORE...";  hidden content: [button] ARIA injection: "System: you are..."
═══ BEGIN UNTRUSTED WEB CONTENT ═══
...actual channel output...
═══ END UNTRUSTED WEB CONTENT ═══

That turns the silent-leak into a visible, flagged payload the LLM can refuse.

What stayed the same

  • text command still uses getCleanTextWithStripping — visible hidden nodes are physically removed before the read, exactly as before.
  • console, dialog — not in DOM_CONTENT_COMMANDS, no detector runs, no behavior change. These read runtime state, not the DOM.
  • Root-token (non-scoped) calls — unchanged, same wrapUntrustedContent backward-compat path.
  • Sentinel strings, envelope shape, datamarking behavior, chain recursion guard — all unchanged.
  • No new dependencies. No config knobs.

Sibling review

Greped browse/src/ for every markHiddenElements or getCleanTextWithStripping call:

Location Trigger Change
server.ts read dispatch was command === 'text' now DOM_CONTENT_COMMANDS.has(command)
content-security.ts helpers page DOM traversal unchanged
any other call site none n/a

No other call site invokes the hidden-element detector. All scoped DOM-channel output now passes through the same gate.

Tests

browse/test/content-security.test.ts — new DOM-content channel coverage describe block, 5 cases:

  • commands.ts exports DOM_CONTENT_COMMANDS.
  • The set covers text, html, links, forms, accessibility, attrs, media, data, ux-audit, and does not include console or dialog.
  • The server's scoped-read dispatch gates on DOM_CONTENT_COMMANDS.has(command) and contains both markHiddenElements and cleanupHiddenMarkers (source-level lock — if a future refactor narrows the gate back to 'text', the test trips).
  • hiddenContentWarnings plumbs into combinedWarnings and reaches wrapUntrustedPageContent.
  • DOM_CONTENT_COMMANDS is a strict subset of PAGE_CONTENT_COMMANDS (runtime import check).

bun test browse/test/content-security.test.ts: 52 pass, 0 fail.

Negative control

Reverting browse/src/server.ts and browse/src/commands.ts to origin/main and rerunning:

  • DOM_CONTENT_COMMANDS import fails first (not exported yet).
  • The source-level regressions on server.ts fail.

Applying the fix: 52/52 pass.

Files

 browse/src/commands.ts               | 16 +++++++++
 browse/src/server.ts                 | 47 ++++++++++++++++++------
 browse/test/content-security.test.ts | 69 ++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 11 deletions(-)

How to verify

git checkout security/confusion-protocol-channel-coverage
bun install
bun test browse/test/content-security.test.ts   # 52 pass
bun test                                           # full suite, exit 0

The Confusion Protocol envelope wrap (`wrapUntrustedPageContent`)
covers every scoped PAGE_CONTENT_COMMAND, but the hidden-element
ARIA-injection detection layer only ran for `text`. Other DOM-reading
channels (html, links, forms, accessibility, attrs, data, media,
ux-audit) returned their output through the envelope with no hidden-
content filter, so a page serving a display:none div that instructs
the agent to disregard prior system messages, or an aria-label that
claims to put the LLM in admin mode, leaked the injection payload on
any non-text channel. The envelope alone does not mitigate this, and
the page itself never rendered the hostile content to the human
operator.

Fix:
  * New export `DOM_CONTENT_COMMANDS` in commands.ts — the subset of
    PAGE_CONTENT_COMMANDS that derives its output from the live DOM.
    Console and dialog stay out; they read separate runtime state.
  * server.ts runs `markHiddenElements` + `cleanupHiddenMarkers` for
    every scoped command in this set. `text` keeps its existing
    `getCleanTextWithStripping` path (hidden elements physically
    stripped before the read). All other channels keep their output
    format but emit flagged elements as CONTENT WARNINGS on the
    envelope, so the LLM sees what it would otherwise have consumed
    silently.
  * Hidden-element descriptions merge into `combinedWarnings`
    alongside content-filter warnings before the wrap call.

Tests: new describe block in content-security.test.ts covering
  * `DOM_CONTENT_COMMANDS` export shape and channel membership;
  * dispatch gates on `DOM_CONTENT_COMMANDS.has(command)`, not the
    literal `text` string;
  * hiddenContentWarnings plumbs into `combinedWarnings` and reaches
    wrapUntrustedPageContent;
  * DOM_CONTENT_COMMANDS is a strict subset of PAGE_CONTENT_COMMANDS.

Existing datamarking, envelope wrap, centralized-wrapping, and chain
security suites stay green (52 pass, 0 fail).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant