feat(core): ensureHfIds node-id pass + shared hf- mint helper (R1) by vanceingalls · Pull Request #1269 · heygen-com/hyperframes

vanceingalls · 2026-06-08T04:27:36Z

What

Adds ensureHfIds(html: string): string in packages/core/src/parsers/hfIds.ts. Single DOM pass (via linkedom) that mints a data-hf-id attribute on every eligible element before the caller sees the markup.

Id derivation: FNV-1a 32-bit hash of tagName | sorted-attrs(\x00/\x01 separated) | ownText, last 4 chars of base-36, hf- prefix. Collision resolution appends a sibling counter and re-hashes. Preserves existing ids (elements with data-hf-id already set are skipped). Excludes non-visual tags: script, style, template, meta, link, noscript, base.

Fragment handling: detects bare HTML fragments (no <!doctype / <html) and wraps in a full document shell before parsing, then returns body.innerHTML — matching the pattern used by parseSourceDocument in sourceMutation.

Why

Counter-based ids (element-0, element-1, …) are positional. Inserting a new layer at position 0 shifts every id below it. The R1 milestone requires content-based, stable ids so that targeting operations (split, patch, probe) stay valid across re-parses and element insertions. T2 spec (stableIds.test.ts) defines the contract: same content → same id, adding a sibling doesn't change other ids, format matches /^hf-[a-z0-9]{4}$/.

How

toHfId(hash) — slice(-4) of hash.toString(36) for better distribution across the suffix space
data-hf-id is excluded from the hash input (prevents circular dependency on the attribute being set)
Already-assigned ids tracked in a Set; duplicates get a counter suffix before re-hashing
Fragment detection: /<!doctype|<html[\s>]/i.test(html) — if bare, wrap→parse→body.innerHTML

Test plan

T2 spec: packages/core/src/parsers/stableIds.test.ts — 7 tests pass (3 were .fails stubs targeting R1; 4 were pre-existing baselines that must not regress)
No changes to existing htmlParser tests needed at this layer (wiring is PR feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1) #1270)

vanceingalls · 2026-06-08T04:27:48Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

miguel-heygen

Solid foundation for R1. The design choices all hold up — FNV-1a is the right hash for this (pure, deterministic, fast), content-key exclusion of data-hf-* is correct, and the fragment-detection/wrap/unwrap mirrors what parseSourceDocument does. Tests are excellent, especially the "KNOWN LIMITATION" assertions that pin the lifecycle gap honestly.

One thing worth flagging:

P3 — collision limit throws in production:

if (dup > 1000) throw new Error("ensureHfIds: hash collision limit exceeded");

1000+ identical elements (same tag, attrs, text) is unlikely in real compositions, but an uncaught throw from ensureHfIds during rendering would crash the entire parse call. Prefer a graceful fallback — e.g. ${key}#pos${positionIndex} or a counter suffix derived from the element's position in the body.querySelectorAll("*") walk — over a hard throw. The Set-based collision resolution already handles the common case; this guard is only for the adversarial edge.

Everything else looks correct. ✅

james-russo-rames-d-jusso

Walked hfIds.ts end-to-end. The mint algorithm is well-thought: FNV-1a 32-bit over tagName | sorted(non-data-hf-* attrs) | ownText, last-4 base-36 chars for the id, seeds the assigned set with pre-existing pins BEFORE walking so fresh mints never collide with persisted ids, and the test suite explicitly pins BOTH the R1 guarantees (idempotence, pinned-id survival, distinct-sibling uniqueness, determinism) AND the four known limitations of pure-content-hash mode (edit-drifts-id, attr-edit-drifts-id, identical-sibling-2nd-occurrence-has-no-content-stable-handle). That last block — KNOWN LIMITATION tests with comments pointing at the design doc's "Implementation status & verified lifecycle gap" — is exactly the right shape for a foundation PR. Future contributors won't accidentally rely on guarantees that don't hold yet.

Concerns

[The assigned set is seeded only from document.querySelectorAll("[data-hf-id]"), but minting walks body.querySelectorAll("*")] Lines 79-82:

const assigned = new Set<string>();
for (const el of Array.from(document.querySelectorAll("[data-hf-id]"))) {
  const existing = el.getAttribute("data-hf-id");
  if (existing) assigned.add(existing);
}

The seed walks the WHOLE document including <head> elements; minting walks only <body>. If a <head> element somehow carries a data-hf-id (it shouldn't, but <title> could be edited with one in a hand-rolled HTML), the head's id is pinned into assigned and a body element trying to mint that same value gets bumped to the dup branch.

Probably never fires in practice — but the asymmetry is worth either (a) seeding from body.querySelectorAll("[data-hf-id]") to match the mint scope, or (b) one-line comment noting the deliberate pin-across-document choice. Right now it reads like an inconsistency.

[Two parses per parseHtml call] ensureHfIds uses linkedom to mint, then the existing parseHtml re-parses with DOMParser. Each call to parseHtml (which is in the hot path for Studio operations, per the briefing) now does:

linkedom parseHTML + walk + serialize → output HTML
DOMParser parseFromString → AST for clip extraction

Likely fine at clip-model scale, but worth one of: (a) a perf note in the PR body, (b) a benchmark on the largest realistic clip-model size, or (c) a flag to skip ensureHfIds when the input is already known to be from generateElementHtml (which itself emits data-hf-id).

Also worth verifying: linkedom serialization is byte-stable across a parseHTML → toString() round-trip when the input has a <!doctype> shell. The idempotence test (expect(twice).toBe(once)) pins f(f(x)) === f(x) — but doesn't pin f(x) === x for inputs that already have all ids minted. If linkedom normalizes attribute quoting or whitespace, every parseHtml call subtly mutates the source. Probably fine since downstream consumers don't care, but worth confirming.

[4-char id space, no rationale comment] ~1.68M total ids (36^4). Birthday-paradox collision at N elements: P ≈ N²/(2·36^4). At 1000 elements in a single document: ~30% probability of at least one collision. The dup++ rehash handles it cleanly (and the 1000-iter throw is a safety net), but a one-line comment in toHfId on the chosen width vs. larger alternatives would help future readers — especially since the test fixture (fnv1a of <p class="x">same</p> minted once then minted again deliberately collides) shows the dup path actually runs.

Not asking to bump to 6 chars (4 is plenty for real clip-models) — just a // 4 chars · 36^4 ≈ 1.68M, sufficient for ≤1000 elements/doc with <1% collision per doc after dup rehash would document the choice.

Nits

[mintHfId is exported but only consumed internally] The doc comment says "shared by htmlParser for clip ids", but #1270's parseHtml reads el.getAttribute("data-hf-id") directly without calling mintHfId. If the future intent is to import mintHfId from parsers/hfIds.ts in htmlParser.ts, fine — flag it. If not, demoting it to non-exported tightens the public surface.

[Hash-input encoding] ${a.name}\x00${a.value} + \x01 between pairs is a good choice (control chars are invalid in HTML attribute names/values per the spec) — but el.tagName.toLowerCase() is concatenated with | separator on the outer level. If an attribute somehow contained a literal | (which IS valid in HTML), it'd be ambiguous with the separator. Edge case — but consistency with the \x00/\x01 choice would suggest using \x02 as the outer separator too. Cosmetic.

[EXCLUDED_TAGS lookup is per-element] Inside the hot loop, EXCLUDED_TAGS.has(el.tagName.toLowerCase()) recomputes the lowercase tag name for every element. Could cache el.tagName.toLowerCase() in a local once. Sub-microsecond savings; ignore unless profiling says otherwise.

Questions

[Document-vs-body scope for the pin seed] Same as the first concern but framed as intent: was pinning across the whole document deliberate (defense for future use of <head> ids?), or coincidental from copying querySelectorAll without scoping to body?

[Is there a plan for the wired-in callers (R2+) to also call ensureHfIds?] Right now ensureHfIds runs inside parseHtml only. If R2's mutation flow round-trips HTML without going through parseHtml, the data-hf-id values could go stale. Knowing whether ensureHfIds will be a guard at multiple boundaries or just one would help me reason about the contract.

CI status

CLI smoke (required) shows FAILURE — read the log: Navigation timeout of 60000 ms exceeded during capture_calibration, with browserGpuMode auto → software (WebGL unavailable) preamble. This is the headless-shell calibration timeout you've been actively fixing in #1233 / #1237 / #1239. The diff in this PR is hfIds.ts + hfIds.test.ts — pure parser-level code, zero render-path impact. CI infra flake, not a real failure. Re-run should pass.

Other 36 checks SUCCESS, 8 pending (regression-shards, Windows, etc).

What I didn't verify

That parseHTML(html).toString() for the has-document-shell case is genuinely byte-stable across input → output for inputs that don't change semantically. The PR's idempotence test pins f(f(x)) === f(x) but not f(x) === x for fully-pinned inputs.
That linkedom and DOMParser agree on attribute parsing edge cases (e.g. unquoted attributes, attributes with embedded >, etc). If they disagree, the cross-parser hop could produce different element sets — but the test fixtures look pretty clean.
The expected throughput of parseHtml on the largest realistic clip-model size; the perf comment is theoretical until measured.

Verdict

Keystone PR for the R1 stack. Mint algorithm is sound, test discipline is exemplary (pinning both guarantees and known limitations is the right move for foundation work), and the API surface is small. The two concerns above are worth addressing but neither is structurally blocking — most worth fixing is the assigned-set scope asymmetry, which is a 1-line change. Clean execution from where I sit; leaving as a comment.

— Review by Rames D Jusso

…iew) Addresses review on #1269 (Miguel P3 + Rames): - mintHfId no longer throws on the pathological collision cap; it widens the id with the dup counter instead, so a degenerate document can never crash the whole parse call. - Seed the `assigned` set from `body.querySelectorAll("[data-hf-id]")` (was the whole document) to match the mint walk's scope — a stray head id can't bump a body element off its hash. - Document the 4-char id-space rationale in toHfId. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vanceingalls · 2026-06-08T22:26:51Z

Addressed review (daecb2d): P3 — mintHfId no longer throws on the collision cap; it widens the id with the dup counter so a degenerate doc can't crash the parse. Rames — seed the assigned set from body.querySelectorAll (was whole document) to match the mint scope, and added the 4-char id-space rationale comment to toHfId. Lint/test green.

- Filter all data-hf-* attrs from contentKey (not just data-hf-id and data-hf-studio-*) - Use \x00/\x01 byte separators to eliminate ambiguous & and = in attr serialization - Use suffix slice for toHfId (most-avalanched FNV bits) - Add 1000-iteration cap to mintHfId collision loop - Fix misleading "Browser-safe" comment (linkedom is Node-only) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…apping - EXCLUDED_TAGS: add link, noscript, base (non-editable body elements that should never receive a stable id and become mutation targets) - ensureHfIds: wrap bare HTML fragments before parsing, matching the logic in parseSourceDocument — without this, linkedom places bare-fragment content outside <body> and body.querySelectorAll returns nothing, so no ids are minted and document.toString() returns a corrupted structure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Locks the pin guarantee and documents the two current limitations (no write-back yet, design §3): pinned id survives a content edit; an unpinned id drifts on text/attribute edits (flip to .toBe once write-back lands); a second identical-content sibling has only a position-derived dedup id. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…iew) Addresses review on #1269 (Miguel P3 + Rames): - mintHfId no longer throws on the pathological collision cap; it widens the id with the dup counter instead, so a degenerate document can never crash the whole parse call. - Seed the `assigned` set from `body.querySelectorAll("[data-hf-id]")` (was the whole document) to match the mint walk's scope — a stray head id can't bump a body element off its hash. - Document the 4-char id-space rationale in toHfId. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Target (R7, T5a) (#1296) ## Summary - Adds `hfId` field to `resolveDomEditSelection` — reads `data-hf-id` off the live element and stores it in `DomEditSelection.hfId` - `DomEditSelection extends PatchTarget` which already declares `hfId?: string`, so this is a single new line at the return site - Widens `MutationTarget` in `files.ts` to include `hfId?: string` (type hygiene — the value already survives through `parseMutationBody`'s by-reference pass, so this is documentation not a behaviour change) ## Why R7 / Task 5a. The full hf-id write-back and patch-engine infrastructure (R1 + R7 Tasks 0–4, PRs #1269–#1292) is server-complete. The only missing piece was: the Studio client never read `data-hf-id` off a hit-tested element, so `target.hfId` was always `undefined` and the `hfId`-first lookup branches in both patch engines were unreachable in production. This PR fixes the selection side — the commit wire (#1297) completes the path. ## Test plan - [ ] `packages/studio/src/components/editor/domEditingLayers.test.ts` — two new tests with jsdom environment: - `resolveDomEditSelection` on an element with `data-hf-id` → `selection.hfId` is populated - element without `data-hf-id` → `selection.hfId` is `undefined` - [ ] All 65 studio test files pass, all 72 core test files pass 🤖 Generated with [Claude Code](https://claude.com/claude-code)

This was referenced Jun 8, 2026

feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1) #1270

Merged

feat(studio): sourcePatcher data-hf-id targeting (R1, T3) #1271

Merged

feat(core): sourceMutation data-hf-id targeting (R1, T7) #1272

Merged

vanceingalls force-pushed the 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ branch from b2bfae7 to e275151 Compare June 8, 2026 17:24

vanceingalls marked this pull request as draft June 8, 2026 18:09

vanceingalls marked this pull request as ready for review June 8, 2026 21:44

miguel-heygen approved these changes Jun 8, 2026

View reviewed changes

james-russo-rames-d-jusso reviewed Jun 8, 2026

View reviewed changes

vanceingalls and others added 5 commits June 8, 2026 21:02

feat(core): ensureHfIds node-id pass + shared hf- mint helper (R1)

981bc40

vanceingalls force-pushed the 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ branch from daecb2d to 4476246 Compare June 9, 2026 04:03

vanceingalls merged commit c12854b into main Jun 9, 2026
46 checks passed

vanceingalls deleted the 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ branch June 9, 2026 06:28

vanceingalls mentioned this pull request Jun 9, 2026

feat(studio,core): populate hfId in DomEditSelection + widen MutationTarget (R7, T5a) #1296

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): ensureHfIds node-id pass + shared hf- mint helper (R1)#1269

feat(core): ensureHfIds node-id pass + shared hf- mint helper (R1)#1269
vanceingalls merged 5 commits into
mainfrom
06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_

vanceingalls commented Jun 8, 2026 •

edited

Loading

Uh oh!

vanceingalls commented Jun 8, 2026 •

edited

Loading

Uh oh!

miguel-heygen left a comment

Uh oh!

james-russo-rames-d-jusso left a comment

Uh oh!

vanceingalls commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vanceingalls commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Test plan

Uh oh!

vanceingalls commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

james-russo-rames-d-jusso left a comment

Choose a reason for hiding this comment

Concerns

Nits

Questions

CI status

What I didn't verify

Verdict

Uh oh!

vanceingalls commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vanceingalls commented Jun 8, 2026 •

edited

Loading

vanceingalls commented Jun 8, 2026 •

edited

Loading