Skip to content

feat(core): hf-id write-back to disk + serve-time surfacing (R7, Tasks 1-2)#1289

Merged
vanceingalls merged 14 commits into
mainfrom
06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_
Jun 9, 2026
Merged

feat(core): hf-id write-back to disk + serve-time surfacing (R7, Tasks 1-2)#1289
vanceingalls merged 14 commits into
mainfrom
06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds persistHfIdsIfNeeded to hfIdPersist.ts — runs ensureHfIds and writes the result back to the source file on first serve (idempotent; skips write if already tagged)
  • Preview route calls persistHfIdsIfNeeded before bundling so data-hf-id is persisted to disk and present in the served HTML in one request
  • Replaces two tautological stability tests (passed on parent commit without the feature) with real disk-write tests for persistHfIdsIfNeeded
  • Adds Hono in-process integration tests verifying served HTML carries data-hf-id and the source file on disk is updated after the first GET

Test plan

  • bunx vitest run src/studio-api/helpers/hfIdPersist.test.ts — 5 pass
  • bunx vitest run src/studio-api/routes/preview.test.ts — 15 pass (includes 2 new hf-id surfacing tests)

@vanceingalls vanceingalls changed the title feat(core): hf-id write-back to disk + serve-time surfacing (R7, Task 1-2) feat(core,studio): R7 Tasks 1-4 — hf-id write-back, serve-time surfacing, PreviewAdapter, draft-marker constants Jun 9, 2026
@vanceingalls vanceingalls force-pushed the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch from d3fd740 to 0886829 Compare June 9, 2026 02:29
@vanceingalls vanceingalls changed the title feat(core,studio): R7 Tasks 1-4 — hf-id write-back, serve-time surfacing, PreviewAdapter, draft-marker constants feat(core): hf-id write-back to disk + serve-time surfacing (R7, Tasks 1-2) Jun 9, 2026
@vanceingalls vanceingalls force-pushed the 06-08-test_core_previewadapter_contract_failing_tests_t10_spec_for_r7_ branch from 2cb29b1 to fa6996c Compare June 9, 2026 03:45
@vanceingalls vanceingalls force-pushed the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch from 0886829 to 03dcbaa Compare June 9, 2026 03:45
@vanceingalls vanceingalls force-pushed the 06-08-test_core_previewadapter_contract_failing_tests_t10_spec_for_r7_ branch from fa6996c to c125ecc Compare June 9, 2026 04:01
@vanceingalls vanceingalls force-pushed the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch from 03dcbaa to 6160c9e Compare June 9, 2026 04:01
@vanceingalls vanceingalls force-pushed the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch from 6160c9e to a067aa9 Compare June 9, 2026 04:03
@vanceingalls vanceingalls force-pushed the 06-08-test_core_previewadapter_contract_failing_tests_t10_spec_for_r7_ branch from c125ecc to 7d1bf83 Compare June 9, 2026 04:03

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid Tasks 1-2 implementation. TOCTOU guard is clever and the attribute-count-based change detection (added in #1292) is the right call over string equality since linkedom normalises quote style. A couple of things worth eyeballing:

P2 — export { ensureHfIds } re-export in hfIdPersist.ts is an unnecessary indirection
preview.ts imports ensureHfIds from hfIdPersist.js but ensureHfIds lives in parsers/hfIds.js. Re-exporting it through hfIdPersist creates an invisible dependency chain: if hfIdPersist is tree-shaken or refactored, callers silently lose the function. Import it directly in preview.ts.

P2 — TOCTOU guard's correctness depends on an undocumented invariant

const current = readFileSync(filePath, "utf-8");
if (current === html) { writeFileSync(...) }

This works correctly only if html is the raw on-disk content at the time of the last read (not transformed/constructed HTML). If a caller ever passes constructed HTML, current === html will always be false and writes will silently never happen. Not a bug today (all call sites pass diskMain.html), but the invariant should be in a JSDoc comment on the function signature — otherwise the next caller will be confused by spurious write-skips.

P3 — double ensureHfIds on the non-bundled path
When adapter.bundle() returns nothing, bundled = normalizedDisk (already tagged), then ensureHfIds(await transformPreviewHtml(bundled, ...)) runs again. Idempotent, but one of these can be dropped. Addressed implicitly in #1292's catch-path refactor but the try path still carries both.

Overall: ✅ on the feature.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid write-back path. The split between normalizeHfIds (pure, returns {html, changed}) and persistHfIdsIfNeeded (side-effecting filesystem write) is the right shape — one pure function, one effecting function on top, both small and testable.

The wire-up in preview.ts is clean: persistHfIdsIfNeeded runs before adapter.bundle(), so by the time the bundler reads disk, the source file already carries data-hf-id. The bundle path then re-runs ensureHfIds after transformPreviewHtml as a safety net (idempotent — already-tagged elements are skipped, per hfIds.ts:51 if (el.getAttribute("data-hf-id")) continue). Good defense in depth.

Walked the two new hfIds.test.ts tests (L80-95):

  • pinned id survives text edit — write-back commits data-hf-id="hf-XXXX" to disk; subsequent text edit doesn't touch the attribute; re-parse keeps the same id because ensureHfIds skips pinned elements. ✅
  • pinned id survives attribute edit — same shape, modifies class. ✅

The collision-tiebreak doc comment added to hfIds.ts:57-69 is the right kind of WHY — explains the positional-but-safe-post-persist behavior so a future reader doesn't try to "fix" the order-dependence and break the contract. (Side note: I see #1292 replaces the attribute-edit test with a cross-document-stability test — that's a smart substitution, the attr-edit and text-edit tests pinned the same guarantee.)

Blockers

[changed flag uses string equality on ensureHfIds output — spurious writes on every request] At hfIdPersist.ts:9:

export function normalizeHfIds(html: string): { html: string; changed: boolean } {
  const normalized = ensureHfIds(html);
  return { html: normalized, changed: normalized !== html };
}

ensureHfIds parses with linkedom and re-serializes. The serialization can normalize attribute quoting (single → double), whitespace inside tags, attribute ordering, self-closing slash, and entity escaping — even when no ids were minted. So normalized !== html returns true on the very first served HTML that uses non-default formatting, even when the file is already fully tagged.

Concrete trigger: an HTML file the user hand-authored with <div data-hf-id='hf-ab12'> (single quotes). linkedom serializes as <div data-hf-id="hf-ab12">. changed=true. writeFileSync overwrites the user's file with normalized formatting every request. That's a silent disk write on every preview load, plus it stomps the user's stylistic choices.

This bug IS fixed in #1292 — switches to attribute-count comparison:

const idsBefore = (html.match(/\bdata-hf-id=/g) ?? []).length;
const idsAfter = (normalized.match(/\bdata-hf-id=/g) ?? []).length;
if (idsAfter > idsBefore) { ... }

So if the stack lands atomically (Graphite MQ picking up all 4 together), this resolves itself. But if #1289 lands ahead of #1292, prod sees the spurious-write bug. Two paths to close:

  1. Cherry-pick #1292's count-based detection into #1289. Smallest diff; this PR ships correct.
  2. Add a "DO NOT LAND BEFORE #1292" banner to the PR description so the MQ doesn't pick it up alone.

(1) is cleaner. The fix in #1292 is 4 lines.

[No concurrent-modification guard on the write] Same area. The current writeFileSync(filePath, normalized, "utf-8") blindly stomps the file. If the user is concurrently saving (their editor and the preview-server racing), the server can overwrite the user's just-saved content with a stale (pre-edit) version that's been hf-id-tagged.

Repro window: user types in their editor → editor save fires at T → preview server already read the file at T-50ms → server's persistHfIdsIfNeeded writes back the T-50ms snapshot at T+10ms. User's edit is lost.

#1292 ALSO fixes this — re-reads disk and only writes if disk content equals the input html:

const current = readFileSync(filePath, "utf-8");
if (current === html) {
  writeFileSync(filePath, normalized, "utf-8");
}

Same merge-coupling concern as the first blocker — should land in this PR (or with explicit MQ ordering guarantee). The TOCTOU guard is 4 more lines.

Concerns

[Bundle-vs-disk id divergence isn't tested] The preview route runs persistHfIdsIfNeeded on the source file, then calls adapter.bundle(project.dir). If adapter.bundle():

  • Reads disk freshly → it sees the just-written hf-ids → bundle output preserves them → serve-time ensureHfIds is idempotent → ids on the wire match ids on disk. ✅
  • Reads from a cached snapshot taken before the write → bundle output has NO ids → serve-time ensureHfIds mints NEW ids on the bundle output → ids on the wire DON'T match disk. ✗

The second case is plausible if the bundler does any caching (Vite/Rollup-style). When the studio drag-to-edit feature later sends a moveElement patch keyed by a wire-time id, the patch fails to apply because the disk file has different ids.

I don't know enough about adapter.bundle() internals to say which case fires. The two new preview.test.ts integration tests (L323-355) exercise the "no bundle, falls back to disk" path — the bundle-success path's id-stability is not tested.

Adding one test that injects a stub adapter.bundle() returning untagged HTML and verifies the served ids match the disk-write ids (or — more defensible — fails loudly when they would diverge) closes this. Otherwise it's a latent footgun for the R7 drag-to-edit feature.

[persistHfIdsIfNeeded is in studio-api/helpers but operates on disk] Architectural — the rest of studio-api/helpers/* is request-shape and bundling code, all in-memory string transforms. This is the only one that does fs.writeFileSync. A reader looking for "what touches the filesystem in studio-api" might miss it.

Two paths:

  • Keep it where it is and add a comment in the file header noting "this module performs disk writes — usage requires per-project mutex if concurrent serves are possible."
  • Move it to studio-api/disk/ or similar disk-side module location.

Not blocking; just worth picking before this convention takes root.

[Catch block in preview.ts re-uses normalizedDisk from before the try] L237-256:

const normalizedDisk = diskMain ? persistHfIdsIfNeeded(...) : null;
try {
  let bundled = await adapter.bundle(...);
  ...
} catch {
  if (diskMain && normalizedDisk !== null) {
    return c.html(
      injectStudioPreviewAugmentations(
        ensureHfIds(await transformPreviewHtml(normalizedDisk, ...)),
        ...
      ),
      ...
    );
  }
}

If adapter.bundle() failed because, say, the user is mid-save and the file is being rewritten by their editor, normalizedDisk is a SNAPSHOT from before the bundle attempt — potentially stale by the time we fall back. Serving stale content with hf-ids isn't catastrophic (the next request reads fresh), but the fallback should probably re-read disk.

#1292 fixes this too — the catch block in #1292 does resolveProjectMainHtml(project.dir, project.id) fresh and then re-runs persistHfIdsIfNeeded on the fresh content. Worth pulling in.

Nits

[Silent catch {} on writeFileSync failure] L17:

try {
  writeFileSync(filePath, normalized, "utf-8");
} catch {
  // non-fatal — serve with ids even if persist fails
}

The comment explains the WHY (non-fatal) but a console.warn or whatever logger is used in this code path would help debug "why isn't write-back happening for this user." Silent failure of disk writes is the kind of thing that surfaces as a confusing support ticket weeks later. Cheap to add a log line. (Or hand a logger adapter through the function signature if there's a project convention for that.)

[Bracket the bytes you compare] Both idsBefore and idsAfter regex matches in #1292 use \bdata-hf-id= — that's the right shape, no need to change. Mentioning for completeness — \b ensures we don't double-count something like before-data-hf-id=.

Questions

[Why persistHfIdsIfNeeded returns the normalized HTML, not the changed bool?] Current signature:

export function persistHfIdsIfNeeded(filePath: string, html: string): string

Caller (preview.ts) always uses the return value as the served HTML. Fine, but the function name implies "this persists conditionally" — a more descriptive name like normalizeAndPersist or tagAndServeFromDisk would tell callers "I serve you the normalized HTML even when I don't write back." Minor.

What I didn't verify

  • That adapter.bundle() reads disk freshly (rather than from a cached snapshot) — this determines whether the bundle-vs-disk id divergence Concern is real or theoretical. Worth a quick check of the bundler implementation.
  • That existing preview-regression and integration tests still pass with the new ensureHfIds-on-every-transform path. The "preview-regression" check in CI is currently green on this PR, so probably OK, but I didn't trace the test fixtures.
  • That data-hf-root elements correctly get hf-id minted by ensureHfIds (so the stage root, if it's tagged, gets a stable id). Probably true given the impl walks body.querySelectorAll("*"), but I didn't verify against an data-hf-root fixture.

Verdict

Right shape, blocked on two correctness gaps that #1292 already addresses. If the stack lands atomically through MQ, this is ready. If individual PRs can land alone, fold #1292's hfIdPersist.ts fixes into this PR — both are ~4-line patches. The bundle-vs-disk id divergence Concern is the only thing I'd add testing for either way.

Review by Rames D Jusso

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Rames's review.

Rames identified two blockers I independently confirm: the string-equality change detection triggers spurious writes on linkedom reserialization, and there's no TOCTOU guard on concurrent user saves. Both are fixed in #1292. The question is merge-ordering.

Amplifying Rames's blocker: the spurious-write bug has a concrete user-visible symptom. If the user hand-authored any single-quoted attribute (<div class='wrapper'>) in a file served by the preview route, the server overwrites their formatting on every request — including replacing their single-quoting convention project-wide. Linkedom normalizes to double quotes. This isn't cosmetic: if the file is version-controlled and the user is committing, they'll see dirty diffs on every preview load. That's the kind of thing that erodes trust in the tool fast.

If the stack is guaranteed to land atomically through Graphite MQ: no action needed here.
If not: fold #1292's 8-line count-based detection + TOCTOU guard into this PR before it can land alone. Those two patches are isolated enough to cherry-pick cleanly.

Escalating Rames's "bundle-vs-disk id-stability" concern. The preview route calls persistHfIdsIfNeeded then adapter.bundle(). If the bundler's Vite/Rollup layer has any module cache (it almost certainly does for re-serves), it reads a snapshot from before the write and produces bundle output with no data-hf-id. Serve-time ensureHfIds then mints fresh ids on that un-tagged bundle HTML. Those freshly-minted ids won't match the disk-pinned ids. The first drag-to-edit gesture from studio keyed off the wire-time id will fail to find the element on disk. This is the most impactful latent failure in the stack — the drag-to-edit round-trip is R7's whole point. The two new integration tests in preview.test.ts test the "no-bundle fallback" path, not the bundle-success path with a warm bundler cache. Add one test for that path.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 — HEAD is unchanged at a067aa9 since my round-1 review. The fixes for the two correctness gaps I flagged (linkedom-reserialization spurious writes; concurrent-save TOCTOU race) continue to live only in #1292's 2af2228 fix(core): address R7 code-review findings (C1–C14, P6–P7) commit:

  • Count-based change detection (idsAfter > idsBefore).
  • Re-read-and-compare TOCTOU guard.
  • Plus the bundle-vs-disk id-stability regression test I had flagged as the untested architectural risk — added in #1292's d916901 at preview.test.ts:363-391. Pins the content-hash invariant by stubbing adapter.bundle() to return untagged source HTML and asserting servedIds === diskIds. Closes the latent footgun.

So the round-1 blockers on this PR are still merge-coupled to #1292. Stack must land atomically through Graphite MQ for prod to be safe — same posture as round 1.

(One CI nit: the preview-regression FAILURE showing in gh pr checks is from workflow run 27183049880 at 04:03:57 and was superseded by a SUCCESS run at 04:06:04. Same workflow, concurrency-cancellation hangover. Not a real failure.)

My round-1 review at this SHA still stands.

Review by Rames D Jusso

jrusso1020
jrusso1020 previously approved these changes Jun 9, 2026

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on behalf of James.

vanceingalls and others added 8 commits June 9, 2026 06:28
…review)

Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with
id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable,
exact-match) by design — targeting uses exact [data-hf-id="…"] match and does
not require the hf- shape; legacy values re-mint only at the R7 write-back. Not
a bug. Comment-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The original comment said legacy data-hf-id values "are re-minted only
once the R7 write-back persists freshly-minted ids to source" — which is
incorrect. ensureHfIds skips elements that already carry data-hf-id, so
legacy values (e.g. data-hf-id="my-title") persist indefinitely and are
NOT automatically re-minted. Exact-match targeting still works correctly.
Update comment to reflect actual behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eview)

Addresses Rames' review on #1271: execDataAttrPattern returned the first regex
match without checking for a second. A duplicate id/data-hf-id in source (id
drift) would silently patch one element and leave the other stale. Now warns
when more than one element matches. By the mint contract it should never fire.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3 review)

Adds test: "hfId match is authoritative — selector is not used as a
narrowing filter". When hfId matches element A and selector points at
element B, findTagByTarget returns A without consulting selector as a
narrowing filter. Pins the intended behaviour so a future refactor
cannot silently start narrowing by selector.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Elements now get data-hf-id minted by ensureHfIds; parser reads
data-hf-id as model id, so HTML id attrs are no longer the model id.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vanceingalls and others added 3 commits June 9, 2026 06:29
Locks the preservation guarantee the write-back design depends on: a
Studio edit targeting by id or selector (it never sends hfId) must not strip
an existing data-hf-id, or the stable handle is destroyed by the next edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…review)

Addresses review on #1272 (Miguel P3 + Rames): findTargetElement interpolated
target.hfId raw into a [data-hf-id="..."] selector. Escape it (CSS attr-value
injection guard) and warn when a hfId matches more than one element instead of
silently patching an arbitrary one. Adds an injection-guard test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the 06-08-test_core_previewadapter_contract_failing_tests_t10_spec_for_r7_ branch from 7d1bf83 to 4fceb69 Compare June 9, 2026 06:30
vanceingalls and others added 3 commits June 9, 2026 06:30
… for persistHfIdsIfNeeded

Prior tests only exercised normalizeHfIds (pure function) and the existing
pin guard in ensureHfIds — both pass on the parent commit without any Task 1
code. Replace with three tests that exercise the actual disk write-back:

- writes data-hf-id to disk when source is untagged
- does not rewrite disk when source is already tagged (idempotent)
- returned id matches id written to disk (serve-time == persist-time invariant)

These fail on the parent commit (persistHfIdsIfNeeded doesn't exist) and
green after Task 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-back (R7, Task 1-2)

Two integration tests against the preview route (via Hono test harness):

- served HTML carries data-hf-id on body elements (>= 2 matches for div+p)
- disk file contains data-hf-id after first GET (write-back verified via readFileSync)

These fail on the parent commit (no hfIdPersist wiring in preview.ts) and
green after Task 1. Closes the verification gap flagged in review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch from a067aa9 to 4b1a36d Compare June 9, 2026 06:31
Base automatically changed from 06-08-test_core_previewadapter_contract_failing_tests_t10_spec_for_r7_ to main June 9, 2026 07:28
@vanceingalls vanceingalls dismissed jrusso1020’s stale review June 9, 2026 07:28

The base branch was changed.

@vanceingalls vanceingalls merged commit 740a83a into main Jun 9, 2026
39 of 40 checks passed
@vanceingalls vanceingalls deleted the 06-08-feat_core_hf-id_write-back_to_disk_serve-time_surfacing_r7_task_1-2_ branch June 9, 2026 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants