feat(bench): bench robustness — TTL fixes, interleaved HOT rounds, path-set parity checks#369
Merged
Covers prerequisites, the full Stage 0-5 guided/auto/dry-run/preflight modes, bundle layout, resume, crash recovery (restore + verify), stage and drive selection flags, publishing steps, and a troubleshooting section. Wires into docs/benchmarks/README.md.
… is current - Replace 'build from source' prereq with the actual default_binary() cascade (USERPROFILE\bin → target/release → PATH), matching all validation scripts. Add --bin flag override note. - Confirm Everything CLI 1.1.0.30 is still the latest (voidtools.com downloads page checked 2026-06-07); no pin change needed. - Add --bin to flags reference table.
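A minimal sketch of such a resolution cascade, for illustration only. The function name `resolve_binary`, its parameters, and the injected `exists` predicate are hypothetical and are not taken from `default_binary()`; the sketch also assumes an explicit `--bin` value is used without an existence check.

```rust
use std::path::{Path, PathBuf};

/// Resolve a tool binary: explicit override, then the user's bin directory,
/// then the release build directory, then the bare name (PATH lookup).
/// `exists` is injected so the cascade can be exercised without touching disk.
fn resolve_binary(
    explicit: Option<&Path>,
    user_profile: Option<&Path>,
    exe_name: &str,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // 1. An explicit --bin override wins (assumption: taken as given).
    if let Some(path) = explicit {
        return path.to_path_buf();
    }
    // 2. <USERPROFILE>/bin/<exe>, then 3. target/release/<exe>.
    let mut candidates = Vec::new();
    if let Some(home) = user_profile {
        candidates.push(home.join("bin").join(exe_name));
    }
    candidates.push(Path::new("target").join("release").join(exe_name));
    candidates
        .into_iter()
        .find(|candidate| exists(candidate.as_path()))
        // 4. Bare name: the OS resolves it through PATH at spawn time.
        .unwrap_or_else(|| PathBuf::from(exe_name))
}
```

In real use the predicate would be something like `|path| path.is_file()`; injecting it keeps the ordering logic testable on any host.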
…CLI 1.1.0.30 es.exe is a thin IPC wrapper; the GUI daemon version determines search behaviour and is part of the benchmark environment. - competitors.toml: add engine_version = "1.4.1.1032" field + comment explaining the CLI/engine split; note latest GUI download URL - runbook.md: prereq step 4 now says install GUI engine first, then fetch-competitors for the ES CLI - methodology.md: binary versions disclosure updated to list both GUI engine and CLI versions with explanation
es.exe per-drive scoping uses the bare drive letter and colon (e.g. C:) with no trailing backslash, consistent with everything_capacity_probe.rs L1+ convention. The backslash form worked incidentally but deviated from the established pattern in the scripts.
…fail-fast on missing tools
- ToolVersion gains an exe field; render_md shows it as inline code alongside the version string so operators know exactly which binary was used
- ToolProbe gains version_line_prefix: uffs_cpp uses 'UFFS version:' to extract the semantic version line from the multi-line banner instead of the first (URL) line
- probe_tool: use map_or_else + bool::then idioms; rename single-char params
- check_es_available probes tasklist/pgrep to distinguish DaemonNotRunning (process absent) from DaemonStarting (process running but IPC not ready); solo_reason in matrix carries matching user-facing messages
- Orchestrator::capture fails fast with BenchError::MissingTools listing every tool whose version probe returned unknown so the run aborts before any measurements instead of silently degrading to UFFS-only
- run.rs split into run/mod.rs + run/tests.rs to stay under 800 LOC policy
- Tests: new preflight daemon_not_running/daemon_starting tests, probe_tool_version_line_prefix golden test, split dry_run_host/autopilot_host fixtures with documented call-order slots
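The version_line_prefix idea can be sketched as follows. This is an illustration under assumptions, not the actual probe_tool code: `extract_version` is a hypothetical name, and returning "unknown" when nothing matches mirrors the wording of these commit messages rather than a confirmed code path.

```rust
/// Pick the version string out of a tool's version banner.
/// With a prefix, the first line starting with it is used and the prefix
/// is stripped; without one, the first non-empty line is returned.
fn extract_version(banner: &str, version_line_prefix: Option<&str>) -> String {
    let mut lines = banner
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty());
    let found = match version_line_prefix {
        Some(prefix) => lines
            .find_map(|line| line.strip_prefix(prefix))
            .map(str::trim),
        None => lines.next(),
    };
    // Assumption: a banner without a usable line is reported as "unknown".
    found.map_or_else(|| "unknown".to_owned(), str::to_owned)
}
```

With a banner whose first line is a URL and whose second line reads `UFFS version: 1.0.0`, passing `Some("UFFS version:")` yields `1.0.0` instead of the URL.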
resolve::es_exe: replace run-based PATH probe with where.exe/which lookup. The old probe ran `es.exe -get-everything-version` which exits 0 even when the Everything daemon is not running, so is_ok() was always true and the function always returned bare "es.exe" instead of the full path. The new probe calls where.exe (Windows) / which (Unix) to get the resolved absolute path from the OS PATH, with ~/bin and Program Files as fallbacks. tool_probe for everything: switch version arg from -get-everything-version to -version. The -version flag prints the ES CLI version (e.g. 1.1.0.30) without requiring the Everything daemon to be running, so the env fingerprint always shows a real version string regardless of daemon state. ToolProbe gains daemon_error_markers: if the combined output contains any marker substring, probe_tool returns "not running" rather than raw error text — guards against any remaining IPC-error bleed-through. Remove stale crates/uffs-bench/src/run.rs (untracked leftover after the run.rs -> run/mod.rs rename; was causing E0761 in doc tests). Tests: add probe_tool_daemon_error_markers_returns_not_running golden test; update dry_run_host/autopilot_host to reflect where.exe slot + -version output.
Each tool in the env fingerprint now carries a state field alongside its
version. State is determined by a lightweight StateProbe that runs a
process-presence command:
  uffs:           uffs daemon status, running_marker="running"
  uffs_cpp:       no daemon, state = n/a
  everything:     tasklist /FI "IMAGENAME eq Everything.exe" CSV, checks
                  for "Everything.exe" in output
  everything_gui: same tasklist probe for state, but version comes from
                  es.exe -get-everything-version (IPC) so the version
                  reflects the running daemon build. daemon_error_markers
                  turns IPC-error output into "not running" rather than
                  raw error text.
New resolve::everything_exe uses where.exe/which first (full PATH hit),
then %ProgramFiles(x86)%/ProgramFiles known locations, then bare fallback.
ToolProbe gains display_exe: Option<String> so the version probe binary
(es.exe) can differ from the path shown in the report (Everything.exe).
ToolVersion gains state: String; render_md now shows
- **tool:** version (state: running) `/path/to/exe`
DEFAULT_TOOLS grows to [uffs, uffs_cpp, everything, everything_gui].
EVERYTHING_GUI_TOOL constant added to matrix.rs.
All 65 tests pass; no clippy warnings.
uffs tool probe:
- version_line_prefix: Some("uffs ") strips prefix so "uffs 0.5.117"
becomes "0.5.117" in the fingerprint and report.
- state_probe removed: the daemon (uffsd) is started/restarted by the bench
itself, so pre-run state is irrelevant and the tool reports n/a.
Missing-tool soft gate (Orchestrator::capture):
- When tools are not found (version = "unknown"), print a per-tool
install hint (URL/instruction) via env::tool_install_hint.
- If >= 2 tools remain, show a confirm gate: proceed with available
tools or abort and install first.
- Hard-fail only when < 2 tools are available.
- New cards::missing_tools_card builds the Card for the gate.
Tool-version table output:
- render_md now emits a padded GFM table instead of a bullet list.
Columns: Tool | Version | State | Path
Column widths computed from actual data so the table stays aligned
regardless of exe path or version string length.
All 65 tests pass.
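Computing column widths from the data before rendering can be sketched like this. The names `render_table` and `render_row` are hypothetical and not the signatures of render_md; the minimum width of 3 is an assumption made so the dash rule stays readable.

```rust
/// Render one table row, left-padding every cell to its column width.
fn render_row(cells: &[&str], widths: &[usize]) -> String {
    let padded: Vec<String> = cells
        .iter()
        .zip(widths.iter().copied())
        .map(|(cell, width)| format!("{cell:<width$}"))
        .collect();
    format!("| {} |\n", padded.join(" | "))
}

/// Render a GFM table whose columns are as wide as their widest cell.
fn render_table(headers: &[&str], rows: &[Vec<String>]) -> String {
    // Width per column: max of header and every cell, counted in chars.
    let widths: Vec<usize> = headers
        .iter()
        .enumerate()
        .map(|(col, header)| {
            rows.iter()
                .filter_map(|row| row.get(col))
                .map(|cell| cell.chars().count())
                .fold(header.chars().count().max(3), usize::max)
        })
        .collect();

    let mut out = render_row(headers, &widths);
    // Delimiter row made of dashes, one run per column.
    let rule: Vec<String> = widths.iter().map(|width| "-".repeat(*width)).collect();
    let rule_cells: Vec<&str> = rule.iter().map(String::as_str).collect();
    out.push_str(&render_row(&rule_cells, &widths));
    for row in rows {
        let cells: Vec<&str> = row.iter().map(String::as_str).collect();
        out.push_str(&render_row(&cells, &widths));
    }
    out
}
```

Because the widths are derived from the rows themselves, a long exe path or install hint widens its column rather than breaking the alignment.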
render_md now shows only Tool | Version | Path. The state field on ToolVersion is preserved — it will be used downstream to decide how to proceed (e.g. whether the daemon needs starting before a run).
Instead of a separate warning block, missing tools now appear as ordinary rows in the Tool versions table:
  | uffs_cpp | ⚠️ not found | https://...install-url... |
Version cell: ⚠️ not found
Path cell: install URL / artifact location from tool_install_hint()
Column widths are computed from the hint string length so the table stays aligned. No pre-amble host.out() noise before the confirm gate. tool_install_hint import removed from run/mod.rs (only used in env.rs).
- uffs_cpp: direct download of uffs.com v1.0.0 release artifact - everything (ES CLI): voidtools CLI downloads page - everything_gui (Everything.exe): voidtools installer page
dev profile (unoptimized) is not appropriate for the benchmark harness itself — it runs the timing loops and process orchestration. All bench-* recipes now use `cargo run --release`.
Move render_md() + matrix::render_md() from run_stage0 into capture() so the tool-version table and matrix are shown immediately after probing, before either the missing-tool gate or the Stage 0 plan gate fires. Previously the table was invisible when tools were missing because run_stage0 never ran — the missing-tool confirm fired first with no context shown to the operator.
- show_full: blank lines between sections, skip empty commands header, render resources one-per-line, expand prompt to spaced [y]/[a]/[b]/[q] - missing_tools_card: title now reads 'Benchmarking UFFS and ES — proceed or quit?'; resources list available tools with checkmarks; takes available &[&str] param so run/mod.rs passes real tool names
everything + everything_gui → 'Everything' (one product, two probes) uffs → 'UFFS', uffs_cpp → 'UFFS (C++ ref)' Card title/resources now read 'Benchmarking UFFS and Everything' instead of listing raw probe IDs. Deduplication via unique_product_names().
…ore any gate capture() now prints: env fingerprint → matrix → (optional missing-tool gate) → returns. run_stage0 no longer re-prints the matrix; it goes straight to the plan-gate card. Single, unambiguous output sequence.
Previously the gate only fired when tools were missing. Now it always fires so the operator confirms (or in future deselects) which products will be benchmarked — UFFS, Everything, etc — before any measurement. - Rename missing_tools_card → tool_selection_card(available, missing) - All-present case: 'Benchmark X and Y — confirm tool selection' - Missing case: 'Benchmark X — proceed or quit to install missing first?' - Card id changed to 'tool-selection'
- Prompt line now reads '[Enter/y] proceed' making the default obvious - After read_key returns a terminal decision, echo '[Enter] -> proceeding' / '[y] -> proceeding' / '[q] -> aborting' etc. so the user sees their choice reflected before the next output appears - Enter (\n/\r) already mapped to Proceed; just needed the UX hint
…ction
- Add uffs_record_count to DrivePreflight (UFFS full-scan count per drive, always populated regardless of ES state)
- Add es_ram_budget_bytes to PreflightSpec and MatrixSpec (default: 50% of system RAM, derived from new EnvFingerprint::ram_bytes field)
- Add UFFS_BYTES_PER_RECORD = 100 constant (voidtools: ~100 MB / 1M files)
- Replace simple everything_serves() capable-drive check with greedy RAM-budget selector: sort drives by uffs_record_count ascending, accumulate until es_ram_budget_bytes exceeded; this avoids OOM-ing Everything's index
- Add render_drive_table() to preflight: GFM table showing Drive, UFFS records, Est. RAM, ES status, ES capable (checkmark = fits in budget)
- Print drive inventory table between tool-selection gate and matrix
- Update all tests for new struct fields and revised mock call sequences
- fix: sort_unstable on char, alloc BTreeSet, integer-only fmt_ram, doc on probe_drive, rename single-char ident in fmt_count
…table - daemon_start_if_needed(): fired at top of capture() before env probe; checks 'uffs daemon status' for Status:Ready; if not ready, fires 'uffs daemon start' and returns immediately so index loads in parallel with env capture + tool-selection gate - ensure_daemon_ready(): called only right before preflight::capture where UFFS drive counts are needed; polls 'uffs daemon status' up to 3 min (90x2s); prints live status line each tick so operator sees progress; hard-fails with BenchError::Command if never reaches Status:Ready - Extract both helpers + constants to run/daemon.rs to keep mod.rs under the 800 LOC file-size policy limit - Update run/tests.rs mock sequences for the two new daemon status calls and the full preflight drive-probe chain (es availability + uffs count + es result-count per candidate drive)
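The bounded polling in ensure_daemon_ready (90 polls, 2 s apart) might look roughly like the sketch below. The name `wait_until_ready`, the injected `status` and `sleep` closures, and the plain `String` error are all hypothetical; the real helper returns BenchError::Command and shells out to 'uffs daemon status'.

```rust
use std::time::Duration;

/// Marker looked for in the status output, as named in the commit message.
const READY_MARKER: &str = "Status:Ready";

/// Poll `status` up to `max_polls` times, sleeping `interval` between tries.
/// Returns the 1-based poll on which the daemon reported ready.
fn wait_until_ready(
    mut status: impl FnMut() -> String,
    mut sleep: impl FnMut(Duration),
    max_polls: u32,
    interval: Duration,
) -> Result<u32, String> {
    for attempt in 1..=max_polls {
        let output = status();
        if output.contains(READY_MARKER) {
            return Ok(attempt);
        }
        // Echo the latest status so the operator can see progress.
        println!("[{attempt}/{max_polls}] {}", output.trim());
        sleep(interval);
    }
    Err(format!("daemon not ready after {max_polls} polls"))
}
```

Passing `std::thread::sleep` as the `sleep` argument gives the production behaviour, while a no-op closure keeps tests instantaneous.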
…ve IPC - Replace uffs_drive_count() (ran 'uffs <D>:\ * --count' per drive) with parse_daemon_status_drives() which parses the 'uffs daemon status' stdout emitted in capture() once; drives absent from output get count=0 - daemon_status_output() helper runs 'uffs daemon status' once in capture() and distributes the BTreeMap<char,u64> to probe_drive() via parameter - Rename 'ES status' column to 'ES index' in render_drive_table() — the column reflects whether ES has the drive indexed, not daemon run-state - Add parse_daemon_status_drives_extracts_counts test covering the real status format with comma-separated counts and em-dash separators - Update all preflight and run/tests.rs mocks: replace per-drive count mocks with single daemon status mock (one fewer IPC call per drive)
Parked drives appear in 'uffs daemon status' without a record count: [Parked] G: — bloom + trie kept resident; body released warm_parked_drives() fires 'uffs daemon preload <DRIVE>' for each candidate drive absent from the initial warm-map, then re-reads 'uffs daemon status' once so the drive appears as [Warm] with a live record count. Preload failure is non-fatal (count stays 0). - Add warm_parked_drives() helper in preflight/mod.rs - capture() calls it between first status read and probe_drive loop; second status read only runs when at least one drive was absent - Add DAEMON_READY_STATUS C Warm line in run/tests.rs so the C test drive is already warm (no preload injected there) - Add parked_drive_is_preloaded_and_count_populated test - Update unconfigured_drive_probed_once_without_sleep: mock preload + second status; assert 5 Run calls instead of 3 - Extract preflight test module to preflight/tests.rs (mod.rs 559 LOC, tests.rs 295 LOC — both under 800 LOC policy limit)
…ives Adds parse_daemon_status_drives_handles_hot_and_parked_tiers using real daemon output with mixed tiers: - [Parked] drives (no record count line) → absent from map - [Hot] drives → parsed identically to [Warm] (same 'N records (live)' format) - [Warm] drives → baseline already covered, now asserted alongside Hot
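A rough sketch of parsing tiered status lines into per-drive counts. The exact layout of 'uffs daemon status' is not shown in these commits, so the line shape used here (`[Tier] X: <separator> N records (live)`) is an assumption pieced together from the fragments quoted above, and `parse_status_drives` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Map drive letter -> live record count for every tier line that has one.
/// Lines without a "records" token (e.g. Parked drives) are left out.
fn parse_status_drives(status: &str) -> BTreeMap<char, u64> {
    let mut counts = BTreeMap::new();
    for line in status.lines() {
        // Only bracketed tier lines such as "[Warm] C: ..." are relevant.
        let Some((_tier, rest)) = line
            .trim()
            .strip_prefix('[')
            .and_then(|tail| tail.split_once(']'))
        else {
            continue;
        };
        let tokens: Vec<&str> = rest.split_whitespace().collect();
        // The first token must look like "C:".
        let Some(letter) = tokens.first().and_then(|token| {
            let mut chars = token.chars();
            match (chars.next(), chars.next(), chars.next()) {
                (Some(drive), Some(':'), None) if drive.is_ascii_alphabetic() => Some(drive),
                _ => None,
            }
        }) else {
            continue;
        };
        // The count is the token right before "records"; drop the commas.
        let Some(count_token) = tokens
            .iter()
            .position(|token| *token == "records")
            .and_then(|pos| pos.checked_sub(1))
            .map(|pos| tokens[pos])
        else {
            continue;
        };
        let digits: String = count_token.chars().filter(|ch| *ch != ',').collect();
        if let Ok(count) = digits.parse::<u64>() {
            counts.insert(letter.to_ascii_uppercase(), count);
        }
    }
    counts
}
```

Hot and Warm lines go through the same path, and a Parked line simply never reaches the insert, which matches the "absent from map" behaviour described above.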
ES OOMs empirically at ~1.3 GiB (C+D+E); C+D=998 MiB works fine. Replace ram_bytes/2 heuristic (~32 GiB on 64 GiB host, meaningless) with ES_RAM_BUDGET_BYTES = 1_073_741_824 (1 GiB conservative ceiling).
Drive table changes:
- Title: 'Drive inventory' -> 'ES RAM budget'
- Drop 'ES capable' column (every drive is individually capable; the column was redundant with Fits budget)
- Rename 'ES status' column var es_status -> es_index (already done in header; align the local variable name)
- Add 'Fits budget' column: greedy smallest-first fill up to 1 GiB cap
- Add blockquote footer: 'ES RAM budget: X used of 1 GiB cap (N drives)'
Expected output for the 6-drive run:
  ### ES RAM budget
  | Drive | UFFS records | Est. RAM | ES index    | Fits budget   |
  | C     | 3,409,074    | 325 MiB  | not running | ✓             |
  | D     | 7,066,034    | 673 MiB  | not running | ✓             |
  | E     | 2,929,741    | 279 MiB  | not running | ✗ over budget |
  ...
  > ES RAM budget: 998 MiB used of 1024 MiB cap (2 drive(s) fit)
Smallest-first maximized drive count within budget but violated operator intent: running 'just bench-suite --drives C,D,E,F,M,S' should fill C first, then D (C+D=998 MiB ≤ 1 GiB ✓), then E tips over the cap. Both render_drive_table() and ram_budget_capable_drives() now iterate drives in the order the operator specified them (candidate order).
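The candidate-order fill can be illustrated with the figures from the drive table. This is a sketch, not ram_budget_capable_drives itself: the function name `capable_drives` is hypothetical, and stopping at the first drive that tips over the cap (instead of scanning on for smaller drives) is an assumption the commit messages do not settle.

```rust
/// Estimated Everything RAM per indexed record (~100 MB per 1M files).
const UFFS_BYTES_PER_RECORD: u64 = 100;
/// 1 GiB ceiling for the Everything index.
const ES_RAM_BUDGET_BYTES: u64 = 1_073_741_824;

/// Walk `(drive, record_count)` pairs in operator order and keep drives
/// while the running estimate stays within `budget_bytes`.
/// A budget of 0 means unlimited. Returns the kept drives and bytes used.
fn capable_drives(candidates: &[(char, u64)], budget_bytes: u64) -> (Vec<char>, u64) {
    let mut used = 0_u64;
    let mut capable = Vec::new();
    for &(drive, records) in candidates {
        let estimate = records.saturating_mul(UFFS_BYTES_PER_RECORD);
        if budget_bytes != 0 && used.saturating_add(estimate) > budget_bytes {
            // Assumption: the first overflow ends the fill.
            break;
        }
        used = used.saturating_add(estimate);
        capable.push(drive);
    }
    (capable, used)
}
```

For C (3,409,074 records) and D (7,066,034 records) the estimate is 1,047,510,800 bytes, which is 998 MiB after integer division; adding E (2,929,741 records) would reach 1,340,484,900 bytes and exceed the 1,073,741,824-byte cap.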
Candidate drives passed via --drives that the UFFS daemon has no record of (neither Warm/Hot/Parked nor recoverable via preload) were shown as rows with 0 records. This fabricates status for non-existent drives. After preload + second status re-read, filter candidates to only those present in uffs_counts. Unknown drives emit a WARNING and are skipped entirely — no table row, no ES probe, no budget allocation. Update test: rename unconfigured_drive_probed_once_without_sleep to drive_unknown_to_daemon_is_skipped and assert drives.is_empty() with 4 Run calls (no es-probe on a dropped drive).
…tate ES running state (loaded+hot) should not prevent a drive from being listed as capable. Whether Everything is currently running is an execution-time concern; the capacity decision belongs to the RAM budget alone. - Remove everything_serves() — its only caller was ram_budget_capable_drives - ram_budget_capable_drives: drop the everything_serves() filter; accept all candidate drives in order up to the RAM cap (budget=0 = unlimited) - compute_matrix: per-cell cross vs solo remains gated on es_feasible (the feasibility cell check), which correctly handles not-running ES - Update tests: rename and correct assertions to reflect new semantics * everything_only_serves_loaded_hot_drives -> ram_budget_gates_capable_drives_not_es_running_state * configured_but_indexing_drive_is_uffs_only -> indexing_drive_is_capable_but_cell_is_uffs_only
… 0-record drives preflight/mod.rs: - Also skip drives with 0 records in the UFFS daemon index (warn + skip), not just drives absent entirely. A 0-record drive is not usefully indexed. matrix.rs: - ram_budget_capable_drives and compute_matrix now filter to only drives confirmed in preflight.drives (candidate_drives drives that were absent or 0-record were already dropped during capture and must not reappear). - Rename single-char closure params (d, dp) to satisfy min_ident_chars lint. - Update tests: without_everything adds preflight records; budget test asserts F/M/S absent from preflight produce capable=[C,D,E] solo=[E].
The negotiated matrix now appears before the ES launch confirmation so the operator sees the full plan (capable drives, UFFS-only reasons) and can make an informed decision. The redundant post-launch matrix render is dropped — the plan gate that follows immediately shows the locked plan. Also: add ES_STARTUP_GRACE_MS (5s) before first poll, increase kill grace to 3s, log full spawn command, show per-drive counts in poll messages.
1. Skip preload for drives UFFS has never heard of (H, I) Add parse_daemon_known_drives() which captures all drive letters from bracketed tier lines ([Warm], [Hot], [Parked]) regardless of record count. warm_parked_drives now skips any candidate not in known_drives — completely unknown drives (H, I) no longer trigger a preload attempt or a spurious second daemon-status call. 2. ES startup/poll diagnostics Add ES_STARTUP_GRACE_MS (5 s) sleep before first IPC poll, increase kill grace to 3 s, log full spawn command, show per-drive counts in every poll message [C:N D:N G:N].
…cond pass Before ES launch gate: show only capable drives (not the UFFS-only cell list which was all 'ES not started/starting' noise). The full matrix with cross-tool cells is shown after second-pass preflight. Second-pass preflight: restrict candidate_drives to the drives that survived first-pass UFFS filtering so H/I and other unknown drives never generate warnings on the re-probe.
The operator already confirmed tool selection and the ES launch by the time PLAN was presented. Replace the confirm loop with a direct write: stage0 artifacts are written silently and the done panel is shown. Dry-run mode is preserved: skips the write and shows the noop result.
tool_selection_card now takes a step_total parameter. When Everything is available the tool-selection card shows step 1/2 and the ES launch card shows 2/2. When Everything is not installed both are absent and step_total is 1 (1/1).
Add Host::run_streaming — inherits parent stdout/stderr via Command::status() so the child's output flows to the operator in real time rather than being buffered until process exit. Stage 1 (cross-tool) and Stage 2 (parity) now use run_streaming so the operator sees benchmark progress during a multi-minute harness run. Stage 3 already emits progress via host.out() and is unchanged. MockHost records RunStreaming calls; SystemHost uses Command::status().
The cross-tool script has no everything-gui tool token — it only knows uffs, uffs-cpp, and everything (es.exe). everything_gui (Everything.exe) is the GUI process used for ES indexing, not a CLI benchmarking tool. harness_tool now returns Option<String> and returns None for everything_gui/everything-gui; cross_tool_invocation uses filter_map to silently exclude it from the --tools list passed to the harness.
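A small sketch of the Option plus filter_map pattern described here. The underscore-to-hyphen mapping and the comma-joined list are assumptions made for illustration, and `harness_tools_arg` is a hypothetical helper, not cross_tool_invocation.

```rust
/// Map a bench tool id to the token the cross-tool harness understands.
/// The GUI probe has no harness token, so it maps to `None`.
fn harness_tool(tool_id: &str) -> Option<String> {
    match tool_id {
        "everything_gui" | "everything-gui" => None,
        // Assumption: harness tokens are the ids with '_' turned into '-'.
        other => Some(other.replace('_', "-")),
    }
}

/// Build the value for a `--tools` style argument, silently dropping ids
/// that have no harness token.
fn harness_tools_arg(tool_ids: &[&str]) -> String {
    tool_ids
        .iter()
        .filter_map(|tool_id| harness_tool(tool_id))
        .collect::<Vec<_>>()
        .join(",")
}
```

Returning `Option` keeps the exclusion rule in one place, so call sites only need `filter_map` and never special-case the GUI probe themselves.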
…stance When the bench launches Everything.exe with -instance uffs-bench, the default IPC window is absent and every es.exe call returns 'Error 8: Everything IPC window not found'. cross-tool-benchmark.rs: add --es-instance <name> arg; prepend -instance <name> to all es.exe invocations in run_es when set. StageCfg: add es_instance_name field (None = system instance). stage_cfg(): populate from cap.es_ini_path — Some(INSTANCE_NAME) when the bench launched its own private Everything.exe. cross_tool_invocation(): pass --es-instance to the harness when set.
--skip-stages 1,2 skips those stages without prompting, emitting '-> STAGE N: ... skipped (--skip-stages)' for each. Useful during development to iterate quickly without waiting for multi-minute runs. Complements --only-stage and --from-stage; all three compose correctly.
s/S was already wired to Decision::Skip in interpret_key but was not shown in the prompt line, leaving operators unaware of the option.
…upported uffs.com returns nothing/errors for trailing-wildcard glob (win*). Use empty cpp_pattern as sentinel to skip C++ for that pattern cell. UFFS Rust vs Everything head-to-head is preserved; C++ prints SKIP. find_p50 already returns "SKIP" for absent rows so summary is correct.
Remove daemon_start_if_needed, daemon_is_ready, uffs_needs_restart — the daemon is always killed and restarted (not conditionally) so measurements are confined to the negotiated drive set regardless of what was loaded before. daemon.rs: drop unused functions; keep kill_and_restart_with_drives + ensure_daemon_ready only. run/mod.rs: remove early daemon_start_if_needed call from capture(); uffs_needs_restart flag is now always !capable_drives.is_empty(). run/tests.rs: update mock call sequences to match new flow (no early start; dry-run skips kill/restart via ProceedNoop; autopilot queues daemon kill + start --drive C + poll + second-pass preflight). run/bootstrap.rs: extract resolve_bundle_dir, tool_disposition, run_fetch_competitors, load_or_new_state to bring mod.rs under the 800-line policy limit (816 -> 745 LOC).
…ate instance When es.exe returns an IPC error (Error 8) but the Everything.exe process is in the tasklist, the instance is running — just not the default one es.exe knows how to address (e.g. our private bench instance). Reporting 'not running' in that case was misleading. env.rs: compute state_probe before version probe so the result is available without a second tasklist call; in the daemon_error_markers branch return 'ipc unavailable' when state == 'running', 'not running' only when the process is genuinely absent. env.rs tests: add ipc_error_with_process_running_reports_ipc_unavailable and ipc_error_with_process_absent_reports_not_running to pin both branches. run/tests.rs: update mock call order — state (tasklist) now fires before the version probe for both 'everything' and 'everything_gui'.
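The two branches pinned by the new tests can be sketched as a pure function. `version_or_state` is a hypothetical name and the marker string in the usage note is only an example taken from the error text quoted earlier; the real logic lives in env.rs.

```rust
/// Decide what to show in the version column for an IPC-backed probe.
/// `process_running` comes from the process-presence (state) probe, which
/// is evaluated before the version probe.
fn version_or_state(output: &str, process_running: bool, daemon_error_markers: &[&str]) -> String {
    let ipc_error = daemon_error_markers
        .iter()
        .any(|marker| output.contains(*marker));
    if !ipc_error {
        // Normal case: the probe printed a version string.
        return output.trim().to_owned();
    }
    if process_running {
        // The process exists but is not reachable on the default IPC window,
        // for example a private bench instance.
        "ipc unavailable".to_owned()
    } else {
        "not running".to_owned()
    }
}
```

With the marker `"IPC window not found"`, the output `Error 8: Everything IPC window not found` maps to `ipc unavailable` when the process is in the task list and to `not running` when it is absent.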
Step numbers were wrong in two ways:
1. tool_selection_card used preflight_steps = if es_available { 2 } else
{ 1 }, missing the UFFS restart step entirely — showed '1/2' when
there were actually 3 steps.
2. uffs_restart_card computed uffs_step_num as preflight_steps - es_step
which evaluated to 0 when both gates were active — showed '0/3'.
Fix: compute preflight_steps in capture() as:
1 + u32::from(!cli.drives.is_empty()) + u32::from(es_available)
using cli.drives as a conservative proxy for 'UFFS restart will fire'
(if no drives are specified, the matrix will be empty and the restart
gate is skipped anyway).
Hardcode uffs_step_num = 2 (tool selection is always step 1).
In execute(), compute total_steps from the actual post-matrix flags
cap.uffs_needs_restart and cap.es_needs_launch for precise ES launch
step numbering.
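As a worked example of the formula above, here is a tiny sketch; the function name and its boolean parameters are hypothetical stand-ins for `!cli.drives.is_empty()` and `es_available`.

```rust
/// Total number of preflight gates: tool selection is always present,
/// the UFFS restart gate needs drives, the ES launch gate needs Everything.
fn preflight_steps(drives_specified: bool, es_available: bool) -> u32 {
    1 + u32::from(drives_specified) + u32::from(es_available)
}

/// Tool selection is always step 1, so the UFFS restart is always step 2.
const UFFS_STEP_NUM: u32 = 2;
```

With drives given and Everything available the total is 3, so the cards read 1/3, 2/3 and 3/3. The old subtraction `preflight_steps - es_step` produced 0 in exactly that case.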
Before negotiation the daemon may have been restricted to a previous drive set, causing drives to be silently missing from the first preflight probe and producing a narrower matrix than the candidate set. daemon.rs: add kill_and_restart_all_drives() -- same pattern as kill_and_restart_with_drives but with no --drive args so uffsd self-discovers every available NTFS drive. mod.rs: at the start of capture(), if drives are specified: 1. kill_and_restart_all_drives (background warm-up) 2. run env probes + show tool-selection gate (parallel with load) 3. ensure_daemon_ready (blocks until index ready) 4. first preflight (now sees the full drive set) The existing gated step-2 restart (--drive <capable>) is unchanged. tests.rs: update both mock sequences to include the two new run calls (daemon kill + daemon start) before env probes, and the additional ensure_daemon_ready poll before first preflight.
parity_invocation() was passing cfg.drives (-Drives C,D,E,F,G,H,I,M,S) to the PowerShell harness instead of cfg.capable_drives (the negotiated subset all tools can serve, e.g. C,D,G). Also fix the Stage 2 plan card: the cache resource label and backup note were also referencing cfg.drives for the drop-cache path.
Replaces the PowerShell cold-parity-per-drive.ps1 with a self-contained rust-script that uses the same binary-resolution cascade as uffs-bench:
1. explicit --uffs-bin / --cpp-bin
2. %USERPROFILE%\bin\uffs.exe / uffs.com
3. target\release\uffs.exe (Rust only)
4. bare PATH fallback
If uffs.exe is missing the script exits with an actionable error and the release download URL. If uffs.com is missing it prints a warning with the C++ download URL and continues with the Rust-only column.
ES / Everything is explicitly out of scope: the C++ binary re-reads all MFTs on every invocation so it cannot sustain a stable IPC connection; the parity test is strictly uffs.exe (daemon HOT) vs uffs.com (MFT re-read).
Features:
  --drives C,D,G        drives to test (default: C,D)
  --rounds N            rounds per drive per tool (default: 1)
  --purge-cache         stop daemon + delete all cache files (true COLD)
  --skip-cpp            skip the C++ reference column
  --dump-raw            print raw stderr per invocation
  --sleep-ms N          inter-round sleep (default: 1000 ms)
  --output-file <path>  tee full output to a file
Emits two markdown tables: per-drive parity wall-clock p50 and the Rust wall / daemon / CLI-overhead breakdown. No external crate dependencies (no chrono); the timestamp is computed via a simple epoch decomposition.
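One way to compute a timestamp without chrono is to decompose seconds since the Unix epoch by hand. The sketch below uses the well-known days-to-civil-date algorithm; it is not claimed to be the script's actual code, and `format_utc` / `now_utc` are hypothetical names.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Format seconds since 1970-01-01T00:00:00Z as "YYYY-MM-DD HH:MM:SS UTC"
/// using only integer arithmetic (proleptic Gregorian calendar).
fn format_utc(epoch_secs: u64) -> String {
    let days = epoch_secs / 86_400;
    let secs_of_day = epoch_secs % 86_400;
    let hour = secs_of_day / 3_600;
    let minute = secs_of_day % 3_600 / 60;
    let second = secs_of_day % 60;

    // Shift the epoch to 0000-03-01 so leap days fall at the end of a year.
    let shifted = days + 719_468;
    let era = shifted / 146_097; // 400-year cycles
    let day_of_era = shifted % 146_097;
    let year_of_era =
        (day_of_era - day_of_era / 1_460 + day_of_era / 36_524 - day_of_era / 146_096) / 365;
    let day_of_year = day_of_era - (365 * year_of_era + year_of_era / 4 - year_of_era / 100);
    let month_index = (5 * day_of_year + 2) / 153; // 0 = March
    let day = day_of_year - (153 * month_index + 2) / 5 + 1;
    let month = if month_index < 10 { month_index + 3 } else { month_index - 9 };
    // January and February belong to the following civil year.
    let year = year_of_era + era * 400 + u64::from(month <= 2);

    format!("{year:04}-{month:02}-{day:02} {hour:02}:{minute:02}:{second:02} UTC")
}

/// Current wall-clock time, falling back to the epoch if the clock is
/// set before 1970.
fn now_utc() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_or(0, |elapsed| elapsed.as_secs());
    format_utc(secs)
}
```

The output is UTC only; local time zones would need offset data that the standard library does not provide, which is a reasonable trade-off for a benchmark log header.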
Switch the Stage 2 parity invocation from: powershell -NoProfile -ExecutionPolicy Bypass -File scripts/windows/cold-parity-per-drive.ps1 -Drives C,D,G -Rounds N -OutputFile ... [-PurgeCacheFirst] to: rust-script scripts/windows/cold-parity-per-drive.rs --drives C,D,G --rounds N --output-file ... [--purge-cache] Update PARITY_SCRIPT constant to point at the .rs file, reuse RUST_SCRIPT_EXE (same launcher as Stage 1), and remove the now-dead POWERSHELL_EXE constant. Module doc comment updated accordingly.
run_native() and plan() (stage 3 branch) were both iterating cfg.drives (all CLI-provided drives: C,D,E,F,G,H,I,M,S) instead of cfg.capable_drives (the negotiated subset every tool can serve). Also fixes snapshot_cache() which was backing up — and registering restores for — cache files on the full drive list rather than only the capable ones. After this commit every reference to cfg.drives in stages.rs is gone; all three stages and their cache snapshot/restore now operate exclusively on cfg.capable_drives.
- Remove --hide-system / --hide-ads: those flags exist only to align row
counts with Everything (which skips system files and ADS). This bench
is uffs.exe vs uffs.com only — the full unfiltered MFT corpus is the
correct and honest baseline.
- Keep --profile: it adds 'daemon: N ms' on stderr powering Table 2's
wall/daemon/overhead breakdown. Explain in doc comment why it is kept
(non-default code path, negligible overhead at >100 ms query scale).
- Fix C++ version display: scan --version output for the line starting
with 'UFFS version:' (same prefix the bench suite uses), emit as
'uffs.com X.Y.Z' instead of the full multi-line banner.
- Fix uffs.exe version display: print first non-empty trimmed line only.
- Fix binary resolution source labels: use .display() instead of {:?}
so paths are shown as plain 'C:\Users\rnio\bin\uffs.exe' without
Rust Debug escaping.
- Fix table alignment: compute every column width from the actual data
(drive names, ms values, speedup strings, row counts) before rendering
headers and rows, so all columns are exactly wide enough and properly
right-aligned with no overflow or under-padding. Add ms_str() and
speedup_str() helpers used by both tables.
Phase 0a now unconditionally kills and restarts the daemon with the
requested --drives (or bare start to auto-discover all drives when none
are given). This guarantees a known COLD starting state regardless of
whatever daemon was running before. --purge-cache additionally deletes
the on-disk cache files before the restart, forcing a true MFT re-read.
Phase 0b now issues three warm-up passes ('uffs * --limit 1 --profile'):
pass 1 — cold load path, await_ready, MFT read (or cache hit)
pass 2 — first HOT query, JIT query state primed
pass 3 — fully primed, daemon in steady-state query mode
Only pass-1 wall + await_ready are reported; total_records is read once
after all three settle.
Summary warm-up recap: 'Mode: WARM' is replaced with the correct
'COLD (daemon restarted ...)' label with a note on whether the
on-disk cache was purged or retained.
warmup_daemon_primed now returns WarmupResult carrying all three pass wall times, per-pass daemon ms (from --profile), await_ready from pass 1, and total_records. Phase 0b and the summary recap both print the full trajectory: Pass 1 [COLD] wall = X.XX s daemon = N ms (daemon load + MFT read / cache hit) Pass 2 [WARM] wall = X.XX s daemon = N ms (first HOT query — JIT structures built) Pass 3 [HOT ] wall = X.XX s daemon = N ms (fully primed — steady-state latency) This makes the COLD/WARM/HOT cost difference visible at a glance and shows the progression from a cold-started daemon through full priming.
DEFAULT_PATTERNS was emitting 'uffs.exe C:\ *.dll --count' (positional
path arg) instead of the correct CLI form:
uffs.exe "*.dll" --drives C --count
Fix: drop the '{DRIVE}:\' positional arg, use '--drives {DRIVE}' flag.
This fixes:
- Stage 3 plan card (commands shown verbatim were wrong)
- Stage 3 run_native (cells were failing with exit 1)
- preflight estimate_rows (row-count estimation was also broken)
Also update the spec_for test fixture in preflight/tests.rs to match.
Production defaults (policy.rs):
  HOT_TO_WARM: 60 s -> 600 s (10 min)
  WARM_TO_PARKED: 300 s -> 1800 s (30 min)
  PARKED_TO_COLD: 86400 s (24 h, unchanged)
The previous 1 min / 5 min defaults caused spurious demotes mid-bench (HOT queries demote after 1 min idle between sink rotations; WARM restarts between Stage 2 and Stage 3 demote to Parked after 5 min).
Bench scripts (cold-parity-per-drive.rs, cross-tool-benchmark.rs): both now start the daemon with bench-safe TTL overrides:
  UFFS_HOT_TO_WARM_IDLE_SECS=3600 (1 hr)
  UFFS_WARM_TO_PARKED_IDLE_SECS=7200 (2 hr)
Scoped to the daemon child process only; teardown's next start gets production defaults.
cross-tool-benchmark.rs adds uffs_start() helper and wires it at:
- skip_cold warm-up: kill+restart before HOT
- WARM per-round: stop+start instead of stop-only
- HOT per-drive: start before probe query
Update config.rs const-pin and lib.rs env-var doc table.
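Scoping the overrides to the child process can be done with `std::process::Command::envs`, which changes only the spawned process's environment. The sketch below builds the command without running it; `daemon_start_command` is a hypothetical name and the `daemon start --drive <X>` argument shape is inferred from other commit messages in this PR, not from uffs_start() itself.

```rust
use std::process::Command;

/// Bench-safe TTL overrides, applied to the daemon child process only.
const BENCH_TTL_OVERRIDES: [(&str, &str); 2] = [
    ("UFFS_HOT_TO_WARM_IDLE_SECS", "3600"),
    ("UFFS_WARM_TO_PARKED_IDLE_SECS", "7200"),
];

/// Build (but do not spawn) a daemon start command carrying the overrides.
/// The parent environment is untouched, so a later plain start gets the
/// production defaults again.
fn daemon_start_command(uffs_exe: &str, drives: &[char]) -> Command {
    let mut command = Command::new(uffs_exe);
    command.args(["daemon", "start"]);
    for drive in drives {
        // Assumption: one `--drive <letter>` pair per drive.
        command.arg("--drive").arg(drive.to_string());
    }
    command.envs(BENCH_TTL_OVERRIDES);
    command
}
```

Calling `.status()` or `.spawn()` on the returned command starts the daemon with the overrides in effect; nothing is written to the bench process's own environment.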
… superset check
Previously all N rounds of UFFS ran first, then all N rounds of C++,
then all N of Everything. The later tools consistently benefited from
OS filesystem cache warmed by the earlier tool — biasing timings.
New HOT loop structure for each (sink, pattern):
  for round in 1..=N:
    shuffle [uffs, cpp, es] with a fresh LCG seed each round
    run the three tools in that order
    immediately check: uffs.exe rows >= uffs.com rows >= es.exe rows
    print per-round row counts + superset verdict (ok / VIOLATION)
  print per-tool p50/p95 summary
Superset semantics:
uffs.exe >= uffs.com — both read same MFT; with --hide-system
counts should match; violation = parity bug
uffs.exe >= es.exe — Everything skips NTFS system files / ADS
uffs.com >= es.exe — same reasoning for C++ tool
Violations are warnings only (bench continues) so timing data is
always preserved. The es.exe fast-fail short-circuit is preserved
(checked on round 0; remaining es rounds skipped on fast failure).
Also adds lcg_shuffle3() — Fisher-Yates on 3 elements using a
minimal LCG (glibc constants), zero external dependencies.
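A dependency-free shuffle of three elements along these lines could look as follows. This is a sketch, not the script's lcg_shuffle3: the signature, the use of the upper bits, and returning the final state as the next seed are assumptions. The multiplier and increment are the values commonly cited for glibc's simple `rand()` variant.

```rust
/// One step of a 31-bit linear congruential generator
/// (a = 1103515245, c = 12345, m = 2^31).
fn lcg_next(state: u32) -> u32 {
    state.wrapping_mul(1_103_515_245).wrapping_add(12_345) & 0x7FFF_FFFF
}

/// Fisher-Yates shuffle of exactly three items, driven by the LCG.
/// Returns the final generator state so the caller can seed the next round.
fn lcg_shuffle3<T>(items: &mut [T; 3], seed: u32) -> u32 {
    let mut state = seed;
    // Walk from the last index down to 1, swapping with a pick in 0..=upper.
    for upper in (1..3_usize).rev() {
        state = lcg_next(state);
        // Use the upper bits: the low bits of an LCG cycle quickly.
        let pick = (state >> 16) as usize % (upper + 1);
        items.swap(upper, pick);
    }
    state
}
```

The modulo introduces a tiny bias, which is harmless here because the goal is only to keep any one tool from always running first and warming the OS cache for the others.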
Row counts can match by coincidence (different files, same count). Replace the per-round count comparison with an actual path-set check:
  es.exe paths ⊆ uffs.com paths ⊆ uffs.exe paths
Implementation:
- Each run_* function gains a run_*_to variant accepting an explicit output file path so all three tools write to separate files (bench_uffs_<pat>.csv, bench_cpp_<pat>.csv, bench_es_<pat>.csv) within the same round without clobbering each other.
- After every round (File sink): read each file into a normalised HashSet<String> of lowercased, header-stripped paths; call check_subset() for each pair; print violations with first 3 missing paths as examples.
- Stdout/Null sinks: no output file retained, skip path check (row counts still shown).
- Dead code removed: run_es and run_uffs_cpp wrappers, extract_paths_from_bytes.
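The subset check itself is a set difference over normalised paths. The sketch below assumes each output file has exactly one header line followed by one path per line, optionally double-quoted; the real per-tool formats are not shown in this PR, and both function signatures are hypothetical.

```rust
use std::collections::HashSet;

/// Read tool output into a set of lowercased paths.
/// Assumption: the first line is a header and every other line is a path.
fn normalised_paths(output: &str) -> HashSet<String> {
    output
        .lines()
        .skip(1)
        .map(|line| line.trim().trim_matches('"').to_lowercase())
        .filter(|path| !path.is_empty())
        .collect()
}

/// Paths present in `subset` but missing from `superset`, sorted so that
/// the first few can be printed as stable examples. Empty means "ok".
fn check_subset(subset: &HashSet<String>, superset: &HashSet<String>) -> Vec<String> {
    let mut missing: Vec<String> = subset.difference(superset).cloned().collect();
    missing.sort_unstable();
    missing
}
```

A caller would run `check_subset(&es, &cpp)` and `check_subset(&cpp, &uffs)` after each round and print `missing.iter().take(3)` on a violation; unlike a count comparison, two different files can never cancel each other out.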
Summary
Batch of bench-harness improvements accumulated on feat/bench-preflight-robustness.

Daemon TTL defaults (fix(bench))
- HOT_TO_WARM_IDLE_SECS: 60 s → 600 s (10 min)
- WARM_TO_PARKED_IDLE_SECS: 300 s → 1800 s (30 min)
- Bench scripts (cold-parity-per-drive.rs, cross-tool-benchmark.rs) now start the daemon with bench-safe overrides scoped to the daemon child process: UFFS_HOT_TO_WARM_IDLE_SECS=3600, UFFS_WARM_TO_PARKED_IDLE_SECS=7200
- config.rs const-pin and lib.rs env-var doc table updated.

Interleaved HOT rounds with random tool order (feat(bench))
- For each (sink, pattern), every round shuffles [uffs, cpp, es] with a fresh LCG seed, runs them in that order, then reports.

Path-set superset check per round (fix(bench))
- Each tool writes to a separate output file (bench_uffs_<pat>.csv etc.) so all three coexist within a round.

Stage 3 / preflight CLI arg order fix (fix(bench))
- DEFAULT_PATTERNS and preflight test fixture corrected to match actual uffs.exe CLI: pattern first, then --drives DRIVE, then --count.

Cold-parity and warm-up polish
- cold-parity-per-drive.rs always restarts daemon with 3-pass warm-up.

Testing
All CI gates pass: fmt, typos, reuse, lint-ci, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, lint-ci-windows.