
Showcase: Codeflash-agent#2054

Merged
KRRT7 merged 18 commits into main from codeflash/optimize on Apr 10, 2026

Conversation


@KRRT7 KRRT7 commented Apr 10, 2026

Summary

Systematic startup and runtime optimization across the CLI and verification paths:

  1. Defer heavy module-level imports in cli.py, main.py, env_utils.py, shell_utils.py, cmd_auth.py, and models.py — move Rich, libcst, pydantic, isort, tomlkit, sentry_sdk, and other heavy libraries from module level into the functions that use them
  2. Early-exit command dispatch: the auth and compare commands skip the banner, telemetry, and version check entirely
  3. Inline trivial helpers: is_LSP_enabled() (a one-liner env var check) is inlined to avoid importing lsp.helpers on the happy path
  4. Backport libcst visitor dispatch cache from codeflash-python — caches MatcherDecoratableTransformer/Visitor dispatch tables by class type (24x faster per instantiation)
  5. Comparator type dispatch optimization (backported from codeflash-python):
    • Frozenset fast-path: O(1) type(obj) in frozenset for 16 common types
    • Type identity fast-paths: orig_type is str/list/tuple/dict before frozenset for the 4 most common return-value types (single pointer comparison, cheaper than hash lookup)
    • Dict fast-path eliminates ~8 isinstance checks (was handled at line 319, now at check #3)
  6. Fix ruff auto-format workflow accidentally rewriting version.py placeholders via uv-dynamic-versioning
  7. Fix benchmark test — remove unnecessary Optimizer instantiation from test_benchmark_code_extract_code_context (was failing due to missing API key)
  8. Add benchmarks — libcst visitor (multi-file + full-pipeline), comparator microbenchmark (5 scenarios), CLI startup
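The import-deferral pattern behind item 1 can be sketched in a few lines. This is an illustrative stand-in, not the PR's actual code: `fractions` plays the role of the heavy third-party libraries (Rich, libcst, pydantic) that the PR moves out of module scope.

```python
import sys

def exact_ratio(numerator: int, denominator: int):
    # Deferred import: the module is loaded on the first call instead of
    # at module-import time, so fast CLI paths (e.g. --version) never pay
    # this import cost. `fractions` stands in for the heavy libraries
    # (Rich, libcst, pydantic, ...) deferred in the PR.
    from fractions import Fraction
    return Fraction(numerator, denominator)

ratio = exact_ratio(2, 4)
# After the first call the module is cached in sys.modules, so repeated
# calls pay only a dict lookup, not a re-import.
loaded = "fractions" in sys.modules
```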

Benchmarks

All benchmarks run on Azure Standard_D4s_v5, Python 3.13.

CLI startup — codeflash compare --script (30 runs, median)

| Command | main | optimize | Speedup |
| --- | --- | --- | --- |
| --version | 0.05s | 0.05s | 1.00x |
| --help | 0.34s | 0.09s | 3.89x |
| compare --help | 0.34s | 0.09s | 3.90x |
| auth status | 1.02s | 0.16s | 6.23x |
| Total | 1.76s | 0.39s | 4.46x |

CLI startup — hyperfine (--warmup 3 --min-runs 30)

| Command | main | optimize | Speedup |
| --- | --- | --- | --- |
| --version | 53ms | 53ms | unchanged (already fast-pathed) |
| --help | 342ms | 73ms | 4.7x |
| compare --help | 326ms | 73ms | 4.5x |
| auth status | 983ms | 159ms | 6.2x |

Comparator optimization — codeflash compare (accbab4 vs fe39d40, --inject)

Both optimizations combined (frozenset fast-path + type identity dispatch reordering):

| Benchmark | E2E speedup | comparator() speedup |
| --- | --- | --- |
| comparator_nested_dict (API response w/ dicts) | 1.60x | 1.65x |
| comparator_rows (200 tuples) | 1.44x | 1.46x |
| comparator_deep (15-level nested dict) | 1.40x | 1.45x |
| comparator_identity_types (120 items) | 1.10x | 1.60x |
| comparator_primitives (20 flat) | 1.08x | 1.50x |

The dict fast-path was the biggest win: dicts were previously handled at check #8+ (after the str, list, weakref, jax, xarray, tensorflow, and sqlalchemy isinstance checks). They are now handled at check #3 via orig_type is dict.
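The dispatch ordering described above can be sketched as follows. This is an illustrative reconstruction, not the real comparator: identity checks come first, then the frozenset membership test, with isinstance kept as the subclass slow path.

```python
# Exact common types handled by the O(1) membership test (illustrative set).
_FAST_TYPES = frozenset({int, float, bool, bytes, complex, range, frozenset})

def compare(orig, new) -> bool:
    orig_type = type(orig)
    # Checks 1-2: pointer-identity fast paths for the most common
    # return-value types; `is` is cheaper than a frozenset hash lookup.
    if orig_type is str:
        return type(new) is str and orig == new
    if orig_type is list or orig_type is tuple:
        return (type(new) is orig_type and len(orig) == len(new)
                and all(compare(a, b) for a, b in zip(orig, new)))
    # Check 3: the dict fast path that previously sat behind ~8 isinstance checks.
    if orig_type is dict:
        return (type(new) is dict and orig.keys() == new.keys()
                and all(compare(v, new[k]) for k, v in orig.items()))
    # O(1) frozenset membership for the remaining common exact types.
    if orig_type in _FAST_TYPES:
        return type(new) is orig_type and orig == new
    # Slow path: isinstance traversal handles subclasses
    # (defaultdict, deque, ChainMap, ...).
    return isinstance(new, orig_type) and orig == new
```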

libcst cache — codeflash compare (61053be vs b533f50, --inject)

| Benchmark | Base | Head | Speedup |
| --- | --- | --- | --- |
| test_benchmark_extract (single file context) | 1.6s | 1.5s | 1.13x |
| test_benchmark_full_pipeline (discover→extract→replace→merge) | 1.3s | 1.3s | 1.08x |
| test_benchmark_extract_context_multi_file (10 files) | 24.2s | 22.6s | 1.07x |
| test_benchmark_discover_functions_multi_file (10 files) | 5.8s | 5.8s | 1.01x |
| test_benchmark_merge_test_results | 29.7ms | 28.0ms | 1.06x |
| test_benchmark_code_to_optimize_test_discovery | 588ms | 586ms | 1.00x |
| test_benchmark_codeflash_test_discovery | 350ms | 346ms | 1.01x |

The cache shows 7-13% E2E improvement on libcst-heavy paths (extract, pipeline). The 24x microbenchmark speedup is diluted because dispatch table construction is a fraction of total work (jedi resolution, file I/O, AST parsing dominate).
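The per-class caching idea can be shown generically. This is an assumed shape, not the actual monkeypatch (which targets libcst's MatcherDecoratableTransformer/Visitor): the point is that the dispatch table depends only on the class's methods, never on instance state, so caching by type is safe.

```python
# Cache keyed by class: the table is built once per class, not once per
# instantiation (libcst rebuilds it on every visitor instantiation).
_DISPATCH_CACHE: dict[type, dict[str, object]] = {}

class CachedVisitor:
    def __init__(self) -> None:
        cls = type(self)
        table = _DISPATCH_CACHE.get(cls)
        if table is None:
            # Collect visit_* methods once; subsequent instances reuse it.
            table = {name: getattr(cls, name)
                     for name in dir(cls) if name.startswith("visit_")}
            _DISPATCH_CACHE[cls] = table
        self._dispatch = table

class NameCollector(CachedVisitor):
    def visit_Name(self, node):
        return node

a, b = NameCollector(), NameCollector()
# Both instances share the identical cached table object.
```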

Module import times

| Module | Before | After | Speedup |
| --- | --- | --- | --- |
| models.py | 633ms | 125ms | 5.1x |

Files changed

| File | Change |
| --- | --- |
| main.py | Early-exit dispatch for auth/compare; split imports |
| cli_cmds/cli.py | Defer 7 heavy module-level imports into functions |
| cli_cmds/cmd_auth.py | Defer all imports into functions |
| code_utils/env_utils.py | Defer console, formatter, code_utils, registry, lsp.helpers |
| code_utils/shell_utils.py | Defer console.logger into functions |
| models/models.py | Defer libcst, Rich, comparator, code_utils, console |
| code_utils/_libcst_cache.py | New: monkeypatch libcst visitor dispatch |
| verification/comparator.py | Frozenset fast-path + type identity dispatch reordering |
| 10 files with import libcst | Add import codeflash.code_utils._libcst_cache |
| tests/benchmarks/test_benchmark_code_extract_code_context.py | Remove unnecessary Optimizer dep |
| tests/benchmarks/test_benchmark_libcst_multi_file.py | New: multi-file visitor benchmark |
| tests/benchmarks/test_benchmark_libcst_pipeline.py | New: full pipeline benchmark |
| tests/benchmarks/test_benchmark_comparator.py | New: 5-scenario comparator microbenchmark |
| benchmarks/bench_cli_startup.py | New: CLI startup benchmark for codeflash compare --script |
| .github/workflows/ci.yaml | Fix ruff auto-format rewriting version.py |

Test plan

  • codeflash --version / --help / auth status / compare --help all work correctly
  • check_formatter_installed() and get_codeflash_api_key() work with deferred imports
  • from codeflash.models.models import TestResults, CodeString, CoverageData works
  • libcst cache monkeypatch installs correctly and caches dispatch tables
  • Comparator: 160 unit tests pass, all 5 benchmark scenarios pass
  • ruff check passes clean
  • codeflash compare --script E2E benchmark runs end-to-end
  • codeflash compare benchmark tests all pass (12 benchmarks)
  • CI passes (full test suite)

@KRRT7 KRRT7 changed the title from "perf: 33x faster --version via import deferral + fast-path" to "perf: 33x faster --version via import deferral + fast-path (showcase)" Apr 10, 2026
@KRRT7 KRRT7 closed this Apr 10, 2026
@KRRT7 KRRT7 force-pushed the codeflash/optimize branch from 6a28160 to 7351d0f on April 10, 2026 03:57
Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack — --help exits via argparse
before any heavy modules load.

Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
  --help: 297ms → 39ms (7.7x faster)
  --version: 17ms (unchanged)
@KRRT7 KRRT7 reopened this Apr 10, 2026
@KRRT7 KRRT7 marked this pull request as draft April 10, 2026 04:08
github-actions bot and others added 3 commits April 10, 2026 04:09
Restructure main() command dispatch so auth and compare exit early
without loading telemetry (sentry, posthog), version_check, or the
banner. Defer cmd_auth.py imports into functions.

auth status: ~1000ms → 237ms (4.2x)
compare --help: ~297ms → 38ms (7.9x)
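The early-exit dispatch the commit describes can be sketched with argparse. Command names mirror the PR; the handlers and return values here are illustrative stubs, not the real main().

```python
import argparse

def main(argv=None) -> str:
    # Parse args before loading anything heavy: argparse handles
    # --help/--version and exits on its own, so those paths never
    # import the full stack.
    parser = argparse.ArgumentParser(prog="codeflash")
    parser.add_argument("command", choices=["auth", "compare", "optimize"])
    args, _rest = parser.parse_known_args(argv)
    if args.command in ("auth", "compare"):
        # Early exit: no banner, no telemetry (sentry/posthog),
        # no version check.
        return "fast-path"
    # Only the full optimize path would import the heavy modules here.
    return "full-path"
```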
@github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Apr 10, 2026
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
@KRRT7 KRRT7 force-pushed the codeflash/optimize branch from a019fdc to 992e91a on April 10, 2026 04:21
KRRT7 and others added 5 commits April 9, 2026 23:29
Defer console, formatter, code_utils, registry, and lsp.helpers imports
from module level into the functions that use them. Inline is_LSP_enabled
(a one-liner env var check) to avoid importing lsp.helpers on the happy
path of get_codeflash_api_key.

auth status: 237ms → 160ms on Azure Standard_D4s_v5.
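The inlined check is essentially a one-line env var read. The variable name below is hypothetical (the PR does not state it); the point is that a plain os.environ lookup needs no import of lsp.helpers on the happy path.

```python
import os

def is_lsp_enabled() -> bool:
    # Hypothetical env var name, for illustration only: the real helper
    # lives in lsp.helpers and is inlined at its call site by the PR.
    return os.environ.get("CODEFLASH_LSP", "").lower() in {"1", "true"}

os.environ["CODEFLASH_LSP"] = "true"
enabled = is_lsp_enabled()
del os.environ["CODEFLASH_LSP"]
disabled = is_lsp_enabled()
```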
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry,
lsp.helpers, and LspMarkdownMessage from module-level to the methods that
use them. Only pydantic and TestType remain at module level (needed for
class definitions).

models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
Cache the visitor dispatch tables that libcst rebuilds on every
MatcherDecoratableTransformer/Visitor instantiation. The tables
depend only on the class, not the instance, so caching by type is
safe. Saves ~27ms per visitor instantiation (24x faster).

Also fix pre-existing ruff F821 in cli.py (missing exit_with_message
import in process_pyproject_config).
@KRRT7 KRRT7 changed the title from "perf: 33x faster --version via import deferral + fast-path (showcase)" to "perf: 4-6x faster CLI startup via import deferral + libcst cache" Apr 10, 2026
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.

Usage:
  codeflash compare main codeflash/optimize \
    --script "python benchmarks/bench_cli_startup.py" \
    --script-output benchmarks/results.json

codeflash-ai bot commented Apr 10, 2026

⚡️ Codeflash found optimizations for this PR

📄 201% (2.01x) speedup for handle_optimize_all_arg_parsing in codeflash/cli_cmds/cli.py

⏱️ Runtime: 23.7 milliseconds → 7.87 milliseconds (best of 113 runs)

A new Optimization Review has been created.

🔗 Review here


KRRT7 added 3 commits April 10, 2026 00:10
The test only needs project_root, not a full Optimizer (which requires
an API key). Also adds missing __init__.py to tests/benchmarks/.
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
Imports in cmd_auth.py were moved into function bodies, so mock
patches must target the source modules instead of cmd_auth's namespace.
@KRRT7 KRRT7 changed the title from "perf: 4-6x faster CLI startup via import deferral + libcst cache" to "Showcase: CLI startup & libcst cache optimization benchmarks" Apr 10, 2026
KRRT7 added 4 commits April 10, 2026 00:53
Use O(1) frozenset membership test with type identity before falling
through to isinstance MRO traversal. Backported from codeflash-python.
5 scenarios: primitives, nested dicts, DB rows, deep nesting,
and identity types (frozenset/range/complex/Decimal/OrderedDict).
Move the 4 most common return-value types (str, list/tuple, dict) to
`orig_type is T` identity checks at the top of the dispatch chain,
before the frozenset lookup.  A single pointer comparison is cheaper
than a frozenset hash, and these types need special handling anyway
(temp-path normalization, recursive comparison, superset support).

Before: dict traversed ~8 isinstance checks before being handled.
After:  dict is handled at check #3 via `orig_type is dict`.

The isinstance fallbacks remain as slow-paths for subclasses (deque,
ChainMap, defaultdict, scipy dok_matrix, etc.).

Backported from codeflash-python dispatch ordering.
Windows defaults to cp1252 which can't decode some source file bytes.
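A minimal sketch of the fix this commit describes: always pass the encoding explicitly when reading source files, since Windows' locale default (cp1252) cannot decode every UTF-8 byte sequence.

```python
import tempfile
from pathlib import Path

def read_source(path: Path) -> str:
    # Explicit encoding: relying on the platform default breaks on
    # Windows (cp1252) for source files containing non-cp1252 bytes.
    return path.read_text(encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "mod.py"
    p.write_text("name = 'café'\n", encoding="utf-8")
    content = read_source(p)
```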
@KRRT7 KRRT7 merged commit 72a41a5 into main Apr 10, 2026
53 of 61 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize branch April 10, 2026 07:00
@KRRT7 KRRT7 changed the title from "Showcase: CLI startup & libcst cache optimization benchmarks" to "Showcase: Codeflash-agent" Apr 10, 2026