
Showcase: Codeflash-agent#2054

Merged
KRRT7 merged 18 commits into main from codeflash/optimize on Apr 10, 2026

Conversation


@KRRT7 KRRT7 commented Apr 10, 2026

Summary

Systematic startup and runtime optimization across the CLI and verification paths:

  1. Defer heavy module-level imports in cli.py, main.py, env_utils.py, shell_utils.py, cmd_auth.py, and models.py — move Rich, libcst, pydantic, isort, tomlkit, sentry_sdk, and other heavy libraries from module level into the functions that use them
  2. Early-exit command dispatch: the auth and compare commands skip the banner, telemetry, and version check entirely
  3. Inline trivial helpers: is_LSP_enabled() (a one-liner env var check) is inlined to avoid importing lsp.helpers on the happy path
  4. Backport libcst visitor dispatch cache from codeflash-python — caches MatcherDecoratableTransformer/Visitor dispatch tables by class type (24x faster per instantiation)
  5. Comparator type dispatch optimization (backported from codeflash-python):
    • Frozenset fast-path: O(1) type(obj) in frozenset for 16 common types
    • Type identity fast-paths: orig_type is str/list/tuple/dict before frozenset for the 4 most common return-value types (single pointer comparison, cheaper than hash lookup)
    • Dict fast-path eliminates ~8 isinstance checks (was handled at line 319, now at check #3)
  6. Fix ruff auto-format workflow accidentally rewriting version.py placeholders via uv-dynamic-versioning
  7. Fix benchmark test — remove unnecessary Optimizer instantiation from test_benchmark_code_extract_code_context (was failing due to missing API key)
  8. Add benchmarks — libcst visitor (multi-file + full-pipeline), comparator microbenchmark (5 scenarios), CLI startup
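The import-deferral pattern behind item 1 can be sketched in a few lines. This is an illustrative stand-in, not the PR's actual code: `fractions` plays the role of the heavy third-party libraries (Rich, libcst, pydantic) that the PR moves out of module scope.

```python
import sys

def exact_ratio(numerator: int, denominator: int):
    # Deferred import: the module is loaded on the first call instead of
    # at module-import time, so fast CLI paths (e.g. --version) never pay
    # this import cost. `fractions` stands in for the heavy libraries
    # (Rich, libcst, pydantic, ...) deferred in the PR.
    from fractions import Fraction
    return Fraction(numerator, denominator)

ratio = exact_ratio(2, 4)
# After the first call the module is cached in sys.modules, so repeated
# calls pay only a dict lookup, not a re-import.
loaded = "fractions" in sys.modules
```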

Benchmarks

All benchmarks run on Azure Standard_D4s_v5, Python 3.13.

CLI startup — codeflash compare --script (30 runs, median)

| Command | main | optimize | Speedup |
| --- | --- | --- | --- |
| --version | 0.05s | 0.05s | 1.00x |
| --help | 0.34s | 0.09s | 3.89x |
| compare --help | 0.34s | 0.09s | 3.90x |
| auth status | 1.02s | 0.16s | 6.23x |
| Total | 1.76s | 0.39s | 4.46x |

CLI startup — hyperfine (--warmup 3 --min-runs 30)

| Command | main | optimize | Speedup |
| --- | --- | --- | --- |
| --version | 53ms | 53ms | unchanged (already fast-pathed) |
| --help | 342ms | 73ms | 4.7x |
| compare --help | 326ms | 73ms | 4.5x |
| auth status | 983ms | 159ms | 6.2x |

Comparator optimization — codeflash compare (accbab4 vs fe39d40, --inject)

Both optimizations combined (frozenset fast-path + type identity dispatch reordering):

| Benchmark | E2E speedup | comparator() speedup |
| --- | --- | --- |
| comparator_nested_dict (API response w/ dicts) | 1.60x | 1.65x |
| comparator_rows (200 tuples) | 1.44x | 1.46x |
| comparator_deep (15-level nested dict) | 1.40x | 1.45x |
| comparator_identity_types (120 items) | 1.10x | 1.60x |
| comparator_primitives (20 flat) | 1.08x | 1.50x |

The dict fast-path was the biggest win: dicts were previously handled at check #8+ (after the str, list, weakref, jax, xarray, tensorflow, and sqlalchemy isinstance checks). They are now handled at check #3 via orig_type is dict.
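The dispatch ordering described above can be sketched as follows. This is an illustrative reconstruction, not the real comparator: identity checks come first, then the frozenset membership test, with isinstance kept as the subclass slow path.

```python
# Exact common types handled by the O(1) membership test (illustrative set).
_FAST_TYPES = frozenset({int, float, bool, bytes, complex, range, frozenset})

def compare(orig, new) -> bool:
    orig_type = type(orig)
    # Checks 1-2: pointer-identity fast paths for the most common
    # return-value types; `is` is cheaper than a frozenset hash lookup.
    if orig_type is str:
        return type(new) is str and orig == new
    if orig_type is list or orig_type is tuple:
        return (type(new) is orig_type and len(orig) == len(new)
                and all(compare(a, b) for a, b in zip(orig, new)))
    # Check 3: the dict fast path that previously sat behind ~8 isinstance checks.
    if orig_type is dict:
        return (type(new) is dict and orig.keys() == new.keys()
                and all(compare(v, new[k]) for k, v in orig.items()))
    # O(1) frozenset membership for the remaining common exact types.
    if orig_type in _FAST_TYPES:
        return type(new) is orig_type and orig == new
    # Slow path: isinstance traversal handles subclasses
    # (defaultdict, deque, ChainMap, ...).
    return isinstance(new, orig_type) and orig == new
```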

libcst cache — codeflash compare (61053be vs b533f50, --inject)

| Benchmark | Base | Head | Speedup |
| --- | --- | --- | --- |
| test_benchmark_extract (single file context) | 1.6s | 1.5s | 1.13x |
| test_benchmark_full_pipeline (discover→extract→replace→merge) | 1.3s | 1.3s | 1.08x |
| test_benchmark_extract_context_multi_file (10 files) | 24.2s | 22.6s | 1.07x |
| test_benchmark_discover_functions_multi_file (10 files) | 5.8s | 5.8s | 1.01x |
| test_benchmark_merge_test_results | 29.7ms | 28.0ms | 1.06x |
| test_benchmark_code_to_optimize_test_discovery | 588ms | 586ms | 1.00x |
| test_benchmark_codeflash_test_discovery | 350ms | 346ms | 1.01x |

The cache shows 7-13% E2E improvement on libcst-heavy paths (extract, pipeline). The 24x microbenchmark speedup is diluted because dispatch table construction is a fraction of total work (jedi resolution, file I/O, AST parsing dominate).
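The per-class caching idea can be shown generically. This is an assumed shape, not the actual monkeypatch (which targets libcst's MatcherDecoratableTransformer/Visitor): the point is that the dispatch table depends only on the class's methods, never on instance state, so caching by type is safe.

```python
# Cache keyed by class: the table is built once per class, not once per
# instantiation (libcst rebuilds it on every visitor instantiation).
_DISPATCH_CACHE: dict[type, dict[str, object]] = {}

class CachedVisitor:
    def __init__(self) -> None:
        cls = type(self)
        table = _DISPATCH_CACHE.get(cls)
        if table is None:
            # Collect visit_* methods once; subsequent instances reuse it.
            table = {name: getattr(cls, name)
                     for name in dir(cls) if name.startswith("visit_")}
            _DISPATCH_CACHE[cls] = table
        self._dispatch = table

class NameCollector(CachedVisitor):
    def visit_Name(self, node):
        return node

a, b = NameCollector(), NameCollector()
# Both instances share the identical cached table object.
```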

Module import times

| Module | Before | After | Speedup |
| --- | --- | --- | --- |
| models.py | 633ms | 125ms | 5.1x |

Files changed

| File | Change |
| --- | --- |
| main.py | Early-exit dispatch for auth/compare; split imports |
| cli_cmds/cli.py | Defer 7 heavy module-level imports into functions |
| cli_cmds/cmd_auth.py | Defer all imports into functions |
| code_utils/env_utils.py | Defer console, formatter, code_utils, registry, lsp.helpers |
| code_utils/shell_utils.py | Defer console.logger into functions |
| models/models.py | Defer libcst, Rich, comparator, code_utils, console |
| code_utils/_libcst_cache.py | New: monkeypatch libcst visitor dispatch |
| verification/comparator.py | Frozenset fast-path + type identity dispatch reordering |
| 10 files with import libcst | Add import codeflash.code_utils._libcst_cache |
| tests/benchmarks/test_benchmark_code_extract_code_context.py | Remove unnecessary Optimizer dep |
| tests/benchmarks/test_benchmark_libcst_multi_file.py | New: multi-file visitor benchmark |
| tests/benchmarks/test_benchmark_libcst_pipeline.py | New: full pipeline benchmark |
| tests/benchmarks/test_benchmark_comparator.py | New: 5-scenario comparator microbenchmark |
| benchmarks/bench_cli_startup.py | New: CLI startup benchmark for codeflash compare --script |
| .github/workflows/ci.yaml | Fix ruff auto-format rewriting version.py |

Test plan

  • codeflash --version / --help / auth status / compare --help all work correctly
  • check_formatter_installed() and get_codeflash_api_key() work with deferred imports
  • from codeflash.models.models import TestResults, CodeString, CoverageData works
  • libcst cache monkeypatch installs correctly and caches dispatch tables
  • Comparator: 160 unit tests pass, all 5 benchmark scenarios pass
  • ruff check passes clean
  • codeflash compare --script E2E benchmark runs end-to-end
  • codeflash compare benchmark tests all pass (12 benchmarks)
  • CI passes (full test suite)

@KRRT7 KRRT7 changed the title from "perf: 33x faster --version via import deferral + fast-path" to "perf: 33x faster --version via import deferral + fast-path (showcase)" Apr 10, 2026
@KRRT7 KRRT7 closed this Apr 10, 2026
@KRRT7 KRRT7 force-pushed the codeflash/optimize branch from 6a28160 to 7351d0f on April 10, 2026 03:57
Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack — --help exits via argparse
before any heavy modules load.

Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
  --help: 297ms → 39ms (7.7x faster)
  --version: 17ms (unchanged)
@KRRT7 KRRT7 reopened this Apr 10, 2026
@KRRT7 KRRT7 marked this pull request as draft April 10, 2026 04:08
github-actions bot and others added 3 commits April 10, 2026 04:09
Restructure main() command dispatch so auth and compare exit early
without loading telemetry (sentry, posthog), version_check, or the
banner. Defer cmd_auth.py imports into functions.

auth status: ~1000ms → 237ms (4.2x)
compare --help: ~297ms → 38ms (7.9x)
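The early-exit dispatch the commit describes can be sketched with argparse. Command names mirror the PR; the handlers and return values here are illustrative stubs, not the real main().

```python
import argparse

def main(argv=None) -> str:
    # Parse args before loading anything heavy: argparse handles
    # --help/--version and exits on its own, so those paths never
    # import the full stack.
    parser = argparse.ArgumentParser(prog="codeflash")
    parser.add_argument("command", choices=["auth", "compare", "optimize"])
    args, _rest = parser.parse_known_args(argv)
    if args.command in ("auth", "compare"):
        # Early exit: no banner, no telemetry (sentry/posthog),
        # no version check.
        return "fast-path"
    # Only the full optimize path would import the heavy modules here.
    return "full-path"
```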
@github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Apr 10, 2026
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
@KRRT7 KRRT7 force-pushed the codeflash/optimize branch from a019fdc to 992e91a on April 10, 2026 04:21
KRRT7 and others added 5 commits April 9, 2026 23:29
Defer console, formatter, code_utils, registry, and lsp.helpers imports
from module level into the functions that use them. Inline is_LSP_enabled
(a one-liner env var check) to avoid importing lsp.helpers on the happy
path of get_codeflash_api_key.

auth status: 237ms → 160ms on Azure Standard_D4s_v5.
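The inlined check is essentially a one-line env var read. The variable name below is hypothetical (the PR does not state it); the point is that a plain os.environ lookup needs no import of lsp.helpers on the happy path.

```python
import os

def is_lsp_enabled() -> bool:
    # Hypothetical env var name, for illustration only: the real helper
    # lives in lsp.helpers and is inlined at its call site by the PR.
    return os.environ.get("CODEFLASH_LSP", "").lower() in {"1", "true"}

os.environ["CODEFLASH_LSP"] = "true"
enabled = is_lsp_enabled()
del os.environ["CODEFLASH_LSP"]
disabled = is_lsp_enabled()
```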
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry,
lsp.helpers, and LspMarkdownMessage from module-level to the methods that
use them. Only pydantic and TestType remain at module level (needed for
class definitions).

models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
Cache the visitor dispatch tables that libcst rebuilds on every
MatcherDecoratableTransformer/Visitor instantiation. The tables
depend only on the class, not the instance, so caching by type is
safe. Saves ~27ms per visitor instantiation (24x faster).

Also fix pre-existing ruff F821 in cli.py (missing exit_with_message
import in process_pyproject_config).
@KRRT7 KRRT7 changed the title from "perf: 33x faster --version via import deferral + fast-path (showcase)" to "perf: 4-6x faster CLI startup via import deferral + libcst cache" Apr 10, 2026
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.

Usage:
  codeflash compare main codeflash/optimize \
    --script "python benchmarks/bench_cli_startup.py" \
    --script-output benchmarks/results.json

codeflash-ai bot commented Apr 10, 2026

⚡️ Codeflash found optimizations for this PR

📄 201% (2.01x) speedup for handle_optimize_all_arg_parsing in codeflash/cli_cmds/cli.py

⏱️ Runtime: 23.7 milliseconds → 7.87 milliseconds (best of 113 runs)

A new Optimization Review has been created.

🔗 Review here


KRRT7 added 3 commits April 10, 2026 00:10
The test only needs project_root, not a full Optimizer (which requires
an API key). Also adds missing __init__.py to tests/benchmarks/.
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
Imports in cmd_auth.py were moved into function bodies, so mock
patches must target the source modules instead of cmd_auth's namespace.
@KRRT7 KRRT7 changed the title from "perf: 4-6x faster CLI startup via import deferral + libcst cache" to "Showcase: CLI startup & libcst cache optimization benchmarks" Apr 10, 2026
KRRT7 added 4 commits April 10, 2026 00:53
Use O(1) frozenset membership test with type identity before falling
through to isinstance MRO traversal. Backported from codeflash-python.
5 scenarios: primitives, nested dicts, DB rows, deep nesting,
and identity types (frozenset/range/complex/Decimal/OrderedDict).
Move the 4 most common return-value types (str, list/tuple, dict) to
`orig_type is T` identity checks at the top of the dispatch chain,
before the frozenset lookup.  A single pointer comparison is cheaper
than a frozenset hash, and these types need special handling anyway
(temp-path normalization, recursive comparison, superset support).

Before: dict traversed ~8 isinstance checks before being handled.
After:  dict is handled at check #3 via `orig_type is dict`.

The isinstance fallbacks remain as slow-paths for subclasses (deque,
ChainMap, defaultdict, scipy dok_matrix, etc.).

Backported from codeflash-python dispatch ordering.
Windows defaults to cp1252 which can't decode some source file bytes.
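A minimal sketch of the fix this commit describes: always pass the encoding explicitly when reading source files, since Windows' locale default (cp1252) cannot decode every UTF-8 byte sequence.

```python
import tempfile
from pathlib import Path

def read_source(path: Path) -> str:
    # Explicit encoding: relying on the platform default breaks on
    # Windows (cp1252) for source files containing non-cp1252 bytes.
    return path.read_text(encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "mod.py"
    p.write_text("name = 'café'\n", encoding="utf-8")
    content = read_source(p)
```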
@KRRT7 KRRT7 merged commit 72a41a5 into main Apr 10, 2026
53 of 61 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize branch April 10, 2026 07:00
@KRRT7 KRRT7 changed the title from "Showcase: CLI startup & libcst cache optimization benchmarks" to "Showcase: Codeflash-agent" Apr 10, 2026