Move heavy module-level imports in cli.py (console, env_utils, code_utils, config_parser, lsp.helpers, version) into the functions that actually use them. Split main.py imports so parse_args() is called before loading the full stack — --help exits via argparse before any heavy modules load.

Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
- --help: 297ms → 39ms (7.7x faster)
- --version: 17ms (unchanged)
Restructure main() command dispatch so auth and compare exit early without loading telemetry (sentry, posthog), version_check, or the banner. Defer cmd_auth.py imports into functions.

- auth status: ~1000ms → 237ms (4.2x)
- compare --help: ~297ms → 38ms (7.9x)
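The deferral pattern behind both commits can be sketched as follows. This is a minimal, self-contained illustration (the function and flag names here are hypothetical, not the actual codeflash code): argparse runs first, so `--help` exits before any expensive import is paid, and the heavy import only executes inside the command that needs it.

```python
import argparse
import sys


def parse_args(argv):
    # Cheap: only argparse is imported at this point. `--help` exits here.
    parser = argparse.ArgumentParser(prog="codeflash")
    parser.add_argument("--verify", action="store_true")
    return parser.parse_args(argv)


def run_optimizer(args):
    # Deferred import: the cost is paid only when this command runs.
    # `json` stands in for a genuinely heavy module like libcst or rich.
    import json

    return json.dumps({"verify": args.verify})


def main(argv=None):
    args = parse_args(argv if argv is not None else sys.argv[1:])
    return run_optimizer(args)
```

The trade-off is that the import cost moves from startup to first call, which is a win for short-lived invocations like `--help` and `auth status`.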
uv-dynamic-versioning rewrites version.py on every `uv run`, so the ruff auto-format job was inadvertently committing dev version strings. Restore version.py files after formatting and revert the ones already changed on this branch.
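One way the restore step could look in the workflow (the step name, formatter invocation, and file path here are assumptions for illustration, not the actual diff):

```yaml
- name: Auto-format
  run: |
    uv run ruff format .
    # Undo uv-dynamic-versioning's rewrite so dev version strings are
    # never committed by the format job (path is an assumption).
    git checkout -- codeflash/version.py
```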
Defer console, formatter, code_utils, registry, and lsp.helpers imports from module level into the functions that use them. Inline is_LSP_enabled (a one-liner env var check) to avoid importing lsp.helpers on the happy path of get_codeflash_api_key. auth status: 237ms → 160ms on Azure Standard_D4s_v5.
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry, lsp.helpers, and LspMarkdownMessage from module-level to the methods that use them. Only pydantic and TestType remain at module level (needed for class definitions). models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
Cache the visitor dispatch tables that libcst rebuilds on every MatcherDecoratableTransformer/Visitor instantiation. The tables depend only on the class, not the instance, so caching by type is safe. Saves ~27ms per visitor instantiation (24x faster). Also fix pre-existing ruff F821 in cli.py (missing exit_with_message import in process_pyproject_config).
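The caching idea can be sketched generically: since the table depends only on the class, key a module-level cache by `type` and build the table once. This stand-in scans for `visit_*` methods the way libcst's dispatch construction conceptually does; the names and table shape are simplified assumptions, not libcst's internals.

```python
# Cache keyed by class: the table depends only on the class, never the
# instance, so reuse across instantiations is safe.
_DISPATCH_CACHE: dict[type, dict[str, list[str]]] = {}


def build_dispatch_table(cls: type) -> dict[str, list[str]]:
    # Stand-in for the expensive per-class scan of visit_*/leave_* methods.
    table: dict[str, list[str]] = {}
    for name in dir(cls):
        if name.startswith("visit_"):
            table.setdefault(name.removeprefix("visit_"), []).append(name)
    return table


def get_dispatch_table(cls: type) -> dict[str, list[str]]:
    table = _DISPATCH_CACHE.get(cls)
    if table is None:
        table = _DISPATCH_CACHE[cls] = build_dispatch_table(cls)
    return table
```

Every visitor instantiation after the first for a given class then gets the cached table back in a single dict lookup.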
Measures median wall-clock time for --version, --help, auth status, and compare --help across 30 runs with 3 warmups.

Usage:

```
codeflash compare main codeflash/optimize \
    --script "python benchmarks/bench_cli_startup.py" \
    --script-output benchmarks/results.json
```
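The core measurement loop could look like the sketch below (a hypothetical simplification; the real bench_cli_startup.py may differ): warm up, time each run with `perf_counter`, and report the median, which is robust to the occasional slow outlier run.

```python
import statistics
import subprocess
import time


def median_runtime(cmd, runs=30, warmups=3):
    """Median wall-clock seconds for `cmd` over `runs` timed runs."""
    for _ in range(warmups):
        subprocess.run(cmd, capture_output=True, check=False)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, capture_output=True, check=False)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```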
⚡️ Codeflash found optimizations for this PR 📄 201% (2.01x) speedup for
The test only needs project_root, not a full Optimizer (which requires an API key). Also adds missing __init__.py to tests/benchmarks/.
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
Imports in cmd_auth.py were moved into function bodies, so mock patches must target the source modules instead of cmd_auth's namespace.
Use O(1) frozenset membership test with type identity before falling through to isinstance MRO traversal. Backported from codeflash-python.
5 scenarios: primitives, nested dicts, DB rows, deep nesting, and identity types (frozenset/range/complex/Decimal/OrderedDict).
Move the 4 most common return-value types (str, list/tuple, dict) to `orig_type is T` identity checks at the top of the dispatch chain, before the frozenset lookup. A single pointer comparison is cheaper than a frozenset hash, and these types need special handling anyway (temp-path normalization, recursive comparison, superset support). Before: dict traversed ~8 isinstance checks before being handled. After: dict is handled at check #3 via `orig_type is dict`. The isinstance fallbacks remain as slow-paths for subclasses (deque, ChainMap, defaultdict, scipy dok_matrix, etc.). Backported from codeflash-python dispatch ordering.
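The resulting dispatch order can be sketched as below. This is a hedged, heavily simplified stand-in, not the real comparator: the frozenset here lists only a handful of the 16 types, and the real slow path does far more (temp-path normalization, superset support, the scipy/jax/etc. isinstance checks).

```python
# Types where plain == is a safe, fast comparison (illustrative subset).
_FAST_EQUALITY_TYPES = frozenset({int, float, bool, bytes, complex, range, frozenset})


def comparator(orig, new):
    orig_type = type(orig)
    if orig_type is not type(new):
        return False
    # Hot types first: each check is a single pointer comparison.
    if orig_type is str:
        return orig == new
    if orig_type is dict:
        return orig.keys() == new.keys() and all(
            comparator(orig[k], new[k]) for k in orig
        )
    if orig_type is list or orig_type is tuple:
        return len(orig) == len(new) and all(
            comparator(a, b) for a, b in zip(orig, new)
        )
    # O(1) frozenset membership for the remaining common exact types.
    if orig_type in _FAST_EQUALITY_TYPES:
        return orig == new
    # Slow-path placeholder: the real comparator falls through to
    # isinstance checks here so subclasses (deque, defaultdict, ...) work.
    return orig == new
```

Note that `type(x) is dict` deliberately misses dict subclasses, which is exactly why the isinstance fallbacks must stay behind the fast paths.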
Windows defaults to cp1252, which can't decode some source file bytes.
Summary
Systematic startup and runtime optimization across the CLI and verification paths:
- `cli.py`, `main.py`, `env_utils.py`, `shell_utils.py`, `cmd_auth.py`, and `models.py` — move Rich, libcst, pydantic, isort, tomlkit, sentry_sdk, and other heavy libraries from module level into the functions that use them
- `auth` and `compare` commands skip banner, telemetry, and version check entirely
- `is_LSP_enabled()` (one-liner env var check) inlined to avoid importing `lsp.helpers` on the happy path
- Cache `MatcherDecoratableTransformer`/`Visitor` dispatch tables by class type (24x faster per instantiation)
- `type(obj) in frozenset` fast-path for 16 common types
- `orig_type is str/list/tuple/dict` checks before the frozenset for the 4 most common return-value types (single pointer comparison, cheaper than hash lookup)
- Restore `version.py` placeholders rewritten by uv-dynamic-versioning
- Drop the `Optimizer` instantiation from `test_benchmark_code_extract_code_context` (was failing due to missing API key)

Benchmarks
All benchmarks run on Azure Standard_D4s_v5, Python 3.13.
CLI startup — `codeflash compare --script` (30 runs, median): `--version`, `--help`, `compare --help`, and `auth status` on `main` vs `optimize`.

CLI startup — hyperfine (`--warmup 3 --min-runs 30`): the same four scenarios, `main` vs `optimize`.

Comparator optimization — `codeflash compare` (accbab4 vs fe39d40, `--inject`)

Both optimizations combined (frozenset fast-path + type identity dispatch reordering). `comparator()` speedup measured on:

- `comparator_nested_dict` (API response w/ dicts)
- `comparator_rows` (200 tuples)
- `comparator_deep` (15-level nested dict)
- `comparator_identity_types` (120 items)
- `comparator_primitives` (20 flat)

The dict fast-path was the biggest win: dicts were previously handled at check #8+ (after str, list, weakref, jax, xarray, tensorflow, sqlalchemy instanceof checks). Now handled at check #3 via `orig_type is dict`.

libcst cache — `codeflash compare` (61053be vs b533f50, `--inject`), covering:

- `test_benchmark_extract` (single file context)
- `test_benchmark_full_pipeline` (discover → extract → replace → merge)
- `test_benchmark_extract_context_multi_file` (10 files)
- `test_benchmark_discover_functions_multi_file` (10 files)
- `test_benchmark_merge_test_results`
- `test_benchmark_code_to_optimize_test_discovery`
- `test_benchmark_codeflash_test_discovery`

The cache shows 7-13% E2E improvement on libcst-heavy paths (extract, pipeline). The 24x microbenchmark speedup is diluted because dispatch table construction is a fraction of total work (jedi resolution, file I/O, AST parsing dominate).
Module import times

- `models.py`: 633ms → 125ms (Azure Standard_D4s_v5)

Files changed

- `main.py`
- `cli_cmds/cli.py`
- `cli_cmds/cmd_auth.py`
- `code_utils/env_utils.py`
- `code_utils/shell_utils.py`
- `models/models.py`
- `code_utils/_libcst_cache.py`
- `verification/comparator.py` (`import libcst` / `import codeflash.code_utils._libcst_cache`)
- `tests/benchmarks/test_benchmark_code_extract_code_context.py` (drop the `Optimizer` dep)
- `tests/benchmarks/test_benchmark_libcst_multi_file.py`
- `tests/benchmarks/test_benchmark_libcst_pipeline.py`
- `tests/benchmarks/test_benchmark_comparator.py`
- `benchmarks/bench_cli_startup.py` (`codeflash compare --script`)
- `.github/workflows/ci.yaml`

Test plan
- `codeflash --version` / `--help` / `auth status` / `compare --help` all work correctly
- `check_formatter_installed()` and `get_codeflash_api_key()` work with deferred imports
- `from codeflash.models.models import TestResults, CodeString, CoverageData` works
- `codeflash compare --script` E2E benchmark runs end-to-end
- `codeflash compare` benchmark tests all pass (12 benchmarks)