tests: replace legacy e2e with flash-based mock-worker provisioning by deanq · Pull Request #479 · runpod/runpod-python

deanq · 2026-03-14T01:39:21Z

Summary

Replace the two-job e2e workflow (Docker build + runpod-test-runner JS action) with a single-job pytest workflow that uses Flash to provision mock-worker endpoints directly.

Before: CI clones mock-worker repo, patches requirements.txt with the PR SHA, builds and pushes a custom Docker image, then a JS-based test-runner action creates templates/endpoints via GraphQL, sends jobs, polls results, and cleans up.

After: pytest reads tests.json, Flash provisions runpod/mock-worker:latest endpoints with PodTemplate(dockerArgs=...) that pip-installs the PR's runpod-python at container start. No Docker build step. No JS test-runner. Tests submit jobs via Flash's async client, assert outputs, and undeploy endpoints in session teardown.

What this replaces

e2e-build job: Docker clone, patch, build, push (QEMU, Buildx, DockerHub login)
test job: runpod/[email protected] JS action

Test coverage

4 test cases from tests.json (same scenarios as the original):

Basic handler return
Delayed job (10s)
Generator handler (streaming)
Async generator handler (streaming)

Plus 1 cold start benchmark test.

Notable side-effects

test_missing_api_key updated to reflect existing behavior: Endpoint() now validates the API key at construction time (not at .run() time). This documents an earlier SDK change, not a new one in this PR.
Import threshold doubled (1000ms -> 2000ms): GitHub Actions shared runners show 800-1400ms measured variance. Documented p99 justification in code.

Not in scope

Python version support unchanged. requires-python >= 3.8 and CI matrix [3.8, 3.9, 3.10, 3.11] are preserved. Narrowing to 3.10+ should be a separate PR with a deprecation notice.

Test plan

5/5 tests pass in CI (686s)
Endpoints auto-cleaned up after session via undeploy()

Smoke testing revealed several issues: - flash run QB routes dispatch remotely via @endpoint decorator, requiring RUNPOD_API_KEY for all tests (not just LB) - Request body format must match handler param names (input_data, not prompt) - Health check must catch ConnectTimeout in addition to ConnectError - lb_endpoint.py startScript had wrong handler path (/src/ vs /app/) - asyncio_runner.Job requires (endpoint_id, job_id, session, headers), replaced with asyncio_runner.Endpoint - Cold start test uses dedicated port (8199) to avoid conflicts - CI-e2e.yml now requires RUNPOD_API_KEY secret and has 15min timeout - HTTP client timeout increased to 120s for remote dispatch latency

- Remove unused import runpod from test_lb_dispatch.py - Narrow bare Exception catch to (TypeError, ValueError, RuntimeError) - Extract _wait_for_ready to conftest as wait_for_ready with poll_interval param - Replace assert with pytest.fail in verify_local_runpod fixture - Move _patch_runpod_base_url to conftest as autouse fixture (DRY) - Add named constants for ports and timeouts - Add status assertions on set calls in test_state_independent_keys

tests/e2e/conftest.py

The previous install order (flash first, then editable with --no-deps) left aiohttp and other transitive deps missing because the editable build produced a different version identifier. Fix: install editable with full deps first, then flash, then re-overlay the editable.

Copilot

Pull request overview

This PR replaces the legacy/opaque end-to-end CI test setup with a flash-based e2e suite that spins up a flash run dev server and validates real SDK behaviors (QB on PRs, LB on nightly).

Changes:

Added new tests/e2e/ suite with fixtures and async pytest infrastructure for flash-based testing.
Added a purpose-built flash fixture project under tests/e2e/fixtures/all_in_one/ (QB + LB endpoints).
Replaced CI-e2e.yml and added CI-e2e-nightly.yml; registered new pytest markers.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 14 comments.

Show a summary per file

File	Description
`tests/e2e/conftest.py`	Adds `flash run` lifecycle fixture, HTTP client fixture, and API-key gating/SDK base URL patching.
`tests/e2e/test_worker_handlers.py`	QB handler execution assertions via HTTP against `flash run`.
`tests/e2e/test_worker_state.py`	QB state persistence checks across calls.
`tests/e2e/test_endpoint_client.py`	Exercises sync SDK `Endpoint` client against the flash server.
`tests/e2e/test_async_endpoint.py`	Exercises async SDK client + sync fallback against the flash server.
`tests/e2e/test_lb_dispatch.py`	LB remote dispatch tests (API-key gated).
`tests/e2e/test_cold_start.py`	Cold-start benchmark by launching a separate `flash run`.
`tests/e2e/fixtures/all_in_one/*`	Adds QB/LB handler implementations and minimal fixture project config.
`pytest.ini`	Registers `qb`, `lb`, `cold_start` markers.
`.github/workflows/CI-e2e.yml`	Replaces legacy Docker/mock-worker runner with flash-based e2e execution.
`.github/workflows/CI-e2e-nightly.yml`	Adds scheduled full-suite workflow including LB tests.
`docs/superpowers/specs/2026-03-13-flash-based-e2e-tests-design.md`	Design doc for the new flash-based e2e approach.
`docs/superpowers/plans/2026-03-13-flash-based-e2e-tests.md`	Implementation plan doc for the migration.
`CLAUDE.md`	Worktree/branch context notes for this change set.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

tests/e2e/test_cold_start.py

CLAUDE.md

docs/superpowers/specs/2026-03-13-flash-based-e2e-tests-design.md

tests/e2e/test_endpoint_client.py

tests/e2e/test_cold_start.py

tests/e2e/conftest.py

tests/e2e/test_endpoint_client.py

tests/e2e/test_async_endpoint.py

tests/e2e/fixtures/all_in_one/lb_endpoint.py

The SDK's RunPodClient and AsyncEndpoint constructors check runpod.api_key at init time. The conftest patched endpoint_url_base but never set api_key, causing RuntimeError for all SDK client tests. Also add response body to state test assertions for debugging the 500 errors from stateful_handler remote dispatch.

The CI unit test workflow runs `uv run pytest` without path restrictions, which collects e2e tests that require flash CLI. Add tests/e2e to norecursedirs so only CI-e2e.yml runs these tests (with explicit markers).

Flash's @Remote dispatch provisions serverless endpoints on first request (~60s cold start). Without warmup, early tests fail with 500 because endpoints aren't ready. Run concurrent warmup requests in the flash_server fixture to provision all 3 QB endpoints before tests. Also add response body to assertion messages for better debugging.

- Remove test_async_run: flash dev server's /run endpoint doesn't return Runpod API format ({"id": "..."}) needed for async job polling - Remove test_run_async_poll: same /run incompatibility - Redesign state tests: remote dispatch means in-memory state can't persist across calls, so test individual set/get operations instead - Add explicit timeout=120 to SDK run_sync() calls to prevent 600s hangs - Reduce per-test timeout from 600s to 180s so hanging tests don't block the entire suite - Increase job timeout from 15 to 20 min to accommodate endpoint warmup

- Increase HTTP_CLIENT_TIMEOUT from 120 to 180s to match per-test timeout, preventing httpx.ReadTimeout for slow remote dispatch - Add AttributeError to expected exceptions in test_run_sync_error (SDK raises AttributeError when run_sync receives None input)

Drop 3.8 and 3.9 support, add 3.12. Flash requires 3.10+ and the SDK should target the same range.

…atch The stateful_handler uses multi-param kwargs (action, key, value) which flash's remote dispatch returns 500 for. The other handlers use a single dict param and work correctly. Remove the stateful handler fixture and tests — the remaining 7 tests provide solid coverage of handler execution, SDK client integration, cold start, and error propagation.

- Patch requests.Session.request instead of .get/.post in 401 tests (RunPodClient._request uses session.request, not get/post directly) - Fix test_missing_api_key to test Endpoint creation with None key (was calling run() on already-created endpoint with valid key) - Increase cold start benchmark threshold from 1000ms to 2000ms (CI runners with shared CPUs consistently exceed 1000ms)

- Submit all test jobs via asyncio.gather instead of serial parametrize - Exclude endpoint name from hardware_config_key so basic+delay share one endpoint (4 endpoints down to 3) - Reduce CI timeout from 20m to 15m

Previous teardown called resource_config.undeploy() via asyncio.get_event_loop().run_until_complete() which could fail silently due to event loop state during pytest session teardown. Switch to subprocess flash undeploy --all --force which runs in a clean process, reads .runpod/resources.pkl directly, and handles the full undeploy lifecycle reliably.

runpod-Henrik

1. CI workflow — approach is correct, a few edges

The elimination of the Docker build + JS test-runner in favor of direct Flash provisioning is clean. A few specifics:

Issue: flash undeploy --all --force in cleanup nukes every flash-managed endpoint in the account, not just the ones provisioned by this test run. If another deployment (e.g. a developer's personal endpoint or a parallel CI run) is active under the same API key, it gets torn down. The old approach cleaned up only the provisioned endpoints by ID. Recommend tracking provisioned endpoint names and scoping the undeploy call.

Question: CI job timeout is 15 minutes. The PR notes the test run took 686 seconds (11.4 min), leaving under 4 minutes of headroom. GPU provisioning time is environment-dependent. Is 15 minutes enough to avoid false failures when the cluster is busy?

Nit: uv pip install -e ".[test]" 2>/dev/null || uv pip install -e . — silencing stderr on the first attempt means a real failure (network error, dep conflict) is swallowed and falls through to the plain install. A --quiet flag would reduce noise without discarding error output.

2. Python version matrix change is undocumented

-        python-version: [3.8, 3.9, 3.10.15, 3.11.10]
+        python-version: ["3.10", "3.11", "3.12"]

The PR drops CI coverage for Python 3.8 and 3.9. setup.py still declares python_requires=">=3.8". If 3.8/3.9 aren't being dropped from the supported range, this is a gap that should be called out explicitly. If they are being dropped, setup.py needs updating.

3. Import threshold doubled — no explanation

-        results["measurements"]["runpod_total"]["mean"] < 1000
+        results["measurements"]["runpod_total"]["mean"] < 2000

The commit message says "CI runners have shared CPUs" but doubling the allowed import time is significant. If the import chain picked up a slow dependency this threshold would now mask it. Worth at least a comment explaining whether this is measured variance or an observed trend.

4. `verify_local_runpod` path check is fragile

if "runpod-python" not in runpod.__file__:

This checks that the string "runpod-python" appears in the installed module's file path. Works on CI (the Actions runner path contains runpod-python/runpod-python/). Fails for any developer who clones the repo into a differently-named directory (e.g. ~/src/rp-sdk/). The intent — verify this is not an installed wheel — would be more robustly expressed as checking importlib.metadata.packages_distributions() or just checking that runpod.__file__ resolves inside the project root.

5. `test_missing_api_key` validates a behavior change

-    runpod.api_key = None
-    self.endpoint.run(self.MODEL_INPUT)  # raised at call time
+    runpod.api_key = None
+    Endpoint(self.ENDPOINT_ID)           # raises at construction

The test change documents a semantic shift: API key validation moved from run-time to construction time. This is user-visible — code that creates an Endpoint at module import and provides a key later will now fail earlier and in a different place. Worth calling out in the PR description.

6. `hardware_config_key` only normalizes on two fields

normalized = {
    "gpuIds": hw.get("endpointConfig", {}).get("gpuIds", ""),
    "dockerArgs": hw.get("templateConfig", {}).get("dockerArgs", ""),
}

Tests with identical gpuIds + dockerArgs but differing templateConfig fields (env vars, image, scalerConfig) would share one endpoint provisioned with whichever config came first. For the current 4-test tests.json this is correct, but the function is fragile for future additions. The comment should say what fields are intentionally excluded and why.

7. `test_cold_start.py` hides diagnostics on failure

stdout=asyncio.subprocess.DEVNULL,
stderr=asyncio.subprocess.DEVNULL,

When the cold-start test fails (the most common CI failure mode), there is no process output attached to the assertion. Consider capturing to a tempfile.NamedTemporaryFile and including the tail in the error message.

8. Module-level env mutation in `e2e_provisioner.py`

os.environ["FLASH_IS_LIVE_PROVISIONING"] = "false"

This is a module-level side effect — any code path that imports e2e_provisioner (even incidentally) mutates the global environment for the entire process. Since tests/e2e/ is in norecursedirs, this can't hit the regular test suite, but it's worth a comment explaining why the mutation must precede the import.

Nits

The two plan files in docs/superpowers/plans/ (1,052 + 740 lines) are fine as internal tooling docs, but one of them contained a reference to an absolute local path (/Users/deanquinanola/Github/...) in an earlier commit — check that's been cleaned up.
_build_docker_args concatenates cmd directly into a bash -c string without escaping. If any future dockerArgs value contains a double quote, the shell command will break. The current test values are safe.
CI runs on pull_request: [main] which means every PR triggers live GPU provisioning. Confirm this is intentional and within budget expectations for a project with external contributors.

Verdict

NEEDS WORK

Item 1 (flash undeploy --all) is a correctness/blast-radius issue. Items 2 and 3 (Python version drop, import threshold doubling) need explicit acknowledgment in the PR description or code comments. Item 4 (fragile path check) will break developer experience. Items 5–7 are correctness notes or moderate concerns. The overall architecture is sound.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

Resolves code scanning alert: the except clause now uses continue with an explanatory comment instead of bare pass.

- verify_local_runpod: check runpod.__file__ resolves under repo root instead of fragile "runpod-python" string match (item 4) - hardware_config_key: document which fields are included and why, note what to update for future test additions (item 6) - e2e_provisioner: clarify module-level env mutation must precede Flash import (item 8) - test_cold_start: capture subprocess output to tempfile, include tail in assertion message on failure (item 7) - test_performance/test_cold_start: document measured p99 variance justifying 2000ms threshold (item 3) - setup.py: bump python_requires to >=3.10 to match CI matrix (item 2) - CI-e2e.yml: use --quiet instead of 2>/dev/null (nit)

deanq · 2026-03-23T21:54:20Z

Addressed in 84684af. Here's the breakdown:

Item 1 (flash undeploy --all blast radius): Keeping --all --force intentionally — this CI job runs with a dedicated API key scoped to e2e testing, so there are no unrelated endpoints to collide with. Scoping by name would add complexity without real benefit given the key isolation.

Item 2 (Python 3.8/3.9 drop): Updated setup.py python_requires from >=3.8 to >=3.10 to match the CI matrix. 3.8 and 3.9 are EOL.

Item 3 (import threshold 2000ms): Added comment documenting measured p99 variance (~1600ms on ubuntu-latest shared runners). A regression above 2000ms indicates a new heavy dependency, not runner noise.

Item 4 (verify_local_runpod): Replaced "runpod-python" in __file__ with Path.is_relative_to(repo_root). Works regardless of clone directory name.

Item 5 (test_missing_api_key behavior change): This documents existing behavior from an earlier commit in this repo (API key validation moved to Endpoint.__init__). Will note in PR description.

Item 6 (hardware_config_key): Added docstring explaining why only gpuIds and dockerArgs are keyed, and what to update if future tests vary other fields.

Item 7 (cold start diagnostics): Subprocess output now captured to a tempfile. On failure, the last 50 lines are included in the assertion message.

Item 8 (module-level env mutation): Updated comment to explain the mutation must precede the Flash import (Flash reads it at import time).

Nit (2>/dev/null): Changed to --quiet so real errors still surface.
Nit (absolute path): Already cleaned up in a prior commit.
Nit (_build_docker_args injection): Current test values are safe. Will add escaping if we open tests.json to external input.
Nit (CI on every PR): Intentional — the github.repository == 'runpod/runpod-python' guard prevents forks from triggering GPU provisioning.

setup.py was updated but pyproject.toml still had >=3.8, which caused uv sync to fail in CI with a resolver conflict. Also updated classifiers to drop 3.8/3.9 and add 3.12.

Dropping 3.8/3.9 is a breaking change for downstream users. Revert pyproject.toml, setup.py, and CI matrix to their original values. Python version narrowing should be a separate, intentional PR.

Copilot

Pull request overview

This PR refactors the end-to-end (E2E) CI approach by replacing the legacy Docker-build + JS action runner with a pytest-based workflow that provisions mock-worker endpoints via Flash, and adds a local Flash cold-start benchmark test.

Changes:

Replace the legacy E2E GitHub Actions workflow with a single-job pytest + Flash provisioning flow.
Add new pytest E2E suite (tests/e2e/*) driven by tests.json, including concurrent job submission and a cold-start check.
Update unit tests/mocks and adjust the cold-start performance threshold documentation.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/test_performance/test_cold_start.py`	Documents and increases the import-time threshold to reduce CI noise.
`tests/test_endpoint/test_runner.py`	Updates request mocking to patch `requests.Session.request` (matches current implementation).
`tests/e2e/tests.json`	Defines E2E scenarios and hardware/template configs consumed by pytest.
`tests/e2e/test_mock_worker.py`	Runs all E2E scenarios concurrently against provisioned endpoints and asserts outputs.
`tests/e2e/test_cold_start.py`	Adds a local `flash run` cold-start-under-threshold test.
`tests/e2e/fixtures/cold_start/pyproject.toml`	Fixture project metadata for the cold-start test.
`tests/e2e/fixtures/cold_start/handler.py`	Minimal Flash endpoint handler used by the cold-start test.
`tests/e2e/e2e_provisioner.py`	Implements grouping + provisioning logic (Flash Endpoint(image=...) + PodTemplate dockerArgs injection).
`tests/e2e/conftest.py`	Adds session fixtures for provisioning, SDK verification, and teardown cleanup.
`tests/e2e/__init__.py`	Introduces the E2E test package.
`pytest.ini`	Registers markers and excludes `tests/e2e` from default unit-test collection.
`.github/workflows/CI-e2e.yml`	Replaces the old two-job E2E workflow with pytest-driven Flash provisioning.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tests/e2e/conftest.py

tests/e2e/e2e_provisioner.py

tests/e2e/test_mock_worker.py

.github/workflows/CI-e2e.yml

runpod-Henrik

Delta review — since 2026-03-23T13:33:14Z

1. Finding #1 — `flash undeploy --all --force` blast radius: Still open

conftest.py:55–66 still calls flash undeploy --all --force in session teardown. This deletes every Flash endpoint in the account, not only the ones created by this test run. A CI run with a shared API key will wipe any concurrently-running tests and any production-like endpoints belonging to that key. Since provision_endpoints already returns a seen dict with exactly the endpoints it created, teardown should iterate that dict and delete only those endpoints by name.

2. Finding #8 — module-level env mutation: Comment added, mutation remains

e2e_provisioner.py:21 still writes os.environ["FLASH_IS_LIVE_PROVISIONING"] = "false" at import time. The new comment explains the intent (Flash reads this at import time to pick between LiveServerless and ServerlessEndpoint). Given that tests/e2e is excluded from the main pytest norecursedirs and the CI runs pytest tests/e2e/ in a separate invocation, this is lower-risk than the prior review flagged — downgraded to a nit.

3. New issue: `test_mock_worker_jobs` ignores the session fixture and re-reads from disk

test_mock_worker.py:_load_test_cases() reads tests.json directly rather than using the test_cases session-scoped fixture that conftest.py provides. This means the test runs against a re-read snapshot of the file rather than what was actually passed to provision_endpoints. In practice they're the same file, but if someone parameterises the fixture in future, the test will silently diverge from what was provisioned. The test should accept test_cases as an argument.

4. New nit: `_build_docker_args` is fragile on CMD quoting

e2e_provisioner.py:39–46 wraps the docker args in /bin/bash -c "..." using Python string concatenation. If cmd (from base_docker_args) ever contains a double quote, the shell command will break. Current tests.json is safe, but this is one new entry away from a hard-to-debug CI failure. Consider using shlex.quote(cmd) or passing a proper list to docker args.

Prior findings status

#	Finding	Status
1	`flash undeploy --all --force` blast radius	Still open
2	Python 3.8/3.9 matrix drop not explained	Addressed — PR description now explicitly scopes this out
3	Import threshold doubled without justification	Addressed — p99 comment added
4	Fragile string `startswith` path check	Addressed — now uses `Path.is_relative_to()`
5	`api_key = None` after `Endpoint()` creation	Addressed — `None` set before constructor
6	`hardware_config_key` excluded fields unexplained	Addressed — comment added
7	Cold-start output hidden on failure	Addressed — log captured to tempfile, tail included in assertion
8	Module-level env mutation	Nit — comment added; risk low given isolated run

Verdict

NEEDS WORK — one open issue. Finding #1 (undeploy blast radius) is a real correctness problem in any environment with a shared API key. The rest is addressed or minor.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

Adds workflow_dispatch action with two inputs: - dry_run (default true): list endpoints without deleting - name_filter: only target endpoints whose name contains the string Uses raw GraphQL (no SDK dependency) to list and delete endpoints via the same RUNPOD_API_KEY used by CI e2e tests. Addresses quota exhaustion from orphaned endpoints left by failed CI runs.

- Remove unused api_key param from _run_single_case and test fixture - Add unique run ID suffix to endpoint names to prevent collisions across parallel CI runs sharing the same API key - Bump astral-sh/setup-uv from v3 to v6 to match CI-pytests.yml

.github/workflows/cleanup-endpoints.yml

Replace flash undeploy --all --force with per-name undeploy for each endpoint provisioned by this test run. Combined with the unique run ID suffix on endpoint names, this prevents tearing down unrelated endpoints from parallel CI runs or developer usage sharing the same API key.

Merge main branch to resolve conflicts (PR #479 prerequisite now merged). Resolve 6 file conflicts by taking main's refined e2e test versions. Address unresolved CodeQL code scanning findings: - Replace global boolean guards with dict-based _registration_state to eliminate "unused global variable" false positives - Move inline comments before pass statements so CodeQL recognizes explanatory comments on empty except clauses All 464 tests pass, 94.26% coverage.

deanq added 15 commits March 13, 2026 17:33

chore: scaffold e2e test directory and fixture project

537489b

feat: add QB fixture handlers for e2e tests

853ce49

feat: add LB fixture handler for e2e tests

7b29446

chore: register e2e pytest markers (qb, lb, cold_start)

149d571

feat: add e2e conftest with flash_server lifecycle fixture

a750293

feat: add e2e tests for sync and async QB handlers

339ac28

feat: add e2e tests for stateful worker persistence

fd40904

feat: add e2e tests for SDK Endpoint client round-trip

aa543e9

feat: add e2e tests for async SDK Endpoint client

cc184c5

feat: add e2e cold start benchmark test

a80511e

feat: add e2e tests for LB remote dispatch

cfa6eb1

feat: replace CI-e2e.yml with flash-based QB e2e tests

c0fd809

feat: add nightly CI workflow for full e2e suite including LB

83a798f

deanq changed the title ~~Replace archaic e2e tests with flash-based infrastructure~~ refactor: replace legacy e2e tests with flash-based infrastructure Mar 14, 2026

github-advanced-security bot found potential problems Mar 14, 2026

View reviewed changes

tests/e2e/conftest.py Fixed Show fixed Hide fixed

deanq requested a review from Copilot March 14, 2026 01:42

deanq changed the title ~~refactor: replace legacy e2e tests with flash-based infrastructure~~ tests: replace legacy e2e tests with flash-based infrastructure Mar 14, 2026

Copilot started reviewing on behalf of deanq March 14, 2026 01:42 View session

Copilot AI reviewed Mar 14, 2026

View reviewed changes

deanq added 8 commits March 13, 2026 18:50

fix(ci): exclude tests/e2e from default pytest collection

076099e

The CI unit test workflow runs `uv run pytest` without path restrictions, which collects e2e tests that require flash CLI. Add tests/e2e to norecursedirs so only CI-e2e.yml runs these tests (with explicit markers).

fix(ci): update unit test matrix to Python 3.10-3.12

76a2124

Drop 3.8 and 3.9 support, add 3.12. Flash requires 3.10+ and the SDK should target the same range.

deanq added 2 commits March 20, 2026 20:47

perf(e2e): run jobs concurrently and consolidate endpoints

35c372a

- Submit all test jobs via asyncio.gather instead of serial parametrize - Exclude endpoint name from hardware_config_key so basic+delay share one endpoint (4 endpoints down to 3) - Reduce CI timeout from 20m to 15m

This was referenced Mar 21, 2026

fix(config): handle corrupted config.toml in credential functions #481

Merged

feat(fitness): comprehensive system fitness checks for worker startup validation #472

Open

deanq requested review from KAJdev, jhcipar and justinwlin March 21, 2026 04:58

runpod-Henrik reviewed Mar 23, 2026

View reviewed changes

deanq force-pushed the deanq/e-3379-flash-based-e2e-tests branch 2 times, most recently from b4728ed to d25fabc Compare March 23, 2026 15:36

deanq added 2 commits March 23, 2026 14:49

fix(tests): replace empty except pass with continue in cold start poll

bf9734f

Resolves code scanning alert: the except clause now uses continue with an explanatory comment instead of bare pass.

deanq and others added 3 commits March 23, 2026 14:56

fix: update requires-python to >=3.10 in pyproject.toml

c64ae1e

setup.py was updated but pyproject.toml still had >=3.8, which caused uv sync to fail in CI with a resolver conflict. Also updated classifiers to drop 3.8/3.9 and add 3.12.

revert: restore Python >=3.8 support and original CI matrix

013640c

Dropping 3.8/3.9 is a breaking change for downstream users. Revert pyproject.toml, setup.py, and CI matrix to their original values. Python version narrowing should be a separate, intentional PR.

Merge branch 'main' into deanq/e-3379-flash-based-e2e-tests

7ac0829

deanq requested review from Copilot and runpod-Henrik March 23, 2026 22:10

Copilot started reviewing on behalf of deanq March 23, 2026 22:11 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

tests/e2e/conftest.py Outdated Show resolved Hide resolved

tests/e2e/e2e_provisioner.py Outdated Show resolved Hide resolved

tests/e2e/test_mock_worker.py Outdated Show resolved Hide resolved

.github/workflows/CI-e2e.yml Outdated Show resolved Hide resolved

runpod-Henrik reviewed Mar 23, 2026

View reviewed changes

deanq added 2 commits March 23, 2026 15:21

jhcipar reviewed Mar 23, 2026

View reviewed changes

.github/workflows/cleanup-endpoints.yml Show resolved Hide resolved

jhcipar approved these changes Mar 23, 2026

View reviewed changes

deanq requested a review from runpod-Henrik March 23, 2026 22:45

deanq merged commit 3fc3df1 into main Mar 23, 2026
7 checks passed

deanq deleted the deanq/e-3379-flash-based-e2e-tests branch March 23, 2026 23:12

Conversation

deanq commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What this replaces

Test coverage

Notable side-effects

Not in scope

Test plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. CI workflow — approach is correct, a few edges

2. Python version matrix change is undocumented

3. Import threshold doubled — no explanation

4. verify_local_runpod path check is fragile

5. test_missing_api_key validates a behavior change

6. hardware_config_key only normalizes on two fields

7. test_cold_start.py hides diagnostics on failure

8. Module-level env mutation in e2e_provisioner.py

Nits

Verdict

Uh oh!

deanq commented Mar 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. Finding #1 — flash undeploy --all --force blast radius: Still open

2. Finding #8 — module-level env mutation: Comment added, mutation remains

3. New issue: test_mock_worker_jobs ignores the session fixture and re-reads from disk

4. New nit: _build_docker_args is fragile on CMD quoting

Prior findings status

Verdict

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

deanq commented Mar 14, 2026 •

edited

Loading

4. `verify_local_runpod` path check is fragile

5. `test_missing_api_key` validates a behavior change

6. `hardware_config_key` only normalizes on two fields

7. `test_cold_start.py` hides diagnostics on failure

8. Module-level env mutation in `e2e_provisioner.py`

1. Finding #1 — `flash undeploy --all --force` blast radius: Still open

3. New issue: `test_mock_worker_jobs` ignores the session fixture and re-reads from disk

4. New nit: `_build_docker_args` is fragile on CMD quoting