Skip to content

Eliminate get_committed_snapshots from join path.#7832

Merged
cjen1-msft merged 25 commits intomicrosoft:mainfrom
cjen1-msft:join-with-fetch
May 7, 2026
Merged

Eliminate get_committed_snapshots from join path.#7832
cjen1-msft merged 25 commits intomicrosoft:mainfrom
cjen1-msft:join-with-fetch

Conversation

@cjen1-msft
Copy link
Copy Markdown
Contributor

@cjen1-msft cjen1-msft commented Apr 21, 2026

Currently most of our code uses get_committed_snapshots implicitly via the _add_node or join_node calls.
This PR has two changes:

  • All startup and recovery runs now uses fetch_recent_snapshot (except tests which manually want that)
  • All join_node calls either have 'from_snapshot=False' or 'snapshot_dirs=..., from_snapshot=True', and asserts that no tests have just from_snapshot=True without 'snapshot_dirs=...'

This allows us to fully remove get_committed_snapshots from the startup path, pushing that out to the caller via snapshots_dir, this then paves the way for turning get_committed_snapshots into 'copy_snapshots_to_node_dir', bringing our testing closer to ACL's runtime.

I've left from_snapshot in in this PR to make it reviewable.
I'll submit another PR after this to remove it everywhere and instead just use snapshots_dir.

@cjen1-msft cjen1-msft added the run-long-test Run Long Test job label Apr 21, 2026
@cjen1-msft cjen1-msft marked this pull request as ready for review April 22, 2026 15:24
@cjen1-msft cjen1-msft requested a review from a team as a code owner April 22, 2026 15:24
Copilot AI review requested due to automatic review settings April 22, 2026 15:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Python test infrastructure and multiple tests to stop implicitly calling get_committed_snapshots() as part of the default node join flow, instead making snapshot usage explicit via from_snapshot, snapshots_dir, and fetch_recent_snapshot parameters.

Changes:

  • Update many test call sites to pass from_snapshot=False explicitly when joining nodes, and pass snapshots_dir when joining from a copied snapshot.
  • Refactor some tests (notably tests/reconfiguration.py) to separate “copy snapshot locally” vs “fetch snapshot on startup” behaviors.
  • Adjust tests/infra/network.py snapshot-join logic and make get_committed_snapshots() accept node=None (defaulting to the primary).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/infra/network.py Changes snapshot-join handling and get_committed_snapshots() defaulting behavior.
tests/reconfiguration.py Makes snapshot usage explicit and refactors test_add_node parameters.
tests/recovery.py Updates join calls to pass from_snapshot explicitly (and snapshots_dir where needed).
tests/lts_compatibility.py Branches join behavior by version, using copied snapshots for older versions.
tests/e2e_operations.py Updates several join paths and snapshot-related tests to be explicit about snapshot usage.
tests/redirects.py Updates joins to pass from_snapshot=False.
tests/partitions_test.py Updates join to be explicit about snapshot fetching behavior.
tests/limits.py Updates join to pass from_snapshot=False.
tests/governance.py Updates joins to pass from_snapshot=False.
tests/code_update.py Updates multiple joins to pass from_snapshot=False.
tests/e2e_logging.py Updates join to pass from_snapshot=False.
tests/e2e_common_endpoints.py Updates joins to pass from_snapshot=False.
tests/js-custom-authorization/custom_authorization.py Updates join to pass from_snapshot=False.
tests/infra/basicperf.py Ensures snapshot-join setup explicitly sets from_snapshot=True.

Custom instructions used:

  • .github/copilot-instructions.md
  • .github/instructions/reviewing.instructions.md

Comment thread tests/infra/network.py Outdated
Comment thread tests/infra/network.py Outdated
@achamayou
Copy link
Copy Markdown
Member

@cjen1-msft it feels like turning from_snapshot into an enum with No|FromFile|FromNode would be best for clarity. Apart from one residual test to confirm FromFile still works, and one with No to confirm retransmission also works, I would expect every other test to use the default FromNode.

@cjen1-msft
Copy link
Copy Markdown
Contributor Author

@achamayou I don't quite agree. If we propagated that all the way to the config, then yes.
But I think keeping our infra somewhat matching the underlying config is useful.
What I now want to do is to get rid of from_snapshot, and use just the snapshot_dir and fetch_recent_snapshot to express all of the options, as that matches directly what we have in the config => less magic.

Comment thread tests/infra/basicperf.py Outdated
Comment thread tests/infra/network.py
Comment thread tests/e2e_operations.py Outdated
Comment thread tests/lts_compatibility.py Outdated
@cjen1-msft cjen1-msft marked this pull request as draft May 1, 2026 09:16
@cjen1-msft cjen1-msft removed the run-long-test Run Long Test job label May 6, 2026
@cjen1-msft cjen1-msft marked this pull request as ready for review May 6, 2026 16:22
@cjen1-msft cjen1-msft merged commit a77c289 into microsoft:main May 7, 2026
19 checks passed
@cjen1-msft cjen1-msft deleted the join-with-fetch branch May 7, 2026 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants