BGP unnumbered peers: make database representation consistent and more strict #10155
jgallagher wants to merge 54 commits into main
internet-diglett left a comment
Apologies for the long cycle on this review. I took a first pass, going to go over again now that I have a good understanding of how the changes work.
DELETE FROM omicron.public.switch_port_settings_bgp_peer_config
WHERE (host(addr) = '0.0.0.0' OR host(addr) = '::')
  AND EXISTS (
    SELECT 1
    FROM omicron.public.switch_port_settings_bgp_peer_config AS other
    WHERE other.port_settings_id IS NOT DISTINCT FROM
            switch_port_settings_bgp_peer_config.port_settings_id
      AND other.bgp_config_id =
            switch_port_settings_bgp_peer_config.bgp_config_id
      AND other.interface_name IS NOT DISTINCT FROM
            switch_port_settings_bgp_peer_config.interface_name
      AND other.addr IS NULL
  );

-- 2. In groups where both 0.0.0.0 and :: exist (with no NULL row, since those
--    groups were handled above), delete the :: row so that the 0.0.0.0 row can
--    be updated to NULL in step 3.
DELETE FROM omicron.public.switch_port_settings_bgp_peer_config
WHERE host(addr) = '::'
  AND EXISTS (
    SELECT 1
    FROM omicron.public.switch_port_settings_bgp_peer_config AS other
    WHERE other.port_settings_id IS NOT DISTINCT FROM
            switch_port_settings_bgp_peer_config.port_settings_id
      AND other.bgp_config_id =
            switch_port_settings_bgp_peer_config.bgp_config_id
      AND other.interface_name IS NOT DISTINCT FROM
            switch_port_settings_bgp_peer_config.interface_name
      AND host(other.addr) = '0.0.0.0'
  );

-- 3. Update all remaining sentinel rows to NULL. After the deletes above,
--    each group has at most one sentinel row.
UPDATE omicron.public.switch_port_settings_bgp_peer_config
SET addr = NULL
WHERE host(addr) IN ('0.0.0.0', '::');
I had claude churning in the background while I reviewed this and I think it may have actually caught something based on this constraint:
omicron/schema/crdb/dbinit.sql
Lines 3756 to 3759 in 524b3a3
This unique index does not include the bgp_config_id.
Given the following db state:
- Row A: (port_settings_id=P, bgp_config_id=X, interface_name=I, addr=NULL)
- Row B: (port_settings_id=P, bgp_config_id=Y, interface_name=I, addr='0.0.0.0')
It appears the following would occur:
- Step 1 would not delete row B due to the different bgp_config_id
- Step 3 would update row B's addr to NULL, violating the UNIQUE constraint and causing the migration to fail.
Was there a particular scenario you were accounting for by filtering on bgp_config_id as well?
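To make the scenario concrete, here is a minimal in-memory sketch of the three migration steps plus a check that mimics the unique index on (port_settings_id, interface_name) for NULL-address rows. Everything in it (`PeerRow`, `migrate`, `unnumbered_unique_violation`, plain strings for IDs and addresses) is a hypothetical stand-in for illustration, not code from omicron. It also reports a violation after the fact, whereas the real UPDATE in step 3 would fail outright.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one row of the peer config table.
/// `addr == None` models SQL NULL; the sentinels are "0.0.0.0" and "::".
#[derive(Clone, Debug, PartialEq)]
struct PeerRow {
    port_settings_id: &'static str,
    bgp_config_id: &'static str,
    interface_name: &'static str,
    addr: Option<&'static str>,
}

fn is_sentinel(addr: Option<&str>) -> bool {
    matches!(addr, Some("0.0.0.0") | Some("::"))
}

/// Grouping used by the DELETE steps as written (includes bgp_config_id).
fn same_group(a: &PeerRow, b: &PeerRow) -> bool {
    a.port_settings_id == b.port_settings_id
        && a.bgp_config_id == b.bgp_config_id
        && a.interface_name == b.interface_name
}

/// Apply steps 1-3 in order and return the surviving rows.
fn migrate(rows: &[PeerRow]) -> Vec<PeerRow> {
    // Step 1: delete sentinel rows whose group already has a NULL row.
    let step1: Vec<PeerRow> = rows
        .iter()
        .filter(|r| {
            let has_null = rows.iter().any(|o| same_group(o, r) && o.addr.is_none());
            !(is_sentinel(r.addr) && has_null)
        })
        .cloned()
        .collect();
    // Step 2: delete `::` rows whose group also has a `0.0.0.0` row.
    let step2: Vec<PeerRow> = step1
        .iter()
        .filter(|r| {
            let has_v4 = step1
                .iter()
                .any(|o| same_group(o, r) && o.addr == Some("0.0.0.0"));
            !(r.addr == Some("::") && has_v4)
        })
        .cloned()
        .collect();
    // Step 3: rewrite every remaining sentinel to NULL.
    step2
        .into_iter()
        .map(|mut r| {
            if is_sentinel(r.addr) {
                r.addr = None;
            }
            r
        })
        .collect()
}

/// Returns the first (port_settings_id, interface_name) key shared by more
/// than one NULL-address row. Note the key omits bgp_config_id.
fn unnumbered_unique_violation(rows: &[PeerRow]) -> Option<(&'static str, &'static str)> {
    let mut seen = HashSet::new();
    for r in rows.iter().filter(|r| r.addr.is_none()) {
        let key = (r.port_settings_id, r.interface_name);
        if !seen.insert(key) {
            return Some(key);
        }
    }
    None
}
```

Feeding rows A and B through `migrate` leaves both rows in place with a NULL address, and `unnumbered_unique_violation` then reports the (P, I) key. If row B used the same bgp_config_id as row A, step 1 would delete it and no violation would remain.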
The following change to the test data triggers the constraint violation:
--- a/nexus/tests/integration_tests/data_migrations/bgp_unnumbered_peer_cleanup.rs
+++ b/nexus/tests/integration_tests/data_migrations/bgp_unnumbered_peer_cleanup.rs
@@ -45,6 +45,8 @@ const PORT_SETTINGS: Uuid =
Uuid::from_u128(0x25100001_0000_0000_0000_000000000001);
const BGP_CONFIG: Uuid =
Uuid::from_u128(0x25100001_0000_0000_0000_000000000002);
+const BGP_CONFIG_2: Uuid =
+ Uuid::from_u128(0x25100001_0000_0000_0000_000000000003);
// bgp_peer_config row IDs
const PEER_NUMBERED: Uuid =
@@ -73,6 +75,9 @@ const PEER_V4_V6_V4: Uuid =
Uuid::from_u128(0x25100002_0000_0000_0000_00000000000c);
const PEER_V4_V6_V6: Uuid =
Uuid::from_u128(0x25100002_0000_0000_0000_00000000000d);
+const PEER_BAD: Uuid =
+ Uuid::from_u128(0x25100002_0000_0000_0000_00000000000e);
+
fn before<'a>(ctx: &'a MigrationContext<'a>) -> BoxFuture<'a, ()> {
Box::pin(async move {
@@ -97,6 +102,9 @@ fn before<'a>(ctx: &'a MigrationContext<'a>) -> BoxFuture<'a, ()> {
-- Single ::
('{PORT_SETTINGS}', '{BGP_CONFIG}', 'single-v6',
'::', '{PEER_SINGLE_V6}', 0),
+ -- Trigger constraint after migration
+ ('{PORT_SETTINGS}', '{BGP_CONFIG_2}', 'single-v6',
+ NULL, '{PEER_BAD}', 0),
-- NULL + 0.0.0.0
('{PORT_SETTINGS}', '{BGP_CONFIG}', 'null-v4',
NULL, '{PEER_NULL_V4_NULL}', 100),
cargo nextest run validate_data_migrations
info: experimental features enabled: setup-scripts, benchmarks
Compiling omicron-nexus v0.1.0 (/disk1/workspace/omicron/nexus)
Finished `test` profile [unoptimized + debuginfo] target(s) in 51.03s
Finished `test` profile [unoptimized + debuginfo] target(s) in 1.04s
Running `target/debug/crdb-seed`
Apr 21 19:10:34.326 INFO Using existing CRDB seed tarball: `/tmp/crdb-base-levon/de0e73ffd7e94bdffaa793ca63d2d73b4c8a651ecb51a3df374e1ade5458a0d0.tar`
SETUP PASS [ 1.225s] crdb-seed: cargo run -p crdb-seed --profile test
FAIL [ 47.037s] omicron-nexus::test_all integration_tests::schema::validate_data_migrations
stdout ───
running 1 test
Starting migration test: 148.0.0 → 251.0.0
test integration_tests::schema::validate_data_migrations ... FAILED
failures:
failures:
integration_tests::schema::validate_data_migrations
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 560 filtered out; finished in 47.00s
stderr ───
log file: /tmp/test_all-e37394e57f1cfdb6-validate_data_migrations.16552.0.log
note: configured to log to "/tmp/test_all-e37394e57f1cfdb6-validate_data_migrations.16552.0.log"
thread 'integration_tests::schema::validate_data_migrations' (2) panicked at nexus/tests/integration_tests/schema.rs:71:9:
Failed to execute update step up01.sql: db error: ERROR: duplicate key value violates unique constraint "switch_port_settings_bgp_peer_config_unnumbered_unique"
DETAIL: Key (port_settings_id,interface_name)=('25100001-0000-0000-0000-000000000001','single-v6') already exists.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
WARN: dropped CockroachInstance without cleaning it up first (there may still be a child process running and a temporary directory leaked)
WARN: temporary directory leaked: "/tmp/.tmp2oTV1F"
If you would like to access the database for debugging, run the following:
# Run the database
cargo xtask db-dev run --no-populate --store-dir "/tmp/.tmp2oTV1F/data"
# Access the database. Note the port may change if you run multiple databases.
cockroach sql --host=localhost:32221 --insecure
Cancelling due to test failure:
────────────
Summary [ 48.276s] 1 test run: 0 passed, 1 failed, 2809 skipped
FAIL [ 47.037s] omicron-nexus::test_all integration_tests::schema::validate_data_migrations
error: test run failed
- Row A: (port_settings_id=P, bgp_config_id=X, interface_name=I, addr=NULL)
- Row B: (port_settings_id=P, bgp_config_id=Y, interface_name=I, addr='0.0.0.0')
Is this a legal state? I.e., could you have the same interface in the same port_settings_id used in two different bgp_config_ids?
I think when I wrote this I had the parent/child relationship backwards. I was thinking:
- bgp_config_id is the parent
- port_settings_id is a child of the BGP config (it identifies a port!)
- interface_name is a child of the port
but now that you ask this I have this backwards, don't I? It's actually
- port_settings_id is the parent ("all the port settings", not "the settings of a port"?)
- bgp_config_id is a child of the port settings
- interface_name is a child of the BGP config
I can revise this, although I'm still curious if this is a legal state.
Actually, those are great questions.
The port settings object is indeed the "parent" object. If the peer is not assigned to a port settings record, it is not active and in theory should not be applied to maghemite via the network configuration background task.
My understanding of the model is that the BGP configuration is a shared configuration object that multiple BGP peer configurations can reference, and different peers can reference different BGP configurations. Peer configurations are applied to specific interfaces.
That said, even though the protocol supports multiple peers over a single interface, I don't think that is something we support today, especially for unnumbered peers. I don't know if we prevent any such configuration via our APIs.
@taspelund any thoughts?
I'll admit I don't have a good understanding of the nexus model for BGP configuration or how that all maps back into the maghemite model. From a maghemite perspective, the only association that exists between a BGP neighbor and an interface is when the neighbor is unnumbered. The only time the interface information is used is during NDP neighbor discovery of the unnumbered peer, and when binding an outbound socket to that interface when attempting to make an outbound connection to the unnumbered peer. Numbered peers do not bind their sockets to an interface, so it's up to the OS to resolve the correct interface to use for an outbound connection (generally relying on the uplinks to have been configured with the correct IP/mask).
So from a maghemite perspective, there's no difference between (interface=X, peer-ip=A) and (interface=Y, peer-ip=B).
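As a rough illustration of that distinction, the sketch below models a peer as either numbered or unnumbered and only yields an interface to bind to in the unnumbered case. The `Peer` enum and `bind_interface` method are hypothetical names made up for this example; they are not maghemite's actual types or API.

```rust
use std::net::IpAddr;

/// Hypothetical model of a BGP neighbor, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Peer {
    /// Reached by IP address; the OS picks the outbound interface.
    Numbered { addr: IpAddr },
    /// Discovered via NDP on a specific interface; no configured address.
    Unnumbered { interface: String },
}

impl Peer {
    /// Interface to bind an outbound socket to, if any. Only unnumbered
    /// peers carry an interface association in this sketch.
    fn bind_interface(&self) -> Option<&str> {
        match self {
            Peer::Numbered { .. } => None,
            Peer::Unnumbered { interface } => Some(interface.as_str()),
        }
    }
}
```

Under this model, two numbered peers that differ only in which interface they happen to be configured under are indistinguishable, which matches the (interface=X, peer-ip=A) versus (interface=Y, peer-ip=B) point above.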
I'm not sure how the "bgp configuration" object maps back to a Router (in maghemite), but I would imagine them to be pretty similar in contents, e.g. local ASN, some RIB settings like fanout, etc.
However, I imagine there will be some changes in this area when we move forward with RFD 662, so I wouldn't shy away from making larger changes if they make sense.
Staged on top of #10122, which strengthened the types used for BGP peers in the external API.
This PR is pretty big, but the actual prod code changes are more like +600/-300 (there are a bunch of new tests and a bunch of SQL migration Stuff pushing the total size up). I'd recommend starting with the changes to dbinit.sql, which should be in line with the plan @internet-diglett and I discussed. Summarizing #9832 (comment):
- … NULL address
- CHECK constraints that prevent storing the addresses 0.0.0.0 or :: to ensure we don't have any leftover sentinel values
- … id primary key (a random UUID)
The database migrations are similar in spirit to #9976: while we're not correcting mismatches between diesel and CRDB here, we are introducing migrations that could fail if real databases have data we don't expect (e.g., an address of 0.0.0.0 in the table where we expect unnumbered peers to be represented as NULL). The tactic I went with in the migration is:
- … NULL.
- … NULL then 0.0.0.0) and delete the rest. This shouldn't delete anything unless someone has some already-invalid-at-runtime data.
There's a data migration test that should cover a sample of each kind of possibly-invalid input and confirm the above behavior.
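For readers less familiar with the sentinel convention, here is a small sketch of the intended end state: the unspecified addresses 0.0.0.0 and :: collapse to "no address" (NULL), and anything that still carries a sentinel would be rejected. The helper names `normalize_peer_addr` and `passes_sentinel_check` are hypothetical and only approximate what the migration and CHECK constraints do in SQL; `IpAddr::is_unspecified` is the standard library's test for exactly those two addresses.

```rust
use std::net::IpAddr;

/// Hypothetical helper: map a stored peer address to the post-migration
/// representation, where an unnumbered peer is `None` (SQL NULL).
fn normalize_peer_addr(addr: Option<IpAddr>) -> Option<IpAddr> {
    // `is_unspecified` is true only for 0.0.0.0 and ::.
    addr.filter(|a| !a.is_unspecified())
}

/// Hypothetical mirror of the CHECK constraint's intent: NULL is fine,
/// a real address is fine, a leftover sentinel is not.
fn passes_sentinel_check(addr: Option<IpAddr>) -> bool {
    addr.map_or(true, |a| !a.is_unspecified())
}
```

Any value returned by `normalize_peer_addr` satisfies `passes_sentinel_check`, which is the invariant the migration is meant to establish before the constraints are added.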
One other thing that snuck in here is a bugfix for #10151; that's in 70ae97e.