
Add support for re-adopting physical disks#10221

Open
andrewjstone wants to merge 16 commits into main from manual-disk-adoption

Conversation

andrewjstone (Contributor) commented Apr 4, 2026

This change implements the determinations in RFD 663. It allows
re-adopting physical disks in the control plane after the
control-plane-level disk in the `physical_disk` table is expunged.

It does this by forcing manual adoption of disks by an operator, where
requests are placed in the `physical_disk_adoption_request` table.
A disk will now only be adopted or re-adopted by the disk adoption
background task if its physical vendor/model/serial information is
present in a `physical_disk_adoption_request` row.

The typical flow for an operator is to list uninitialized disks and then
issue an adoption request via the external API.
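
The gating rule the description lays out can be sketched roughly as follows. This is an illustrative sketch only: `DiskIdentity`, `AdoptionRequest`, and `is_adoptable` are hypothetical names, not the actual omicron types or the PR's implementation.

```rust
// Sketch of the gating rule: the disk adoption background task only
// considers a disk reported by inventory if a matching adoption request
// exists. Names here are illustrative, not the actual omicron types.

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DiskIdentity {
    pub vendor: String,
    pub model: String,
    pub serial: String,
}

/// A row in the adoption-request table (illustrative).
pub struct AdoptionRequest {
    pub identity: DiskIdentity,
}

/// Returns true if an inventory disk matches a pending adoption request
/// on vendor/model/serial.
pub fn is_adoptable(disk: &DiskIdentity, requests: &[AdoptionRequest]) -> bool {
    requests.iter().any(|r| r.identity == *disk)
}
```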

Comment thread nexus/src/app/sled.rs Outdated
@andrewjstone andrewjstone force-pushed the manual-disk-adoption branch from 67bdc40 to ce4b577 Compare April 7, 2026 20:35
@andrewjstone andrewjstone changed the title WIP: Manual disk adoption Add support for re-adopting physical disks Apr 7, 2026
@andrewjstone andrewjstone force-pushed the manual-disk-adoption branch from ce4b577 to edc851b Compare April 7, 2026 20:42
@andrewjstone andrewjstone marked this pull request as ready for review April 7, 2026 20:42
Comment thread nexus/src/external_api/http_entrypoints.rs Outdated
smklein (Collaborator) commented Apr 7, 2026

This change implements the determinations in RFD 693. It allows re-adopting physical disks in the control plane after the control plane level disk in the physical_disk table is expunged.

Nit: 663

andrewjstone (Contributor, Author):

This change implements the determinations in RFD 693. It allows re-adopting physical disks in the control plane after the control plane level disk in the physical_disk table is expunged.

Nit: 663

Whoops. I even have it open in a tab. Thanks!

ahl (Contributor) left a comment:

To the best of my understanding, we have a couple of metaphors in here. Disks that are unknown to the control plane are "uninitialized" and appear in that list. The verb we're using is "adopt", as in "this uninitialized disk is adopted by the control plane". Is the term "adopt" intended to be the opposite of "expunge"?

I'm not clear on how the operator would use this. Presumably there's supposed to be some step (e.g. of exploration or cognition) between "list uninit disks" and "approve disk(s)", but I'm not sure what it is.

We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.

Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?

The API changes look fine; I'd ask you think about nomenclature ("uninitialized" "adopt").

Comment thread nexus/external-api/src/lib.rs Outdated
query: Query<PaginationParams<EmptyScanParams, String>>,
) -> Result<
HttpResponseOk<
ResultsPage<latest::physical_disk::UninitializedPhysicalDisk>,
Contributor:

if I expunge a disk, I think that changes both the policy and state properties (I'm not sure why there are both of these--is one intent and the other is status?), does the disk immediately show up in this list?

Do disks show up in only one place or the other? Do some show up in both?

Contributor Author:

Expunge happens immediately, and is a terminating enum variant. It's the intention of the desired state. Once it occurs we can assume it will never change and most things just look at expunge. Decommissioned state occurs after other steps, when the intended state is realized. So there is some delay there.

Comment thread nexus/external-api/src/lib.rs Outdated
hawkw (Member) commented Apr 7, 2026

We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.

Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?

Is the idea that one would do this by comparing the manufacturer/model number/serial listed by the API endpoint with those physically printed on the actual disk, and also based on foreknowledge that disks have or have not been inserted in specific locations at the current point in time?

Comment on lines +300 to +304
pub async fn physical_disk_adoptable_list(
&self,
opctx: &OpContext,
inventory_collection_id: CollectionUuid,
) -> ListResultVec<InvPhysicalDisk> {
Member:

I thought about suggesting that this be paginated, but...do we expect the maximum number of rows to be limited by the physical fact that 32 sleds * 10 U.2s = 320 disks maximum?

Contributor Author:

There's not really a good way to paginate this right now AFAIK.

Comment thread nexus/db-queries/src/db/datastore/physical_disk.rs
/// A physical disk that has not yet been adopted by the control plane
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, JsonSchema)]
pub struct UninitializedPhysicalDisk {
pub sled_id: SledUuid,
Member:

do we expect the client to hydrate this sled UUID into a sled's physical location? it seems like it would be desirable for a UI listing physical disks that need to be adopted to be able to say which sled they are in as well as the slot within that sled...

Contributor Author:

Good question. I suppose we could also provide the sled cubby. That would help operators out a bit probably.

Contributor:

It's not guaranteed we have this, right? Uninitialized physical disks come from sled-agent inventory, and sled-agent doesn't know its own cubby number. We could try to match up the sled serial against the SP inventory contents to identify a cubby, and that will usually work, but we'd still need to be able to represent "physical disk for sled X for cubby I Dunno Ask Again Later".

andrewjstone (Contributor, Author):

We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.

Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?

Is the idea that one would do this by comparing the manufacturer/model number/serial listed by the API endpoint with those physically printed on the actual disk, and also based on foreknowledge that disks have or have not been inserted in specific locations at the current point in time?

Yes, basically, if they actually cared. I think the larger security issue mitigated is that any disk inserted can only be activated by an operator and not just adopted automatically for usage. Checking the serial is how they would know which disk they were validating against.

),
),
)
// Ensure that each inventory disk has a valid adoption request
Collaborator:

Is there a precondition that "you cannot have an adoption request for an already in-service disk"?

(Having already-in-service disks show up here seems wrong, just confirming what prevents that. I think the answer is "yes, a non-deleted adoption request means you have no live disks here")

Contributor Author:

Yes, that's a precondition.
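
The precondition confirmed here can be sketched as a simple check. Illustrative only: the real enforcement would live in the datastore insert query, and `can_insert_request` is a hypothetical name.

```rust
// Sketch of the precondition: an adoption request may not be created for
// a disk that is already in service. Illustrative name and shape only;
// not the actual datastore logic.
pub fn can_insert_request(serial: &str, in_service_serials: &[&str]) -> Result<(), String> {
    if in_service_serials.contains(&serial) {
        // A live disk with this identity already exists; reject.
        Err(format!("disk {serial} is already in service"))
    } else {
        Ok(())
    }
}
```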

Comment thread nexus/db-queries/src/db/datastore/physical_disk.rs Outdated
Comment thread nexus/types/versions/src/manual_disk_adoption/physical_disk.rs Outdated
Comment thread nexus/src/external_api/http_entrypoints.rs Outdated
Comment thread nexus/src/app/background/tasks/physical_disk_adoption.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/physical_disk.rs
andrewjstone (Contributor, Author):

To the best of my understanding, we have a couple of metaphors in here. Disks that are unknown to the control plane are "uninitialized" and appear in that list. The verb we're using is "adopt", as in "this uninitialized disk is adopted by the control plane". Is the term "adopt" intended to be the opposite of "expunge"?

It's not necessarily the opposite of expunge. A user can insert a disk in an empty bay and it would be adopted for use. It's idle / uninitialized before that.

I'm not clear on how the operator would use this. Presumably there's supposed to be some step (e.g. of exploration or cognition) between "list uninit disks" and "approve disk(s)", but I'm not sure what it is.

We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.

Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?

As Eliza noted, an operator would have to look at the vendor/model/serial on the drive and compare it to what shows up in the API.

The API changes look fine; I'd ask you think about nomenclature ("uninitialized" "adopt").

I'm open to changing this. `adoption` has been our internal de-facto term for a while. We could say `initialize` or `activate` or something else.

Comment thread nexus/db-model/src/physical_disk.rs
Comment thread nexus/db-model/src/physical_disk.rs Outdated
Comment thread nexus/types/versions/src/manual_disk_adoption/physical_disk.rs Outdated
Comment thread nexus/src/app/sled.rs Outdated
Comment thread nexus/types/versions/src/manual_disk_adoption/physical_disk.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/physical_disk.rs
Comment thread nexus/src/external_api/http_entrypoints.rs Outdated
Comment thread schema/crdb/dbinit.sql Outdated
Comment thread schema/crdb/dbinit.sql Outdated
hawkw (Member) commented Apr 8, 2026

I'm not clear on how the operator would use this. Presumably there's supposed to be some step (e.g. of exploration or cognition) between "list uninit disks" and "approve disk(s)", but I'm not sure what it is.

We don’t want to allow automatic disk adoption due to the risk of the insertion of malicious hardware during casual physical access. This is especially problematic before we have disk attestation support, and in the case of existing sleds with empty disk bays.

Presumably I want to make sure the hardware I just put into the U.2 bay is the same as what I'm about to adopt. How do I do that?

As Eliza noted, an operator would have to look at the vendor/model/serial on the drive and compare it to what shows up in the API.

@smklein and I have been talking through disk replacement scenarios a bit from the fault management context, and I think the adoption requests will eventually be part of the service flow for disk replacements. I think in particular, we would really like it if the adoption requests could easily include the physical location of the disk as part of a "you just replaced the disk in sled 19 slot 3, okay yeah it's that one" kinda spot check.

andrewjstone (Contributor, Author):

To the best of my understanding, we have a couple of metaphors in here. Disks that are unknown to the control plane are "uninitialized" and appear in that list. The verb we're using is "adopt", as in "this uninitialized disk is adopted by the control plane". Is the term "adopt" intended to be the opposite of "expunge"?

@ahl @smklein @hawkw

I just talked about this with John a bit and if you don't like the term adopt, we could always use the word import. Then instead of uninitialized we would say unimported. We could also not mix metaphors by saying adopt and unadopted.

ahl (Contributor) commented Apr 8, 2026

I'm good with whatever of those you choose. I appreciate you discussing.

andrewjstone (Contributor, Author) commented Apr 8, 2026

@smklein and I have been talking through disk replacement scenarios a bit from the fault management context, and I think the adoption requests will eventually be part of the service flow for disk replacements. I think in particular, we would really like it if the adoption requests could easily include the physical location of the disk as part of a "you just replaced the disk in sled 19 slot 3, okay yeah it's that one" kinda spot check.

As @jgallagher pointed out earlier today, we don't actually have this information in sled-agent inventory. What happens if the SP inventory doesn't exist for this collection? Then we can't list the sled as uninitialized, or we always need to carry around an option here. Would it make sense to tell a customer: "Hey, actually I don't know what cubby this disk is in right now." or "This sled is not uninitialized" even though the customer just inserted it into the rack and expects to see it?

andrewjstone (Contributor, Author):

One thing I just realized was that with the current code, we will no longer automatically adopt disks when sleds are added to a rack. I confirmed with @rmustacc that this was not what he intended. Unfortunately, I'm not immediately sure how to fit this behavior in with the current one. Disks are detected asynchronously after a sled is added and currently adopted by the background task automatically. The new behavior ensures that a disk adoption request is made by a user to allow adoption by the background task, but the adoption itself is still done asynchronously in the background task and is separate from the sled add request.

What we would want is a state attached to the sled that says automatically adopt disks that were present when the sled was added to the rack. But we don't really have any mechanism to discover that information. I think the best we can do with the current code base is say something like "automatically adopt disks for this sled for 5 minutes" after it has been created. Given inventory delays this is also problematic as a disk the customer expected to be adopted that was in the sled when it was added may not actually get adopted. That's a terrible user experience. We could lessen the likelihood of non-adoption by increasing this window, but that now lengthens the time when casual physical attack is possible by inserting arbitrary disks.

The only other thing I can think of doing is adding disk information to the SledAgentInfo that gets published to the nexus internal api when the sled-agent starts up. But this is a client-versioned API currently, which could be problematic if a sled needs to be added during an upgrade.
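
The time-window idea floated above ("automatically adopt disks for this sled for 5 minutes" after it is added) could be sketched roughly like this. Hypothetical names; this is a sketch of the rejected alternative, not what the PR implements.

```rust
use std::time::{Duration, SystemTime};

// Sketch of a time-window auto-adoption rule: disks reported by inventory
// are auto-adopted only if the sled was added within the last `window`.
// Illustrative only; not the PR's implementation.
pub fn within_auto_adopt_window(
    sled_added_at: SystemTime,
    now: SystemTime,
    window: Duration,
) -> bool {
    match now.duration_since(sled_added_at) {
        Ok(elapsed) => elapsed <= window,
        // `now` is before `sled_added_at` (clock skew); conservatively
        // treat that as inside the window.
        Err(_) => true,
    }
}
```

The tradeoff discussed in the comment is visible in the `window` parameter: a small window risks missing disks that inventory reports late, while a large window lengthens the period in which a casually inserted disk would be adopted without operator review.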

hawkw (Member) commented Apr 9, 2026

I just talked about this with John a bit and if you don't like the term adopt, we could always use the word import. Then instead of uninitialized we would say unimported. We could also not mix metaphors by saying adopt and unadopted.

NOT TO BE ANNOYING BUT: I really don't like "import" in this context, it feels like it is too easily misconstrued as "import the data that was on this disk", which is not what one would expect to be offered but which is a somewhat conceivable thing that might occur.

andrewjstone (Contributor, Author):

I just talked about this with John a bit and if you don't like the term adopt, we could always use the word import. Then instead of uninitialized we would say unimported. We could also not mix metaphors by saying adopt and unadopted.

NOT TO BE ANNOYING BUT: I really don't like "import" in this context, it feels like it is too easily misconstrued as "import the data that was on this disk", which is not what one would expect to be offered but which is a somewhat conceivable thing that might occur.

NOT ANNOYING AT ALL!!! I appreciate the feedback.

I'm not a huge fan of import either. I still like adopt. FWIW, Claude does too, but it's a sycophant that hallucinated usage in both ZFS and Kubernetes, so do with that what you will.

I think to make the metaphors less mixed I'd switch to listing unadopted rather than uninitialized disks.

@AlejandroME AlejandroME added this to the 20 milestone Apr 10, 2026
andrewjstone (Contributor, Author):

Thanks for the reviews @smklein @hawkw @jgallagher

I believe I've addressed everything. Unfortunately, I discovered an issue that should probably be resolved before merge or explicitly decided not to implement: #10221 (comment)

andrewjstone (Contributor, Author):

One thing I just realized was that with the current code, we will no longer automatically adopt disks when sleds are added to a rack. I confirmed with @rmustacc that this was not what he intended. Unfortunately, I'm not immediately sure how to fit this behavior in with the current one. Disks are detected asynchronously after a sled is added and currently adopted by the background task automatically. The new behavior ensures that a disk adoption request is made by a user to allow adoption by the background task, but the adoption itself is still done asynchronously in the background task and is separate from the sled add request.

What we would want is a state attached to the sled that says automatically adopt disks that were present when the sled was added to the rack. But we don't really have any mechanism to discover that information. I think the best we can do with the current code base is say something like "automatically adopt disks for this sled for 5 minutes" after it has been created. Given inventory delays this is also problematic as a disk the customer expected to be adopted that was in the sled when it was added may not actually get adopted. That's a terrible user experience. We could lessen the likelihood of non-adoption by increasing this window, but that now lengthens the time when casual physical attack is possible by inserting arbitrary disks.

The only other thing I can think of doing is adding disk information to the SledAgentInfo that gets published to the nexus internal api when the sled-agent starts up. But this is a client-versioned API currently, which could be problematic if a sled needs to be added during an upgrade.

There are further problems that make the implementation next to impossible to do in an ideal manner.

  1. Before the sled-agent is up, it is not on the underlay network and so we can't ask it for the disks that are currently inserted.
  2. Even if we include those disks in the client-side versioned put to nexus from sled-agent, the disks themselves are loaded asynchronously by the hardware manager. It's possible that some of them haven't made themselves known yet.

We really seem to be restricted to a time based setup, or forcing manual disk adoption at all times.

smklein (Collaborator) commented Apr 10, 2026

@andrewjstone and I chatted about this a bit offline. Recording some of our thoughts here:

  • In the short-term, it may make sense to keep the old behavior of "auto-adopt disks that haven't been part of the control plane before". We can make that old pathway create adoption requests, to unify the disk setup process. This still would allow an operator to re-add an expunged disk. The "auto-adoption" conditions could also be turned into a toggle, or turned off, at some point in the future.
  • We could read disk information from uninitialized sleds over the bootstrap network, and present that information as a part of "sled add" - basically, "sled add" could become "sled add AND create these disk adoption requests". We suspect this would not be a small task, but it's theoretically possible?

andrewjstone (Contributor, Author):

@andrewjstone and I chatted about this a bit offline. Recording some of our thoughts here:

* In the short-term, it may make sense to keep the old behavior of "auto-adopt disks that haven't been part of the control plane before". We can make that old pathway create adoption requests, to unify the disk setup process. This still would allow an operator to re-add an expunged disk. The "auto-adoption" conditions could also be turned into a toggle, or turned off, at some point in the future.

* We could read disk information from uninitialized sleds over the bootstrap network, and present that information as a part of "sled add" - basically, "sled add" could become "sled add AND create these disk adoption requests". We suspect this would not be a small task, but it's theoretically possible?

Based on discussion in update huddle last week, we decided that to move forward we would auto-adopt disks that haven't been part of the control plane before. c9618fc makes this change. Importantly it makes this change by inserting new disks into the physical_disk_adoption_request table and not changing the method that determines which disks are adoptable. This makes it easier to remove in the future. Thanks to @smklein for the suggestion.
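
The compromise described here — auto-creating adoption requests only for disks the control plane has never seen, so previously expunged disks still require an operator request — could look roughly like the following. Illustrative names only, not the actual omicron code from c9618fc.

```rust
use std::collections::HashSet;

// Sketch of the auto-adoption compromise: for each inventory disk, create
// an adoption request only if the control plane has never seen that serial
// before. Expunged disks have been seen, so they are excluded and still
// require a manual request. Illustrative only.
pub fn serials_needing_auto_request<'a>(
    inventory_serials: &[&'a str],
    ever_seen_serials: &HashSet<&'a str>,
) -> Vec<&'a str> {
    inventory_serials
        .iter()
        .copied()
        .filter(|s| !ever_seen_serials.contains(s))
        .collect()
}
```

Routing auto-adoption through the same request table, rather than a second adoption path, is what makes the behavior easy to toggle off later, as the comment notes.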

andrewjstone (Contributor, Author):

This is ready for a re-review @jgallagher @smklein. I still need to test it on hardware which I'll do after feedback. I'd like to test either Thursday afternoon or Friday. Thanks!
