DAOS-17306 doc: self-healing properties, interactive rebuild#18023
Ticket title is 'Enable/disable auto recovery'
For the DAOS version 2.8 release, add two major sections to the DAOS Administrator's Guide:

- self-healing properties / policy controls (DAOS-17306)
- explicit / interactive rebuild control (DAOS-17281)

Doc-only: true
Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
output (introduced in PR #17371 / DAOS 2.6). The new section explains:

- Field values (normal vs degraded)
- When to check this field
- Example usage with exclude-only self-heal policies
- How to verify exclusion completed when auto-rebuild is disabled

Updated pool query examples throughout to show the Data redundancy field for consistency with DAOS 2.6+ output.

Particularly useful for scenarios where system.self_heal is set to exclude,pool_exclude or pool self_heal has exclude bit set without rebuild, to confirm exclusion has occurred.

Related-to: #17371
Doc-only: true
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Had to force-push to clean up the commit list. Hope that's okay.
| | Value | Meaning | | ||
| |-------|---------| | ||
| | `normal` | All targets are UP and accessible; data redundancy is intact | |
Suggested change (original line, then proposed replacement):
| | `normal` | All targets are UP and accessible; data redundancy is intact | | |
| | `normal` | No targets are DOWN; data redundancy is intact | |
Reasoning: targets that are being reintegrated are in state UP (but not yet UP_IN). Data redundancy would likely be intact in this case if a previous exclude rebuild finished successfully.
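A minimal sketch of the distinction behind this suggestion, assuming target states are available as plain strings. The function name `data_redundancy` and the state strings are hypothetical and used for illustration only; this is not how DAOS computes the field. It only shows why "no targets are DOWN" still yields `normal` while a target is being reintegrated:

```python
def data_redundancy(target_states):
    """Classify redundancy per the suggested wording: degraded only if a target is DOWN.

    Hypothetical helper; the state names are assumptions taken from the review thread.
    """
    return "degraded" if "DOWN" in target_states else "normal"


# A reintegrating target is UP (not yet UP_IN), so no target is DOWN.
print(data_redundancy(["UP_IN", "UP_IN", "UP"]))    # normal
# Assumption: an excluded target whose rebuild has not completed is DOWN.
print(data_redundancy(["UP_IN", "DOWN", "UP_IN"]))  # degraded
```

Under the original wording ("All targets are UP and accessible"), the first case would be hard to classify, which is the gap the suggestion closes.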
| - Data redundancy: normal | ||
| ``` | ||
| or when targets are excluded: |
Suggested change (original line, then proposed replacement):
| or when targets are excluded: | |
| or when targets are excluded and a corresponding rebuild has not yet completed: |
| - Rebuild busy, 42 objs, 21 recs | ||
| - Data redundancy: degraded | ||
| ``` | ||
Something I thought of, but no change requested: there are other cases, like drain, that would show `Rebuild busy` but `Data redundancy: normal`. I guess extend might show something similar, but it's probably not worth enumerating too many cases.
| The combination of: | ||
| - `Disabled ranks: 3` — Shows which rank was excluded | ||
| - `Rebuild idle` — No automatic rebuild triggered (expected with exclude-only policy) |
Suggested change (original line, then proposed replacement):
| - `Rebuild idle` — No automatic rebuild triggered (expected with exclude-only policy) | |
| - `Rebuild idle` or `Rebuild done`: idle if the pool has not been rebuilt since system start, done if it has been rebuilt for a prior change. No new automatic rebuild is triggered for this exclusion (expected with exclude-only policy) |
| - `Data redundancy: degraded` — Confirms data redundancy is impaired and manual | ||
| intervention is needed | ||
| To restore redundancy, manually trigger rebuild: |
Suggested change (original line, then proposed replacement):
| To restore redundancy, manually trigger rebuild: | |
| To restore redundancy for the excluded targets, manually trigger rebuild. Refer to [Detailed Pool Operations: Interactive Rebuild Controls](rebuild_controls.md) for more information about rebuild start commands. |
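The combination discussed in this thread (a disabled rank, rebuild `idle` or `done`, and `Data redundancy: degraded`) can be checked mechanically. The following is a rough sketch, assuming the pool query text has already been captured as a string. The helper names are hypothetical, and the sample text is assembled from the fragments quoted in this review rather than taken from real command output:

```python
import re

# Patterns for the three fields discussed in the review (assumed line formats).
FIELD_PATTERNS = {
    "disabled_ranks": re.compile(r"Disabled ranks:\s*(\S+)"),
    "rebuild": re.compile(r"Rebuild\s+(\w+)"),
    "data_redundancy": re.compile(r"Data redundancy:\s*(\w+)"),
}


def parse_pool_query(text):
    """Return the first match for each field, or None if the field is absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def needs_manual_rebuild(fields):
    """True when ranks are disabled, no rebuild is running, and redundancy is degraded."""
    return (
        fields["disabled_ranks"] is not None
        and fields["rebuild"] in ("idle", "done")
        and fields["data_redundancy"] == "degraded"
    )


# Illustrative text only, built from lines quoted in the thread.
sample = """\
- Disabled ranks: 3
- Rebuild idle
- Data redundancy: degraded
"""

print(needs_manual_rebuild(parse_pool_query(sample)))  # True
```

Accepting both `idle` and `done` follows the suggestion above: a pool that was rebuilt for an earlier change reports `done` even though no rebuild was triggered for the current exclusion.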