
concepts: add Cluster Layout page (#953) #955

Draft
jklare wants to merge 2 commits into main from jk-cluster-layout

Conversation

@jklare jklare (Contributor) commented Apr 1, 2026

Adds docs/concepts/cluster-layout.md covering:

  • Recommended node counts per type (Manager, Control, Network, Storage, Compute)
  • Control node quorum requirements (MariaDB Galera, RabbitMQ, etcd)
  • Ceph replication factor implications for 3-node vs 5-node storage clusters
  • ceph-ansible variables (openstack_pool_default_size, cephfs_pool_default_size, rgw_pool_default_size) to adjust pool replication in cfg-cookiecutter repos when reducing the number of storage nodes
  • Network node redundancy considerations

AI-assisted: written with Claude Code (claude-sonnet-4-6)

@github-actions github-actions bot commented Apr 1, 2026

⚠️ MegaLinter analysis: Success with warnings

| Descriptor | Linter | Files | Fixed | Errors | Warnings | Elapsed time |
|------------|--------|-------|-------|--------|----------|--------------|
| ✅ ACTION | actionlint | 3 | | 0 | 0 | 0.05s |
| ✅ JSON | jsonlint | 4 | | 0 | 0 | 0.15s |
| ✅ JSON | prettier | 4 | | 0 | 0 | 0.55s |
| ✅ JSON | v8r | 4 | | 0 | 0 | 6.99s |
| ⚠️ MARKDOWN | markdownlint | 155 | | 1 | 0 | 2.49s |
| ⚠️ MARKDOWN | markdown-table-formatter | 155 | | 1 | 0 | 0.45s |
| ✅ REPOSITORY | checkov | yes | | no | no | 17.4s |
| ✅ REPOSITORY | git_diff | yes | | no | no | 0.04s |
| ✅ REPOSITORY | secretlint | yes | | no | no | 1.73s |
| ✅ REPOSITORY | trufflehog | yes | | no | no | 4.15s |
| ✅ SPELL | codespell | 163 | | 0 | 0 | 0.54s |
| ✅ SPELL | lychee | 163 | | 0 | 0 | 10.08s |
| ✅ YAML | prettier | 4 | | 0 | 0 | 0.38s |
| ✅ YAML | v8r | 4 | | 0 | 0 | 5.8s |
| ✅ YAML | yamllint | 4 | | 0 | 0 | 0.43s |

Detailed Issues

⚠️ MARKDOWN / markdown-table-formatter - 1 error
1 files contain markdown tables to format:
- docs/concepts/cluster-layout.md
⚠️ MARKDOWN / markdownlint - 1 error
docs/concepts/cluster-layout.md:29 error MD001/heading-increment Heading levels should only increment by one level at a time [Expected: h2; Actual: h3]

See detailed reports in MegaLinter artifacts

Your project could benefit from a custom flavor, which would allow you to run only the linters you need, and thus improve runtime performances. (Skip this info by defining FLAVOR_SUGGESTIONS: false)

  • Documentation: Custom Flavors
  • Command: npx [email protected] --custom-flavor-setup --custom-flavor-linters ACTION_ACTIONLINT,JSON_JSONLINT,JSON_V8R,JSON_PRETTIER,MARKDOWN_MARKDOWNLINT,MARKDOWN_MARKDOWN_TABLE_FORMATTER,REPOSITORY_CHECKOV,REPOSITORY_GIT_DIFF,REPOSITORY_SECRETLINT,REPOSITORY_TRUFFLEHOG,SPELL_LYCHEE,SPELL_CODESPELL,YAML_PRETTIER,YAML_YAMLLINT,YAML_V8R

MegaLinter is graciously provided by OX Security
Show us your support by starring ⭐ the repository

@jklare jklare force-pushed the jk-cluster-layout branch from deb849f to 2aedbfe on April 1, 2026 14:06
Adds docs/concepts/cluster-layout.md covering:
- Recommended node counts per type (Manager, Control, Network, Storage, Compute)
- Control node quorum requirements (MariaDB Galera, RabbitMQ, etcd)
- Ceph replication factor implications for 3-node vs 5-node storage clusters
- ceph-ansible variables (openstack_pool_default_size, cephfs_pool_default_size,
  rgw_pool_default_size) to adjust pool replication in cfg-cookiecutter repos
  when reducing the number of storage nodes
- Network node redundancy considerations

AI-assisted: Claude Code (claude-sonnet-4-6)

Fixes: #953
Signed-off-by: Jan Klare <[email protected]>
@jklare jklare force-pushed the jk-cluster-layout branch from 2aedbfe to 48499a2 on April 1, 2026 14:29
@jklare jklare marked this pull request as ready for review April 1, 2026 14:30
@jklare jklare requested a review from ideaship April 1, 2026 14:30
Comment on lines +9 to +10
These numbers are not arbitrary — they are driven by quorum requirements of the
distributed systems running inside the cluster.

It may be worth mentioning what quorum means (majority of configured nodes, not majority of surviving nodes).
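To make the distinction concrete, a quick illustrative sketch (not part of the PR; the `quorum` function is hypothetical):

```python
def quorum(configured_nodes: int) -> int:
    """Quorum is a strict majority of the *configured* members,
    not of whichever members happen to survive."""
    return configured_nodes // 2 + 1

# In a 3-node Galera/etcd cluster, quorum is 2. If 2 nodes fail,
# the lone survivor is 100% of the surviving nodes but still below
# the quorum of 2, so it must stop serving writes.
print(quorum(3))  # 2
print(quorum(5))  # 3
```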

| Network | 1 | 2 | 2 for redundancy; required for safety zone separation |
| Storage | 1 | 5 | 5 allows self-healing after a node loss with replication factor 3 |
| Compute | 1 | 3+ | Depends on workload requirements |
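The storage row's rationale can be checked numerically (an illustrative sketch, not from the PR; `can_self_heal` is a hypothetical helper):

```python
def can_self_heal(nodes: int, replication_factor: int, failed: int = 1) -> bool:
    """Ceph places each replica on a distinct node, so after `failed`
    node losses it can restore full replication only if enough
    distinct nodes remain to hold all replicas."""
    return nodes - failed >= replication_factor

print(can_self_heal(5, 3))  # True: 4 survivors can host 3 replicas
print(can_self_heal(3, 3))  # False: 2 survivors cannot host 3 replicas
```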


We might want to mention that nodes can take multiple roles (to avoid the impression that the recommended minimum size is 1+3+2+5+3 nodes).

@jklare jklare (Contributor, author) commented Apr 2, 2026

I will add an additional section for the HCI case

Comment on lines +66 to +90
The variables below are set in `environments/ceph/configuration.yml` and are
consumed by [ceph-ansible](https://github.com/ceph/ceph-ansible), which creates
the Ceph pools during deployment. `size` is the target number of replicas and
should match the number of available storage nodes. `min_size` is the minimum
number of replicas required for I/O to be allowed and should also be set to the
number of available storage nodes so that Ceph does not serve degraded data.
Their defaults are defined in
[osism/defaults — all/099-ceph.yml](https://github.com/osism/defaults/blob/main/all/099-ceph.yml).

```yaml
# environments/ceph/configuration.yml — 2-node cluster (not recommended for production)

openstack_pool_default_size: 2
openstack_pool_default_min_size: 2

cephfs_pool_default_size: 2
cephfs_pool_default_min_size: 2

rgw_pool_default_size: 2
```

The cfg-cookiecutter template also sets `osd pool default size: 3` inside
`ceph_conf_overrides` in `environments/ceph/configuration.yml`. This is the
Ceph daemon-level default for any pools not explicitly managed by OSISM and
should be updated to match when reducing the node count.

In summary, the storage part seems too detailed for an overview but terse for guiding a user configuring their first cluster.

jklare (Contributor, author) replied:

Good point. I will remove this section for now and just mention that the default Ceph configuration assumes at least 3 nodes and will not work for smaller clusters. A detailed description of how to configure a 1- or 2-node cluster can go elsewhere.

# environments/ceph/configuration.yml — 2-node cluster (not recommended for production)

openstack_pool_default_size: 2
openstack_pool_default_min_size: 2

In osism/defaults/all/099-ceph.yml, openstack_pool_default_size is 3 and openstack_pool_default_min_size is 0. It is not obvious why openstack_pool_default_min_size should be 2 for a smaller cluster.

[osism/defaults — all/099-ceph.yml](https://github.com/osism/defaults/blob/main/all/099-ceph.yml).

```yaml
# environments/ceph/configuration.yml — 2-node cluster (not recommended for production)
```

The variables below don't exist in cfg-cookiecutter/{{cookiecutter.project_name}}/environments/ceph/configuration.yml. How would the user figure out where to put them? Names such as cephfs_pool_default_size suggest that they should be put somewhere under ceph_conf_overrides, but the YAML snippet shows no indentation.

Comment on lines +87 to +88
The cfg-cookiecutter template also sets `osd pool default size: 3` inside
`ceph_conf_overrides` in `environments/ceph/configuration.yml`. This is the

ceph_conf_overrides does exist in cfg-cookiecutter and it's set to 3. As a first-time reader, I would assume that this value works for the recommended storage cluster size of 5. But what does "should be updated to match when reducing the node count" mean then? From that one might infer that the default value of 3 is actually for a storage cluster with 3 nodes, but that amounts to guessing at this point.

Maybe the table at the top of this page should contain a column "OSISM default" in addition to Minimum and Recommended, but I'm not sure that would be sufficient to avoid confusion.
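For orientation, a hedged sketch of where that daemon-level default lives (the key layout follows ceph-ansible's `ceph_conf_overrides` convention; the values shown are illustrative assumptions, not the template's actual content):

```yaml
# environments/ceph/configuration.yml (illustrative sketch, not from the PR)
ceph_conf_overrides:
  global:
    # Daemon-level default replica count for pools not managed by OSISM.
    # Illustrative value; adjust alongside the *_pool_default_size variables.
    osd pool default size: 3
```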

sidebar_position: 25
---

# Cluster Layout

Considering the title, I would expect to find information included on how the nodes should be connected.

As for the sizing information actually on this page, as a reader I would expect to learn how many nodes I need to build a testbed, and/or how many I need for a small production environment. There is valuable background information here, but I wouldn't feel confident sizing my cluster based on it.

jklare (Contributor, author) replied:

I was thinking about a networking section, but that was too complex for the first shot. I will add it in the next iteration.

I will add some generic guidelines on what a minimal production cluster looks like and link to the testbed section for the testbed case.

@jklare jklare marked this pull request as draft April 2, 2026 12:52
* removed section about detailed ceph config for cluster with < 3 nodes
* added sections describing minimal, testbed, production and recommended
  cluster layouts
* some reformatting for the line length

AI-assisted: Claude Code (claude-sonnet-4-6)

Signed-off-by: Jan Klare <[email protected]>
@jklare jklare (Contributor, author) commented Apr 2, 2026

ready for another round of review @ideaship

@jklare jklare (Contributor, author) commented Apr 2, 2026

@berendt @osfrickler if you have a minute, it would be good to get your feedback here as well
