Skip to content

DAOS-18768 cart: Change default mrc settings#17948

Closed
frostedcmos wants to merge 5 commits intomasterfrom
frostedcmos/DAOS-18768
Closed

DAOS-18768 cart: Change default mrc settings#17948
frostedcmos wants to merge 5 commits intomasterfrom
frostedcmos/DAOS-18768

Conversation

@frostedcmos
Copy link
Copy Markdown
Contributor

  • Disable mrc for all providers on servers, leave untouched on clients
  • Include UCX_RCACHE_ENABLE=n setting when disabling MRC
  • Move mrc code from per-provider setup to data_init()

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

- Disable mrc for all providers on servers, leave untouched on clients
- Include UCX_RCACHE_ENABLE=n setting when disabling MRC
- Move mrc code from per-provider setup to data_init()

Signed-off-by: Alexander A Oganezov <[email protected]>
Comment thread src/cart/crt_init.c
d_setenv("FI_UNIVERSE_SIZE", "2048", 1);

/* Disable MRC on servers and enable on clients by default */
mrc_enable = server ? 0 : 1;
Copy link
Copy Markdown
Collaborator

@soumagne soumagne Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mrc_enable = server ? 0 : 1;
mrc_enable = !server;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe it is clearer to keep the original and spell out the two cases explicitly with "?"

Comment thread src/cart/crt_init.c
mrc_enable = server ? 0 : 1;
crt_env_get(CRT_MRC_ENABLE, &mrc_enable);

if (mrc_enable == 0) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (mrc_enable == 0) {
if (!mrc_enable) {

Comment thread src/cart/crt_init.c
unsigned int mrecv_buf = CRT_HG_MRECV_BUF;
unsigned int mrecv_buf_copy = 0; /* buf copy disabled by default */
char *swim_traffic_class = NULL;
uint32_t mrc_enable = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make it a boolean ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be uint32_t for crt_env stuff. server is also currently defined as int, so changing all to bools would be messier in this pr

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

Ticket title is 'Rank excluded due to: corrupted double-linked list after rebuild was triggered'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-18768

soumagne
soumagne previously approved these changes Apr 8, 2026
Copy link
Copy Markdown
Collaborator

@Michael-Hennecke Michael-Hennecke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex - is this worth a 2.6 backport (after the 2.8 backport)?

Comment thread src/cart/crt_init.c
d_setenv("FI_UNIVERSE_SIZE", "2048", 1);

/* Disable MRC on servers and enable on clients by default */
mrc_enable = server ? 0 : 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe it is clearer to keep the original and spell out the two cases explicitly with "?"

@frostedcmos
Copy link
Copy Markdown
Contributor Author

Thanks Alex - is this worth a 2.6 backport (after the 2.8 backport)?

I am not sure on this. the primary fix here is to add ucx disabling part, and i dont think we have any ucx users on 2.6. But if someone else decides it is needed for 2.6 then a backport can be easily made

@daosbuild3
Copy link
Copy Markdown
Collaborator

of the test app

Signed-off-by: Alexander A Oganezov <[email protected]>
@frostedcmos frostedcmos dismissed stale reviews from Michael-Hennecke and soumagne via 4e48adc April 14, 2026 00:08
@frostedcmos frostedcmos requested review from a team as code owners April 14, 2026 00:08
Copy link
Copy Markdown
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftest LGTM

@daosbuild3
Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17948/14/execution/node/1271/log

@daosbuild3
Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17948/14/execution/node/1281/log

@frostedcmos
Copy link
Copy Markdown
Contributor Author

Closing in favor of #18020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants