Skip to content

Added WRED with affected Leaf/LC/FM model check#379

Open
Priyanka-Patil14 wants to merge 3 commits intodatacenter:v4.1.0-devfrom
Priyanka-Patil14:bugfix/CSCwt50713
Open

Added WRED with affected Leaf/LC/FM model check#379
Priyanka-Patil14 wants to merge 3 commits intodatacenter:v4.1.0-devfrom
Priyanka-Patil14:bugfix/CSCwt50713

Conversation

@Priyanka-Patil14
Copy link
Copy Markdown

Summary

Adds a new pre-upgrade validation check to detect fabric nodes at risk due to CSCwt50713, where WRED-enabled QoS combined with specific Leaf/LC/FM hardware models can cause N9504 spine crashes after upgrading to affected ACI releases.

Detection Logic

Three gates must all be true to trigger a FAIL:

  1. Version Gate – Target version is in the affected range:

    • ACI 6.1(x) older than 6.1(6a)
    • ACI 6.2(x) older than 6.2(2a)
  2. Feature Gate – WRED is enabled (qosCong.algo = wred)

  3. Hardware Gate – Any of the following affected models are present:

    • Leaf: N9K-C9236C, N9K-C92300YC, N9K-C9272Q, N9K-C92304QC
    • LC: N9K-C92304QC
    • FM: N9K-C9504-FM-E, N9K-C9508-FM-E, N9K-C9516-FM-E

Testing

  • 5 unit test cases added under tests/checks/wred_affected_model_check/
  • All 5 passed
  • Validated on live fabric (fab3-apic1): confirmed FAIL_O with real hit on node 201 (FAB3-S1, N9K-C9504-FM-E)

@Priyanka-Patil14
Copy link
Copy Markdown
Author

WredCheck_APIC_Output_logs.txt
WredCheck_Pytest_Logs.txt

Uploaded the test logs.

Copy link
Copy Markdown

@Harinadh-Saladi Harinadh-Saladi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls address the comments given and also Pls add the bug details in validations.md file. It's missing.
Pls execute the script on Fab3 and share PASS, FAIL and NA logs. Will review it.


@pytest.mark.parametrize(
"tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
[
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add the comments for each test cases to understand what test case is doing, then will review.

Copy link
Copy Markdown
Author

@Priyanka-Patil14 Priyanka-Patil14 Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Added comments to all the test cases.

"tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
[
(
None,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add the json files and read the json files for each test case and provide the test result accordingly instead of hard-coding here. Pls follow the existing structure.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Replaced all hardcoded data with JSON fixture files

Comment thread aci-preupgrade-validation-script.py Outdated
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = (
'Detected affected node(s) with WRED enabled. '
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls check appropriate recommended action for this issue and add in a single line

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread aci-preupgrade-validation-script.py Outdated
'Detected affected node(s) with WRED enabled. '
'Review software fix options and engage TAC.'
)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc url is incorrect, pls add right url

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Changed doc url to point to the GitHub docs validation

)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'

if not tversion:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add tversion missing check, if tversion is not given script will prompt for tversion to provide the input.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with the existing pattern used across the codebase. It handles the debug mode case where a user may run a single check without providing a target version, and the check needs to handle that gracefully instead of throwing an exception. Keeping it for consistency.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the pattern is consistent across the script, that would be old code. As I cited earlier, when the tversion is not provided as an input, script will prompt to provide the input, there won't be any exception. This change needs to be incorporated across the script. Pls address it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version check can be removed. tversion is not a optional

Comment thread aci-preupgrade-validation-script.py Outdated
wred_enabled = False
for cong in qosCong:
algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
if algo.lower() == 'wred':
Copy link
Copy Markdown

@Harinadh-Saladi Harinadh-Saladi Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see the value of the attribute algo is in lower case from moquery output. So ,no need to convert it into lower case and validate.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread aci-preupgrade-validation-script.py Outdated
algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
if algo.lower() == 'wred':
wred_enabled = True
break
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If wred_enabled flag is True then you're coming out of the loop. What if we have multiple objects? then the loop will not be iterated for other objects. Can you check the code and validate with multiple wred enabled objects and share the logs

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the break comment, I validated it with 4 objects where WRED was at position 3, The loop exits after finding wred at position 3 and skips the 4th object, but the result is still correctly FAIL_O. The break is intentional here since we just need to know if WRED is enabled anywhere once we find one wred object the answer is yes, so there is no need to continue. I have also added a test case to cover this scenario.

Please find the pytest logs attached.
wred_break_validation.txt

Comment thread aci-preupgrade-validation-script.py Outdated
}

def is_affected_model(model):
m = (model or '').upper()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls keep the meaningful variable name instead of letter 'm' and why are we converting it into upper case here? We can chnage the case to upper if we are not getting, All the hardware models we're getting in upper case. Pls check if we are getting in lower case anywhere and convert if required.

if attr.get('id'):
node_name_map[attr.get('id')] = attr.get('name', '')

impacted = set()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls use generic variable names as per the structure of the script.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Replaced generic variable names to match the script's conventions.

Comment thread aci-preupgrade-validation-script.py Outdated
model = attr.get('model', '')
if not is_affected_model(model):
continue
dn = attr.get('dn', '')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see dn extraction and node_regex parsing logic is duplicated in both LC and FM loops. Can you implement with a small helper, so that parsing can be implemented once and reused.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@Priyanka-Patil14
Copy link
Copy Markdown
Author

Pls address the comments given and also Pls add the bug details in validations.md file. It's missing. Pls execute the script on Fab3 and share PASS, FAIL and NA logs. Will review it.

WRED_PASS:FAIL:NA_APIC_Logs.txt

Please find the attached logs. Executed on fab3 for PASS, FAIL and NA scenario.

Comment thread docs/docs/validations.md Outdated
Comment thread aci-preupgrade-validation-script.py Outdated
return Result(result=NA, msg=VER_NOT_AFFECTED)

affected_models = {
'N9K-C9236C',
Copy link
Copy Markdown

@lovkeshsharma702 lovkeshsharma702 Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N9K-C9xxx not supported in ACi mode. Please validate all model before updating here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Validated all models.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in dup bug CSCwt09384, N9K-C9xxx models TS collection on impacted model on gx2,H2,H1 model leaf.

  • N9K-C9364D-GX2A
  • N9K-C9332D-GX2B
  • N9K-C9348D-GX2A
  • N9K-C9332D-H2R
  • N9K-C9364C-H1
  • N9K-C93400LD-H1

Don't want this model check added? lovkesh please confirm.

Comment thread aci-preupgrade-validation-script.py Outdated
if is_affected_model(model):
impacted.add((node['fabricNode']['attributes']['id'], node['fabricNode']['attributes']['name'], 'Leaf', model))

# LC model gate
Copy link
Copy Markdown

@lovkeshsharma702 lovkeshsharma702 Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since LC, non-moduler not applicable. You can focus on FM module only.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Removed the leaf gate and LC gate entirely. The check now focuses only on FM models

Comment thread aci-preupgrade-validation-script.py Outdated

impacted = set()

def add_if_affected(obj_class, obj_list, source_label):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change logic to check only MOduler spine < version, FM model

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Logic now checks only FM models

Copy link
Copy Markdown

@lovkeshsharma702 lovkeshsharma702 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please work on all comments.

Comment thread docs/docs/validations.md

Due to [CSCwt50713][67], when WRED (Weighted Random Early Detection) is enabled and specific Fabric Module (FM) hardware models are present in the fabric, the spine switch may crash after moving to an affected ACI release in the 6.1(x) or 6.2(x) range.

Affected versions: ACI 6.1(x) up to and including 6.1(5e), and ACI 6.2(x) up to and including 6.2(1g).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct the statement . Impacted aci version 6.1(5e) and below, and 6.2(1g).

result = PASS
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Disable WRED in fabric or upgrade to release > 6.1(5e), 6.2(1g)"


impacted = set()

# FM model gate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use copilot to align this code as per structure and styling of whole script.

Copy link
Copy Markdown

@lovkeshsharma702 lovkeshsharma702 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

work on comments

node_key = node_id
return (node_key, row[2], row[3])

data = [list(row) for row in sorted(impacted, key=sort_key)]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need this sort operation and use data list alone instead use impacted and data both.

Copy link
Copy Markdown

@Harinadh-Saladi Harinadh-Saladi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls address the comments. If there is any different understanding with me in the test results or technical aspects, will discuss with team and address after getting the confirmation.

Comment thread docs/docs/validations.md

Affected hardware models: N9K-C9504-FM-E, N9K-C9508-FM-E, N9K-C9516-FM-E.

To avoid this issue, disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls replace "move" with "upgrade".

)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'

if not tversion:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the pattern is consistent across the script, that would be old code. As I cited earlier, when the tversion is not provided as an input, script will prompt to provide the input, there won't be any exception. This change needs to be incorporated across the script. Pls address it.

@pytest.mark.parametrize(
"tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
[
# Case 1: No target version provided (-t flag missing).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this case, as it's not required. Script will prompt to provide the tversion when input is not provided.

node_id = node['fabricNode']['attributes']['id']
node_name_map[node_id] = node['fabricNode']['attributes']['name']

impacted = set()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls update variable name "impacted" with "affected_nodes".

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls remove this file as we're focusing only on FC models.

# Version 6.2(1f) is in affected range, WRED is enabled, FM model N9K-C9508-FM-E is affected.
# Expected: FAIL_O with node 1001 reported under Source=FM.
(
"6.2(1f)",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls update the version with 6.2(1g). 6.2(1f) is unavailable.

{
eqptFC_api: read_data(dir, "eqptFC_empty.json"),
},
script.PASS,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls update the test result as NA. Even though version is affected but the model is unaffected, since this issue is specific to the model.

},
script.PASS,
[],
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add the test cases for mixed scenarios, if there are multiple objects with one affected model and others unaffected with wred enabled and disabled combinations.

result = PASS
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls replace "move" with "upgrade"

impacted.add((node_id, node_name_map.get(node_id, ''), 'FM', model))

if not impacted:
return Result(result=PASS, msg='No affected hardware models found. Skipping.')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls check the result. I think it should be NA, as this issue is specific to the model after version check. Result will be PASS only if there is affected model and wred is disabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants