Added WRED with affected Leaf/LC/FM model check by Priyanka-Patil14 · Pull Request #379 · datacenter/ACI-Pre-Upgrade-Validation-Script

Priyanka-Patil14 · 2026-04-09T14:42:42Z

Summary

Adds a new pre-upgrade validation check to detect fabric nodes at risk due to CSCwt50713, where WRED-enabled QoS combined with specific Leaf/LC/FM hardware models can cause N9504 spine crashes after upgrading to affected ACI releases.

Detection Logic

Three gates must all be true to trigger a FAIL:

1. Version Gate – Target version is in the affected range:
   - ACI 6.1(x) older than 6.1(6a)
   - ACI 6.2(x) older than 6.2(2a)
2. Feature Gate – WRED is enabled (qosCong.algo = wred)
3. Hardware Gate – Any of the following affected models are present:
   - Leaf: N9K-C9236C, N9K-C92300YC, N9K-C9272Q, N9K-C92304QC
   - LC: N9K-C92304QC
   - FM: N9K-C9504-FM-E, N9K-C9508-FM-E, N9K-C9516-FM-E
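
The three gates can be sketched roughly as below. This is only an illustration under assumptions, not the code in the PR: `parse_aci_version`, `version_gate`, `feature_gate`, `hardware_gate` and `all_gates_hit` are hypothetical names, the version parser is a simplified stand-in for whatever version helper the script uses, and the affected model set is passed in rather than hard-coded.

```python
import re

# Hypothetical parser: turns "6.1(5e)" into the comparable tuple (6, 1, 5, "e").
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_aci_version(version):
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized ACI version: %r" % (version,))
    major, minor, maint, patch = match.groups()
    return (int(major), int(minor), int(maint), patch)


# First fixed release per train, taken from the PR description.
FIRST_FIXED = {(6, 1): "6.1(6a)", (6, 2): "6.2(2a)"}


def version_gate(tversion):
    """True when the target is in an affected train and older than its fix."""
    parsed = parse_aci_version(tversion)
    fixed = FIRST_FIXED.get(parsed[:2])
    return fixed is not None and parsed < parse_aci_version(fixed)


def feature_gate(qos_cong_objects):
    """True when any qosCong object reports algo == 'wred'."""
    return any(
        obj.get("qosCong", {}).get("attributes", {}).get("algo") == "wred"
        for obj in qos_cong_objects
    )


def hardware_gate(present_models, affected_models):
    """Return the affected models that are actually present, sorted."""
    return sorted(set(present_models) & set(affected_models))


def all_gates_hit(tversion, qos_cong_objects, present_models, affected_models):
    return (
        version_gate(tversion)
        and feature_gate(qos_cong_objects)
        and bool(hardware_gate(present_models, affected_models))
    )


# Example inputs (illustrative only).
fm_models = {"N9K-C9504-FM-E", "N9K-C9508-FM-E", "N9K-C9516-FM-E"}
wred_objects = [{"qosCong": {"attributes": {"algo": "wred"}}}]
hit = all_gates_hit("6.1(5e)", wred_objects, ["N9K-C9504-FM-E"], fm_models)
```

Comparing parsed tuples keeps the "older than" test in one place, so a change to a fixed release only touches `FIRST_FIXED`.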

Testing

5 unit test cases added under tests/checks/wred_affected_model_check/
All 5 passed
Validated on live fabric (fab3-apic1): confirmed FAIL_O with real hit on node 201 (FAB3-S1, N9K-C9504-FM-E)

Priyanka-Patil14 · 2026-04-10T06:13:31Z

WredCheck_APIC_Output_logs.txt
WredCheck_Pytest_Logs.txt

Uploaded the test logs.

Harinadh-Saladi

Pls address the comments given and also Pls add the bug details in validations.md file. It's missing.
Pls execute the script on Fab3 and share PASS, FAIL and NA logs. Will review it.

Harinadh-Saladi · 2026-04-10T12:10:15Z

+
+@pytest.mark.parametrize(
+    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
+    [


Pls add a comment to each test case explaining what the test case is doing, then I will review.

Updated. Added comments to all the test cases.

Harinadh-Saladi · 2026-04-10T12:10:19Z

+    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
+    [
+        (
+            None,


Pls add the json files and read the json files for each test case and provide the test result accordingly instead of hard-coding here. Pls follow the existing structure.

Updated. Replaced all hardcoded data with JSON fixture files
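
A minimal sketch of the fixture-based approach, under assumptions: `read_data` is the helper name seen in the diff, but the body here is a stand-in, and the fixture file name, the `qosCong.json` key and the case layout are hypothetical.

```python
import json
import os
import tempfile


def read_data(directory, filename):
    """Stand-in loader: parse one JSON fixture file from a test directory."""
    with open(os.path.join(directory, filename)) as handle:
        return json.load(handle)


# Hypothetical fixture content, written to a temp directory for the demo.
fixture_dir = tempfile.mkdtemp()
with open(os.path.join(fixture_dir, "qosCong_wred.json"), "w") as handle:
    json.dump([{"qosCong": {"attributes": {"algo": "wred"}}}], handle)

# A test case then references the file instead of carrying inline data.
case = {
    "icurl_outputs": {"qosCong.json": read_data(fixture_dir, "qosCong_wred.json")},
    "expected_result": "FAIL_O",
}
```

Keeping the API responses in files lets one fixture be shared by several test cases and keeps the parametrize table short.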

Harinadh-Saladi · 2026-04-10T12:11:56Z

+    headers = ["Node ID", "Node Name", "Source", "Model"]
+    data = []
+    recommended_action = (
+        'Detected affected node(s) with WRED enabled. '


Pls check the appropriate recommended action for this issue and add it on a single line.

Harinadh-Saladi · 2026-04-10T12:12:16Z

+        'Detected affected node(s) with WRED enabled. '
+        'Review software fix options and engage TAC.'
+    )
+    doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'


This doc URL is incorrect, pls add the right URL.

Updated. Changed doc_url to point to the validations page in the GitHub docs.

Harinadh-Saladi · 2026-04-10T12:12:21Z

+    )
+    doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'
+
+    if not tversion:


No need to add a missing-tversion check. If tversion is not given, the script will prompt for it.

This is consistent with the existing pattern used across the codebase. It handles the debug mode case where a user may run a single check without providing a target version, and the check needs to handle that gracefully instead of throwing an exception. Keeping it for consistency.

Though the pattern is consistent across the script, that is old code. As I cited earlier, when tversion is not provided as an input, the script will prompt for it, so there won't be any exception. This change needs to be incorporated across the script. Pls address it.

This version check can be removed; tversion is not optional.

Harinadh-Saladi · 2026-04-10T12:12:36Z

+    wred_enabled = False
+    for cong in qosCong:
+        algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
+        if algo.lower() == 'wred':


From the moquery output, I can see that the value of the algo attribute is already in lower case, so there is no need to convert it to lower case before validating.

Harinadh-Saladi · 2026-04-10T12:12:38Z

+        algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
+        if algo.lower() == 'wred':
+            wred_enabled = True
+            break


If the wred_enabled flag is True, you break out of the loop. What if we have multiple objects? Then the loop will not iterate over the remaining objects. Can you check the code, validate it with multiple WRED-enabled objects, and share the logs?

For the break comment: I validated it with 4 objects where WRED was at position 3. The loop exits after finding wred at position 3 and skips the 4th object, but the result is still correctly FAIL_O. The break is intentional here: we only need to know whether WRED is enabled anywhere, so once we find one wred object the answer is yes and there is no need to continue. I have also added a test case to cover this scenario.

Please find the pytest logs attached.
wred_break_validation.txt
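
The early-exit behaviour described in this thread can be sketched with `any()`, which stops at the first match just like the `break`. This is an illustration, not the PR's code: `is_wred_enabled` is a hypothetical name and `tail-drop` is assumed here as the non-WRED value.

```python
def is_wred_enabled(qos_cong_objects):
    """True as soon as one qosCong object has algo == 'wred'."""
    return any(
        obj.get("qosCong", {}).get("attributes", {}).get("algo") == "wred"
        for obj in qos_cong_objects
    )


def make_cong(algo):
    # Helper for building sample objects in the shape used by the diff.
    return {"qosCong": {"attributes": {"algo": algo}}}


# Four objects with WRED first seen at position 3, as in the validation above.
objects = [make_cong("tail-drop"), make_cong("tail-drop"),
           make_cong("wred"), make_cong("wred")]

iterator = iter(objects)
found = is_wred_enabled(iterator)
# Whatever any() did not consume is still left in the iterator.
unvisited = list(iterator)
```

The fourth object is never inspected, yet the answer cannot change, because the question is only whether WRED is enabled anywhere.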

Harinadh-Saladi · 2026-04-10T12:12:41Z

+    }
+
+    def is_affected_model(model):
+        m = (model or '').upper()


Pls use a meaningful variable name instead of the letter 'm'. Also, why are we converting it to upper case here? All the hardware models we get are already in upper case, so we should change the case only if they are not. Pls check whether we get lower case anywhere and convert only if required.

Harinadh-Saladi · 2026-04-10T12:12:44Z

+        if attr.get('id'):
+            node_name_map[attr.get('id')] = attr.get('name', '')
+
+    impacted = set()


Pls use generic variable names as per the structure of the script.

Updated. Renamed the variables to generic names that match the script's conventions.

Harinadh-Saladi · 2026-04-10T12:12:48Z

+        model = attr.get('model', '')
+        if not is_affected_model(model):
+            continue
+        dn = attr.get('dn', '')


I can see that the dn extraction and node_regex parsing logic is duplicated in both the LC and FM loops. Can you implement a small helper, so that the parsing is written once and reused?
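
One possible shape for such a helper is sketched below. The regex, the helper name `node_id_from_dn` and the sample dn are assumptions for illustration; the script's own node_regex may differ.

```python
import re

# Assumed dn layout: topology/pod-<pod>/node-<node>/...
NODE_DN_RE = re.compile(r"topology/pod-(?P<pod>\d+)/node-(?P<node>\d+)")


def node_id_from_dn(dn):
    """Return the node ID embedded in a dn, or None when the dn has none."""
    match = NODE_DN_RE.search(dn or "")
    return match.group("node") if match else None


# Both the LC loop and the FM loop could call the same helper.
sample_dn = "topology/pod-1/node-201/sys/ch/fcslot-1/fc"
node_id = node_id_from_dn(sample_dn)
```

Returning None for an unparsable dn lets each caller decide whether to skip the object or report it.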

Priyanka-Patil14 · 2026-04-13T07:30:29Z

Pls address the comments given and also Pls add the bug details in validations.md file. It's missing. Pls execute the script on Fab3 and share PASS, FAIL and NA logs. Will review it.

WRED_PASS:FAIL:NA_APIC_Logs.txt

Please find the attached logs. Executed on fab3 for PASS, FAIL and NA scenario.

lovkeshsharma702 · 2026-04-15T10:04:53Z

+        return Result(result=NA, msg=VER_NOT_AFFECTED)
+
+    affected_models = {
+        'N9K-C9236C',


N9K-C9xxx is not supported in ACI mode. Please validate all models before updating here.

Updated. Validated all models.

In the dup bug CSCwt09384, the TS collection for N9K-C9xxx models is from impacted GX2, H2, and H1 leaf models:

- N9K-C9364D-GX2A
- N9K-C9332D-GX2B
- N9K-C9348D-GX2A
- N9K-C9332D-H2R
- N9K-C9364C-H1
- N9K-C93400LD-H1

Do we not want a check for these models added? Lovkesh, please confirm.

lovkeshsharma702 · 2026-04-15T10:05:51Z

+        if is_affected_model(model):
+            impacted.add((node['fabricNode']['attributes']['id'], node['fabricNode']['attributes']['name'], 'Leaf', model))
+
+    # LC model gate


Since LC and non-modular switches are not applicable, you can focus on the FM module only.

Updated. Removed the leaf gate and LC gate entirely. The check now focuses only on FM models

lovkeshsharma702 · 2026-04-15T10:10:52Z

+
+    impacted = set()
+
+    def add_if_affected(obj_class, obj_list, source_label):


Change the logic to check only modular spines: target version < fixed version, and FM model.

Updated. Logic now checks only FM models

lovkeshsharma702

please work on all comments.

lovkeshsharma702 · 2026-04-16T04:47:25Z

+
+Due to [CSCwt50713][67], when WRED (Weighted Random Early Detection) is enabled and specific Fabric Module (FM) hardware models are present in the fabric, the spine switch may crash after moving to an affected ACI release in the 6.1(x) or 6.2(x) range.
+
+Affected versions: ACI 6.1(x) up to and including 6.1(5e), and ACI 6.2(x) up to and including 6.2(1g).


Correct the statement. Impacted ACI versions: 6.1(5e) and below, and 6.2(1g).

lovkeshsharma702 · 2026-04-16T04:54:15Z

+    result = PASS
+    headers = ["Node ID", "Node Name", "Source", "Model"]
+    data = []
+    recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'


'Disable WRED in fabric or upgrade to release > 6.1(5e), 6.2(1g)'

lovkeshsharma702 · 2026-04-16T05:05:50Z

+
+    impacted = set()
+
+    # FM model gate


Can you use Copilot to align this code with the structure and styling of the whole script?

lovkeshsharma702

work on comments

muthu-ku · 2026-04-16T05:18:30Z

+            node_key = node_id
+        return (node_key, row[2], row[3])
+
+    data = [list(row) for row in sorted(impacted, key=sort_key)]


No need for this sort operation. Use the data list alone instead of using both impacted and data.

Harinadh-Saladi

Pls address the comments. If your understanding of the test results or the technical aspects differs from mine, we will discuss it with the team and address it after getting confirmation.

Harinadh-Saladi · 2026-04-16T04:22:47Z

+
+Affected hardware models: N9K-C9504-FM-E, N9K-C9508-FM-E, N9K-C9516-FM-E.
+
+To avoid this issue, disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.


Pls replace "move" with "upgrade".

Harinadh-Saladi · 2026-04-16T04:31:02Z

+    )
+    doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'
+
+    if not tversion:


Though the pattern is consistent across the script, that is old code. As I cited earlier, when tversion is not provided as an input, the script will prompt for it, so there won't be any exception. This change needs to be incorporated across the script. Pls address it.

Harinadh-Saladi · 2026-04-16T04:53:24Z

+@pytest.mark.parametrize(
+    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
+    [
+        # Case 1: No target version provided (-t flag missing).


You can remove this case, as it's not required. Script will prompt to provide the tversion when input is not provided.

Harinadh-Saladi · 2026-04-16T04:59:36Z

+        node_id = node['fabricNode']['attributes']['id']
+        node_name_map[node_id] = node['fabricNode']['attributes']['name']
+
+    impacted = set()


Pls update variable name "impacted" with "affected_nodes".

Harinadh-Saladi · 2026-04-16T05:02:31Z

Pls remove this file as we're focusing only on FC models.

Harinadh-Saladi · 2026-04-16T05:38:24Z

+        # Version 6.2(1f) is in affected range, WRED is enabled, FM model N9K-C9508-FM-E is affected.
+        # Expected: FAIL_O with node 1001 reported under Source=FM.
+        (
+            "6.2(1f)",


Pls update the version with 6.2(1g). 6.2(1f) is unavailable.

Harinadh-Saladi · 2026-04-16T05:50:20Z

+            {
+                eqptFC_api: read_data(dir, "eqptFC_empty.json"),
+            },
+            script.PASS,


Pls update the expected test result to NA. Even though the version is affected, the model is unaffected, and this issue is specific to the model.

Harinadh-Saladi · 2026-04-16T05:52:57Z

+            },
+            script.PASS,
+            [],
+        ),


Pls add test cases for mixed scenarios: multiple objects where one has an affected model and the others are unaffected, in combination with WRED enabled and disabled.

Harinadh-Saladi · 2026-04-16T05:53:29Z

+    result = PASS
+    headers = ["Node ID", "Node Name", "Source", "Model"]
+    data = []
+    recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'


Pls replace "move" with "upgrade"

Harinadh-Saladi · 2026-04-16T05:58:56Z

+        impacted.add((node_id, node_name_map.get(node_id, ''), 'FM', model))
+
+    if not impacted:
+        return Result(result=PASS, msg='No affected hardware models found. Skipping.')


Pls check the result. I think it should be NA, as this issue is specific to the model once the version check has passed. The result will be PASS only if there is an affected model and WRED is disabled.
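
The result rules requested in this review can be summarised in a small decision function. This is a sketch of the reviewer's stated rules only: `decide_result` is a hypothetical name, and the result constants are plain placeholder strings rather than the script's actual values.

```python
# Placeholder result labels; the script defines its own constants.
PASS, FAIL_O, NA = "PASS", "FAIL_O", "NA"


def decide_result(version_affected, affected_nodes, wred_enabled):
    """Map the three inputs to a result, following the review comments."""
    if not version_affected:
        return NA        # target version is outside the affected range
    if not affected_nodes:
        return NA        # no affected FM model in the fabric
    if not wred_enabled:
        return PASS      # affected model present, but WRED is disabled
    return FAIL_O        # affected version, affected model, WRED enabled


# Example rows in the shape of the check's data: node ID, name, source, model.
sample_nodes = [("201", "FAB3-S1", "FM", "N9K-C9504-FM-E")]
outcome = decide_result(True, sample_nodes, True)
```

Mixed scenarios reduce to the same three inputs: one affected model among several unaffected ones still yields a non-empty node list, and the WRED flag then decides between PASS and FAIL_O.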

96dbf31 Added WRED with affected Leaf/LC/FM model check
Harinadh-Saladi reviewed Apr 10, 2026
8366239 Addressed PR review comments
lovkeshsharma702 reviewed Apr 15, 2026 (comment thread on docs/docs/validations.md, outdated)
lovkeshsharma702 reviewed Apr 15, 2026
b2f324e Addressed PR review comments
lovkeshsharma702 reviewed Apr 16, 2026
muthu-ku reviewed Apr 16, 2026
Harinadh-Saladi reviewed Apr 16, 2026

