Added WRED with affected Leaf/LC/FM model check #379
Priyanka-Patil14 wants to merge 3 commits into datacenter:v4.1.0-dev
WredCheck_APIC_Output_logs.txt Uploaded the test logs.
Harinadh-Saladi
left a comment
Please address the comments given, and add the bug details to the validations.md file; they are missing.
Please run the script on Fab3 and share the PASS, FAIL, and NA logs. I will review them.
@pytest.mark.parametrize(
    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
    [

Please add a comment to each test case explaining what it does; I will review after that.

Updated. Added comments to all the test cases.
    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
    [
        (
            None,

Please add JSON files and read them for each test case, then assert the test result accordingly instead of hard-coding the data here. Please follow the existing structure.

Updated. Replaced all hardcoded data with JSON fixture files.
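
For illustration, a minimal sketch of the fixture-driven pattern requested above. The name `read_data` appears in this PR's tests, but the body shown here is an assumption, as are the fixture contents; a temporary directory stands in for the real test folder.

```python
import json
import os
import tempfile


def read_data(directory, filename):
    """Load one JSON fixture file (assumed helper body, for illustration only)."""
    with open(os.path.join(directory, filename)) as fixture:
        return json.load(fixture)


# Hypothetical fixture: a temporary directory stands in for the test folder,
# and an empty list stands in for an icurl response with no objects.
fixture_dir = tempfile.mkdtemp()
with open(os.path.join(fixture_dir, "eqptFC_empty.json"), "w") as fixture:
    json.dump([], fixture)

eqptFC_empty = read_data(fixture_dir, "eqptFC_empty.json")
```

Keeping each scenario in its own file means the parametrized cases only name a fixture and an expected result, which is easier to review than inline dictionaries.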
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = (
    'Detected affected node(s) with WRED enabled. '

Please check what the appropriate recommended action is for this issue and put it on a single line.
    'Detected affected node(s) with WRED enabled. '
    'Review software fix options and engage TAC.'
)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'

This doc URL is incorrect; please add the right URL.

Updated. Changed the doc URL to point to the validation entry in the GitHub docs.
)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'

if not tversion:

There is no need to add a check for a missing tversion. If tversion is not given, the script prompts the user to provide it.

This is consistent with the existing pattern used across the codebase. It handles the debug-mode case where a user may run a single check without providing a target version, and the check needs to handle that gracefully instead of throwing an exception. Keeping it for consistency.

Although the pattern is consistent across the script, that is old code. As I noted earlier, when tversion is not provided as input, the script prompts for it, so there won't be any exception. This change needs to be incorporated across the script. Please address it.

This version check can be removed. tversion is not optional.
wred_enabled = False
for cong in qosCong:
    algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
    if algo.lower() == 'wred':

I can see from the moquery output that the value of the algo attribute is already lowercase, so there is no need to convert it to lowercase before validating.
    algo = cong.get('qosCong', {}).get('attributes', {}).get('algo', '')
    if algo.lower() == 'wred':
        wred_enabled = True
        break

If the wred_enabled flag is True, you break out of the loop. What if there are multiple objects? The loop will not iterate over the remaining objects. Can you check the code, validate it with multiple WRED-enabled objects, and share the logs?

Regarding the break comment: I validated it with 4 objects where WRED was at position 3. The loop exits after finding WRED at position 3 and skips the 4th object, but the result is still correctly FAIL_O. The break is intentional here: we only need to know whether WRED is enabled anywhere, so once we find one WRED object the answer is yes and there is no need to continue. I have also added a test case to cover this scenario.
Please find the pytest logs attached.
wred_break_validation.txt
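
To illustrate why stopping early is safe, here is a minimal sketch that expresses the same "is WRED enabled anywhere" question with `any()`, which also stops at the first match. The function name and the sample objects are hypothetical; only the `qosCong` / `attributes` / `algo` nesting is taken from the diff above.

```python
def is_wred_enabled(qos_cong_objects):
    """Return True if at least one qosCong object has algo set to 'wred'."""
    return any(
        obj.get("qosCong", {}).get("attributes", {}).get("algo", "") == "wred"
        for obj in qos_cong_objects
    )


# Hypothetical sample: four objects with WRED at position 3, as in the reply above.
sample = [
    {"qosCong": {"attributes": {"algo": "tail-drop"}}},
    {"qosCong": {"attributes": {"algo": "tail-drop"}}},
    {"qosCong": {"attributes": {"algo": "wred"}}},
    {"qosCong": {"attributes": {"algo": "tail-drop"}}},
]
wred_found = is_wred_enabled(sample)
```

The fourth object is never inspected, yet the answer cannot change, because a single WRED-enabled object is enough to satisfy the feature condition.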
}

def is_affected_model(model):
    m = (model or '').upper()

Please use a meaningful variable name instead of the single letter 'm'. Also, why are we converting to uppercase here? All the hardware models we receive are already uppercase, so convert only if needed. Please check whether we get lowercase anywhere and convert if required.
if attr.get('id'):
    node_name_map[attr.get('id')] = attr.get('name', '')

impacted = set()

Please use generic variable names that follow the structure of the script.

Updated. Replaced the variable names with generic ones that match the script's conventions.
model = attr.get('model', '')
if not is_affected_model(model):
    continue
dn = attr.get('dn', '')

I can see that the dn extraction and node_regex parsing logic is duplicated in both the LC and FM loops. Can you implement a small helper so that the parsing is written once and reused?
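
One possible shape for such a helper, as a sketch only. The pattern, the helper name, and the sample dn are assumptions for illustration; the script's actual node_regex is not shown in this thread and may differ.

```python
import re

# Assumed pattern: a dn containing a ".../node-<id>/..." segment.
NODE_ID_REGEX = re.compile(r"/node-(?P<node_id>\d+)/")


def get_node_id_from_dn(dn):
    """Extract the node ID from a dn string, or return None if it is absent."""
    match = NODE_ID_REGEX.search(dn or "")
    return match.group("node_id") if match else None


# Hypothetical dn, shown only to exercise the helper.
node_id = get_node_id_from_dn("topology/pod-1/node-1001/sys/ch/fcslot-1/fc")
```

Both loops could then call the helper and skip any object for which it returns None.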
WRED_PASS:FAIL:NA_APIC_Logs.txt Please find the attached logs. Executed on Fab3 for the PASS, FAIL, and NA scenarios.
return Result(result=NA, msg=VER_NOT_AFFECTED)

affected_models = {
    'N9K-C9236C',

N9K-C9xxx is not supported in ACI mode. Please validate all models before adding them here.

Updated. Validated all models.

In the duplicate bug CSCwt09384, TS was collected on impacted N9K-C9xxx leaf models (GX2, H2, H1):
- N9K-C9364D-GX2A
- N9K-C9332D-GX2B
- N9K-C9348D-GX2A
- N9K-C9332D-H2R
- N9K-C9364C-H1
- N9K-C93400LD-H1
Do we not want a check for these models added? Lovkesh, please confirm.
if is_affected_model(model):
    impacted.add((node['fabricNode']['attributes']['id'], node['fabricNode']['attributes']['name'], 'Leaf', model))

# LC model gate

Since LCs and non-modular switches are not applicable, you can focus on the FM module only.

Updated. Removed the leaf gate and LC gate entirely. The check now focuses only on FM models.
impacted = set()

def add_if_affected(obj_class, obj_list, source_label):

Change the logic to check only modular spines (< version, FM model).

Updated. The logic now checks only FM models.
lovkeshsharma702
left a comment
Please work on all comments.
Due to [CSCwt50713][67], when WRED (Weighted Random Early Detection) is enabled and specific Fabric Module (FM) hardware models are present in the fabric, the spine switch may crash after moving to an affected ACI release in the 6.1(x) or 6.2(x) range.

Affected versions: ACI 6.1(x) up to and including 6.1(5e), and ACI 6.2(x) up to and including 6.2(1g).

Correct the statement: impacted ACI versions are 6.1(5e) and below, and 6.2(1g).
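
The affected range can be made concrete with a small comparison sketch. It follows the doc text's reading (per-train upper bounds of 6.1(5e) and 6.2(1g)); the reviewer's correction may imply a different range, so treat the bounds as an assumption. The parser and function names are hypothetical and do not represent the script's own version class.

```python
import re

VERSION_REGEX = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_version(version):
    """Turn a string such as '6.1(5e)' into a comparable tuple (6, 1, 5, 'e')."""
    match = VERSION_REGEX.match(version)
    if not match:
        raise ValueError("Unrecognized version format: %s" % version)
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


# Assumption: last affected release per train, taken from the doc text above.
LAST_AFFECTED = {(6, 1): "6.1(5e)", (6, 2): "6.2(1g)"}


def is_affected_version(tversion):
    """Return True if tversion is at or below the last affected release of its train."""
    parsed = parse_version(tversion)
    last_affected = LAST_AFFECTED.get(parsed[:2])
    return last_affected is not None and parsed <= parse_version(last_affected)
```

Under this reading, releases outside the 6.1(x) and 6.2(x) trains are treated as not affected; if earlier trains are also impacted, the lookup would need a different rule.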
result = PASS
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'

'Disable WRED in the fabric or upgrade to a release > 6.1(5e), 6.2(1g)'
impacted = set()

# FM model gate

Can you use Copilot to align this code with the structure and styling of the rest of the script?
        node_key = node_id
    return (node_key, row[2], row[3])

data = [list(row) for row in sorted(impacted, key=sort_key)]

This sort operation is not needed. Use the data list alone instead of using both impacted and data.
Harinadh-Saladi
left a comment
Please address the comments. If your understanding of the test results or technical aspects differs from mine, I will discuss it with the team and we can address it after getting confirmation.
Affected hardware models: N9K-C9504-FM-E, N9K-C9508-FM-E, N9K-C9516-FM-E.

To avoid this issue, disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.

Please replace "move" with "upgrade".
)
doc_url = 'https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwt50713'

if not tversion:

Although the pattern is consistent across the script, that is old code. As I noted earlier, when tversion is not provided as input, the script prompts for it, so there won't be any exception. This change needs to be incorporated across the script. Please address it.
@pytest.mark.parametrize(
    "tversion, fabric_nodes, icurl_outputs, expected_result, expected_data",
    [
        # Case 1: No target version provided (-t flag missing).

You can remove this case, as it is not required. The script prompts for tversion when it is not provided.
node_id = node['fabricNode']['attributes']['id']
node_name_map[node_id] = node['fabricNode']['attributes']['name']

impacted = set()

Please rename the variable "impacted" to "affected_nodes".

Please remove this file, as we're focusing only on FC models.
# Version 6.2(1f) is in affected range, WRED is enabled, FM model N9K-C9508-FM-E is affected.
# Expected: FAIL_O with node 1001 reported under Source=FM.
(
    "6.2(1f)",

Please update the version to 6.2(1g); 6.2(1f) is unavailable.
{
    eqptFC_api: read_data(dir, "eqptFC_empty.json"),
},
script.PASS,

Please update the expected result to NA. Even though the version is affected, the model is unaffected, and this issue is specific to the model.
},
script.PASS,
[],
),

Please add test cases for mixed scenarios: multiple objects with one affected model and the others unaffected, in combinations with WRED enabled and disabled.
result = PASS
headers = ["Node ID", "Node Name", "Source", "Model"]
data = []
recommended_action = 'Disable WRED on the affected nodes or move to a release newer than 6.1(5e) in the 6.1(x) train or newer than 6.2(1g) in the 6.2(x) train.'

Please replace "move" with "upgrade".
    impacted.add((node_id, node_name_map.get(node_id, ''), 'FM', model))

if not impacted:
    return Result(result=PASS, msg='No affected hardware models found. Skipping.')

Please check the result. I think it should be NA, since after the version check this issue is specific to the model. The result should be PASS only if an affected model is present and WRED is disabled.
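
The result mapping the reviewer describes can be sketched as a small decision function. This is an illustration of the requested behaviour, not the merged implementation; the function name is hypothetical and the string constants are stand-ins for the script's own PASS, FAIL_O, and NA values.

```python
# Stand-in constants; the real script defines its own result values.
PASS, FAIL_O, NA = "PASS", "FAIL_O", "NA"


def decide_result(version_affected, affected_nodes, wred_enabled):
    """Map the three conditions to a result, following the review comments."""
    if not version_affected:
        return NA      # target version is outside the affected range
    if not affected_nodes:
        return NA      # no affected FM model present, so the issue does not apply
    if not wred_enabled:
        return PASS    # affected model present, but WRED is disabled
    return FAIL_O      # affected version, affected model, and WRED enabled
```

Ordering the checks this way keeps PASS reserved for the one case the reviewer names: an affected model with WRED disabled.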
Summary
Adds a new pre-upgrade validation check to detect fabric nodes at risk due to CSCwt50713, where WRED-enabled QoS combined with specific Leaf/LC/FM hardware models can cause N9504 spine crashes after upgrading to affected ACI releases.
Detection Logic
Three gates must all be true to trigger a FAIL:
Version Gate – Target version is in the affected range:
Feature Gate – WRED is enabled (qosCong.algo = wred)
Hardware Gate – Any of the following affected models are present:
Testing
tests/checks/wred_affected_model_check/