IO errors seen during warm reboot causing drop in traffic from servers #6240
Comments
Looking more into the logs:
Side question - for all the traffic that is dropped, both |
This issue is seen only after the DUT is warm rebooted, when traffic from servers to T1 is dropped. It is not seen on KVM or on a cold-rebooted DUT. It is also hard to pin down when the issue first started occurring, as the older versions of BRCM SAI in master had syncd crashes. Looking at the syslog, there are several errors during syncd warm reboot, specifically the errors regarding VID to RID translation, and why do we skip warm reboot for some attributes like
This is now being tested, and still seen, on the latest SAI version and master image:
With every traffic drop,
The output of my station tcam differs slightly between a COLD reboot and a WARM reboot. Looping in the BRCM team to look further into the issue.
Thanks @gechiang for the debug help and creating a BRCM SAI CSP for this - CS00011676762.
Fix here: sonic-net/sonic-sairedis#761. The problem was in the bulk set operation: when traffic was blackholed, the next hop for the default route had been set to 0x0, because it was overridden by the last attribute, which was a bug.
This issue is fixed by sonic-net/sonic-sairedis#761
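To illustrate the class of bug described in the fix comment, here is a minimal sketch in Python. It is not the sairedis code: the function names, the dictionary-based route table, the `NEXT_HOP_ID` key, and the OID values are all hypothetical. It only shows how a bulk set that applies the last attribute to every entry could leave the default route with a next hop of 0x0, and how pairing each entry with its own attribute avoids that.

```python
NULL_OID = 0x0  # "no object" value, matching the 0x0 next hop mentioned above


def bulk_set_buggy(table, entries, attrs):
    """Hypothetical buggy bulk set: the last attribute is applied to every entry."""
    last_attr_id, last_value = attrs[-1]
    for entry in entries:
        table.setdefault(entry, {})[last_attr_id] = last_value
    return table


def bulk_set_fixed(table, entries, attrs):
    """Hypothetical corrected bulk set: entry i receives attribute i."""
    if len(entries) != len(attrs):
        raise ValueError("bulk set needs exactly one attribute per entry")
    for entry, (attr_id, value) in zip(entries, attrs):
        table.setdefault(entry, {})[attr_id] = value
    return table


# Illustrative data only: two routes, the second one pointing at a null next hop.
entries = ["0.0.0.0/0", "192.168.0.0/24"]
attrs = [("NEXT_HOP_ID", 0x4000000000001), ("NEXT_HOP_ID", NULL_OID)]

buggy = bulk_set_buggy({}, entries, attrs)
fixed = bulk_set_fixed({}, entries, attrs)

print(hex(buggy["0.0.0.0/0"]["NEXT_HOP_ID"]))  # default route lost its next hop
print(hex(fixed["0.0.0.0/0"]["NEXT_HOP_ID"]))  # default route keeps its next hop
```

In the buggy variant the default route ends up with a null next hop, which would be consistent with the blackholed server traffic reported in this issue.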
Description
I/O tests during warm-reboot are failing due to server->t1 traffic being dropped.
Steps to reproduce the issue:
Run test_advanced_reboot::test_warm_reboot on the latest master image with the SAI fix for "Syncd APPLY_VIEW failure causes Orchagent crash after warm reboot" (#6069). In syslog, continuous SAI_API_FDB:_brcm_sai_fdb_event_cb events are seen for a DUT port that is selected for the testing.
This is tested on DX010 T0, and the issue is almost consistent across warm reboots as long as traffic is running.
Additionally, this issue is not seen on a KVM device.
Describe the results you received:
This is repetitive in the logs. For example the below grep counts:
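Counts like these can also be gathered with a short script instead of grep. The following is a minimal sketch, assuming the syslog is available as plain text lines. The marker string is the callback name quoted in this issue; the helper name `count_markers` and the sample lines are hypothetical and are not taken from the DUT logs.

```python
from collections import Counter

# Callback name as quoted in this issue.
FDB_CALLBACK = "SAI_API_FDB:_brcm_sai_fdb_event_cb"


def count_markers(lines, markers):
    """Count how many lines contain each marker substring (like `grep -c`)."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "syncd#syncd: SAI_API_FDB:_brcm_sai_fdb_event_cb: example event text",
    "swss#orchagent: unrelated example line",
    "syncd#syncd: SAI_API_FDB:_brcm_sai_fdb_event_cb: example event text",
]

counts = count_markers(sample, [FDB_CALLBACK])
print(counts[FDB_CALLBACK])
```

On a real device the same function could be fed an open log file, for example `count_markers(open(path), [FDB_CALLBACK])`, where `path` is wherever the syslog lives on that system.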
The test itself fails due to no packets received from servers. Below is a snippet from IO test running on PTF:
The interface counters show drops seen on a DUT port:
Describe the results you expected:
Warm reboot test should pass with I/O passing for both traffic from servers and T1 devices.
Additional information you deem important (e.g. issue happens only occasionally):
The test used a patch for #6069 to unblock the OA crash issue.