[pytest][azp] test_chassis_system_lag_id_allocator_del_id is failing in PR commits #1687

Closed
smaheshm opened this issue Apr 2, 2021 · 2 comments · Fixed by #1692

Comments

@smaheshm
Contributor

smaheshm commented Apr 2, 2021

This test is flaky and has been failing fairly consistently.


2021-04-02T01:34:40.6713838Z test_virtual_chassis.py:461: AssertionError
2021-04-02T01:34:40.6714224Z ________ TestVirtualChassis.test_chassis_system_lag_id_allocator_del_id ________
2021-04-02T01:34:40.6714450Z 
2021-04-02T01:34:40.6714741Z self = <test_virtual_chassis.TestVirtualChassis object at 0x7f5ebd1ce4f0>
2021-04-02T01:34:40.6715146Z vct = <conftest.DockerVirtualChassisTopology object at 0x7f5ebd0cfd60>
2021-04-02T01:34:40.6715340Z 
2021-04-02T01:34:40.6715647Z     def test_chassis_system_lag_id_allocator_del_id(self, vct):
2021-04-02T01:34:40.6716254Z         """Test lag id allocator's release id and re-use id processing.
2021-04-02T01:34:40.6716549Z     
2021-04-02T01:34:40.6716926Z         Pre-requisite:
2021-04-02T01:34:40.6717247Z             (i)  Test case: test_chassis_system_lag
2021-04-02T01:34:40.6717660Z             (ii) Test case: test_chassis_system_lag_id_allocator_table_full
2021-04-02T01:34:40.6718012Z         This test validates that
2021-04-02T01:34:40.6718373Z             (i)   Portchannel is deleted and id allocator does not return error
2021-04-02T01:34:40.6719051Z             (ii)  Should be able to add PortChannel to re-use released id
2021-04-02T01:34:40.6719667Z             (iii) Deleted portchaneels are removed from chassis app db
2021-04-02T01:34:40.6720127Z             (iv)  Remote asics remove the system lag corresponding to the deleted PortChannels
2021-04-02T01:34:40.6720508Z         """
2021-04-02T01:34:40.6720728Z     
2021-04-02T01:34:40.6720996Z         if vct is None:
2021-04-02T01:34:40.6721271Z             return
2021-04-02T01:34:40.6721516Z     
2021-04-02T01:34:40.6721784Z         test_lag1_name = "PortChannel0001"
2021-04-02T01:34:40.6722123Z         test_lag1_member = "Ethernet4"
2021-04-02T01:34:40.6722445Z         test_lag2_name = "PortChannel0002"
2021-04-02T01:34:40.6722791Z         test_lag3_name = "PortChannel0003"
2021-04-02T01:34:40.6723057Z     
2021-04-02T01:34:40.6723379Z         # Create a PortChannel in a line card 1 (owner line card)
2021-04-02T01:34:40.6723713Z         dvss = vct.dvss
2021-04-02T01:34:40.6724029Z         for name in dvss.keys():
2021-04-02T01:34:40.6724340Z             dvs = dvss[name]
2021-04-02T01:34:40.6724674Z     
2021-04-02T01:34:40.6724946Z             config_db = dvs.get_config_db()
2021-04-02T01:34:40.6725341Z             metatbl = config_db.get_entry("DEVICE_METADATA", "localhost")
2021-04-02T01:34:40.6725658Z     
2021-04-02T01:34:40.6726009Z             # Get the host name and asic name for the system lag alias verification
2021-04-02T01:34:40.6726416Z             cfg_hostname = metatbl.get("hostname")
2021-04-02T01:34:40.6726909Z             assert cfg_hostname != "", "Got error in getting hostname from CONFIG_DB DEVICE_METADATA"
2021-04-02T01:34:40.6727272Z     
2021-04-02T01:34:40.6727570Z             cfg_asic_name = metatbl.get("asic_name")
2021-04-02T01:34:40.6728013Z             assert cfg_asic_name != "", "Got error in getting asic_name from CONFIG_DB DEVICE_METADATA"
2021-04-02T01:34:40.6728391Z     
2021-04-02T01:34:40.6728680Z             cfg_switch_type = metatbl.get("switch_type")
2021-04-02T01:34:40.6728991Z     
2021-04-02T01:34:40.6729294Z             # Portchannel record verifiation done in line card
2021-04-02T01:34:40.6729694Z             if cfg_switch_type == "voq":
2021-04-02T01:34:40.6730063Z                 lc_switch_id = metatbl.get("switch_id")
2021-04-02T01:34:40.6730529Z                 assert lc_switch_id != "", "Got error in getting switch_id from CONFIG_DB DEVICE_METADATA"
2021-04-02T01:34:40.6730972Z                 if lc_switch_id == "0":
2021-04-02T01:34:40.6731273Z     
2021-04-02T01:34:40.6731628Z                     # At this point we have 2 port channels test_lag1_name and test_lag2_name.
2021-04-02T01:34:40.6732146Z                     # These were created by the above two test cases. Now delete the PortChannel
2021-04-02T01:34:40.6732648Z                     # test_lag1_name and verify that the lag is removed and add test_lag3_name to
2021-04-02T01:34:40.6733141Z                     # test for lag id allocator allocating newly available lag id
2021-04-02T01:34:40.6733491Z     
2021-04-02T01:34:40.6733818Z                     # Connect to app db: lag table and lag member table
2021-04-02T01:34:40.6734297Z                     app_db = swsscommon.DBConnector(swsscommon.APPL_DB, dvs.redis_sock, 0)
2021-04-02T01:34:40.6734856Z                     psTbl_lag = swsscommon.ProducerStateTable(app_db, "LAG_TABLE")
2021-04-02T01:34:40.6735309Z                     psTbl_lagMember = swsscommon.ProducerStateTable(app_db, "LAG_MEMBER_TABLE")
2021-04-02T01:34:40.6735633Z     
2021-04-02T01:34:40.6735950Z                     # Delete port channel member of PortChannel test_lag1_name
2021-04-02T01:34:40.6736399Z                     psTbl_lagMember.delete(f"{test_lag1_name}:{test_lag1_member}")
2021-04-02T01:34:40.6736751Z     
2021-04-02T01:34:40.6736986Z                     time.sleep(1)
2021-04-02T01:34:40.6737234Z     
2021-04-02T01:34:40.6737506Z                     # Delete PortChannel test_lag1_name
2021-04-02T01:34:40.6737864Z                     psTbl_lag.delete(f"{test_lag1_name}")
2021-04-02T01:34:40.6738191Z     
2021-04-02T01:34:40.6738442Z                     time.sleep(1)
2021-04-02T01:34:40.6738684Z     
2021-04-02T01:34:40.6738966Z                     # Verify deletion of the PorChannel
2021-04-02T01:34:40.6739296Z                     asic_db = dvs.get_asic_db()
2021-04-02T01:34:40.6739708Z                     lagkeys = asic_db.get_keys("ASIC_STATE:SAI_OBJECT_TYPE_LAG")
2021-04-02T01:34:40.6740193Z >                   assert len(lagkeys) == 1, "Two LAG entries in asic db even after deleting a PortChannel"
2021-04-02T01:34:40.6740707Z E                   AssertionError: Two LAG entries in asic db even after deleting a PortChannel
2021-04-02T01:34:40.6741092Z E                   assert 2 == 1
2021-04-02T01:34:40.6741529Z E                     -2
2021-04-02T01:34:40.6741792Z E                     +1
2021-04-02T01:34:40.6741916Z 
2021-04-02T0

@smaheshm
Contributor Author

smaheshm commented Apr 2, 2021

@vganesan-nokia Would you be able to take a look? Thanks!

@vganesan-nokia
Contributor

Either the deleted LAG is not being removed from the ASIC DB, or this is a timing issue: the test checks the ASIC DB only one second after deleting the LAG, which may be too soon. I'll try to reproduce the problem, confirm the cause, and put up a fix.
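
If it does turn out to be timing, one common remedy is to poll the ASIC DB until the expected key count shows up instead of sleeping for a fixed interval. Below is a minimal, self-contained sketch of that idea. It does not use the real DVS test library: `poll_for_n_keys` and `FakeAsicDb` are hypothetical names made up for illustration, the OIDs are invented, and the timeout and interval values are arbitrary assumptions.

```python
import time


def poll_for_n_keys(get_keys, num_keys, timeout=5.0, interval=0.01):
    """Call get_keys() repeatedly until it returns exactly num_keys keys.

    Hypothetical helper for illustration only. Raises AssertionError if the
    expected count is not reached before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        keys = get_keys()
        if len(keys) == num_keys:
            return keys
        if time.monotonic() >= deadline:
            raise AssertionError(
                f"expected {num_keys} keys, found {len(keys)} after {timeout}s"
            )
        time.sleep(interval)


class FakeAsicDb:
    """Stand-in for the ASIC DB: a deleted LAG disappears only after a delay."""

    def __init__(self, removal_delay):
        self._removed_at = time.monotonic() + removal_delay

    def get_keys(self, table):
        # Two made-up LAG object ids; the first one is the "deleted" LAG.
        keys = ["oid:0x2000000000001", "oid:0x2000000000002"]
        if time.monotonic() >= self._removed_at:
            keys = keys[1:]
        return keys


# The removal lands 0.2 s after the delete request in this simulation.
asic_db = FakeAsicDb(removal_delay=0.2)
lagkeys = poll_for_n_keys(
    lambda: asic_db.get_keys("ASIC_STATE:SAI_OBJECT_TYPE_LAG"), 1
)
assert len(lagkeys) == 1
```

With a fixed sleep the check either waits longer than needed or fires too early on a slow CI machine; polling returns as soon as the entry is gone and only fails once the full timeout has elapsed.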

vganesan-nokia pushed a commit to vganesan-nokia/sonic-swss that referenced this issue Apr 5, 2021
Changes to fix system lag test failure issue reported by issues
sonic-net#1687.
Fixed time sleep after creating and deleting system lag is replaced by
DVS's wait_for_n_keys() functions

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
@lguohan lguohan linked a pull request Apr 9, 2021 that will close this issue
lguohan pushed a commit that referenced this issue Apr 9, 2021
Changes to fix system lag test failure issue reported by issues
#1687.
Fixed time sleep after creating and deleting system lag is replaced by
DVS's wait_for_n_keys() functions

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this issue Oct 5, 2021
Changes to fix system lag test failure issue reported by issues
sonic-net#1687.
Fixed time sleep after creating and deleting system lag is replaced by
DVS's wait_for_n_keys() functions

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2 participants