Update for the procedures for insertion/hot swap of Switch Fabric Module(SFM) by using "config chassis modules shutdown/startup" commands #3283

Merged
merged 27 commits into from
May 29, 2024
Changes from 21 commits
Commits
2b43852
* [saidump]
JunhongMao Sep 1, 2023
62aab2d
* [saidump]
JunhongMao Sep 1, 2023
5ecb3c1
Merge branch 'master' of github.com:JunhongMao/sonic-utilities
JunhongMao Sep 1, 2023
3504fdc
* [saidump]
JunhongMao Sep 5, 2023
2450c48
* [saidump]
JunhongMao Sep 6, 2023
9fa769e
Fixup based on the below PR comments.
JunhongMao Sep 19, 2023
8a3f93b
According to the testing group's advice, change the default ROUTE_TAB…
JunhongMao Sep 20, 2023
32f5607
https://github.com/sonic-net/sonic-buildimage/pull/16466 fixing based…
JunhongMao Sep 30, 2023
64ad586
Merge remote-tracking branch 'upstream/master'
JunhongMao Oct 3, 2023
867a6d1
To address the below review comments:
JunhongMao Oct 4, 2023
185bfef
Merge remote-tracking branch 'upstream/master'
JunhongMao Oct 4, 2023
86d3efa
Merge remote-tracking branch 'upstream/master'
JunhongMao Oct 12, 2023
1ca9676
Merge remote-tracking branch 'upstream/master'
JunhongMao Dec 12, 2023
1322095
[VOQ][saidump] Add saidump unit test scripts #3079
JunhongMao Dec 12, 2023
2ac9986
Merge remote-tracking branch 'upstream/master'
JunhongMao Apr 23, 2024
1bc2ab0
Revert "[VOQ][saidump] Add saidump unit test scripts #3079"
JunhongMao Apr 23, 2024
444a4eb
Update for the procedures for insertion/hot swap of Switch Fabric Mod…
JunhongMao Apr 23, 2024
003db44
fix upon review comments.
JunhongMao Apr 24, 2024
2ea3a06
fix upon review comments.
JunhongMao Apr 24, 2024
5d7719d
fix upon review comments.
JunhongMao Apr 24, 2024
6c35d0c
fix upon review comments
JunhongMao Apr 25, 2024
10b1cd1
Merge fabric_module_set_admin_status function into chassis_modules.py
JunhongMao Apr 27, 2024
fa6f17c
fix upon review comments
JunhongMao Apr 30, 2024
001e482
Add unit test codes.
JunhongMao Apr 30, 2024
76d0ada
Fix pre-commit check error and address review comments
JunhongMao May 1, 2024
aa87e55
Update hint information for "peer services"
JunhongMao May 21, 2024
118f2d7
Address a review comments
JunhongMao May 22, 2024
39 changes: 38 additions & 1 deletion config/chassis_modules.py
100644 → 100755
@@ -1,8 +1,9 @@
#!/usr/sbin/env python

import click

import time
import utilities_common.cli as clicommon
from .fabric_module_set_admin_status import fabric_module_set_admin_status

#
# 'chassis_modules' group ('config chassis_modules ...')
@@ -17,6 +18,28 @@ def modules():
    """Configure chassis modules"""
    pass

def get_config_module_state(db, chassis_module_name):
    config_db = db.cfgdb
    fvs = config_db.get_entry('CHASSIS_MODULE', chassis_module_name)
    if not fvs:
        return None
    else:
        return fvs['admin_status']

TIMEOUT_SECS = 10

# Name: get_config_module_state_timeout
# return: True: timeout, False: not timeout
def get_config_module_state_timeout(ctx, db, chassis_module_name, state):
    counter = 0
    # Poll CONFIG_DB once per second until the expected state is visible
    while get_config_module_state(db, chassis_module_name) != state:
        time.sleep(1)
        counter += 1
        if counter >= TIMEOUT_SECS:
            # ctx.fail() raises, so the return below is only a safeguard
            ctx.fail("get_config_module_state {} timeout".format(chassis_module_name))
            return True
    return False

#
# 'shutdown' subcommand ('config chassis_modules shutdown ...')
#
@@ -33,8 +56,14 @@ def shutdown_chassis_module(db, chassis_module_name):
       not chassis_module_name.startswith("FABRIC-CARD"):
        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD' or 'FABRIC-CARD'")

    #To avoid duplicate operation
    if get_config_module_state(db, chassis_module_name) == 'down':
        click.echo("Duplicate operation for " + chassis_module_name)
        return
    fvs = {'admin_status': 'down'}
    config_db.set_entry('CHASSIS_MODULE', chassis_module_name, fvs)
    if not get_config_module_state_timeout(ctx, db, chassis_module_name, 'down'):
        fabric_module_set_admin_status(chassis_module_name, 'down')

#
# 'startup' subcommand ('config chassis_modules startup ...')
@@ -45,5 +74,13 @@ def shutdown_chassis_module(db, chassis_module_name):
def startup_chassis_module(db, chassis_module_name):
    """Chassis-module startup of module"""
    config_db = db.cfgdb
    ctx = click.get_current_context()

    #To avoid duplicate operation
    if get_config_module_state(db, chassis_module_name) is None:
        click.echo("Duplicate operation for " + chassis_module_name)
        return

    config_db.set_entry('CHASSIS_MODULE', chassis_module_name, None)
    if not get_config_module_state_timeout(ctx, db, chassis_module_name, None):
        fabric_module_set_admin_status(chassis_module_name, 'up')
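
The shutdown and startup commands in this file both write to CONFIG_DB and then poll until the expected `admin_status` is readable, giving up after `TIMEOUT_SECS`. A minimal standalone sketch of that polling pattern is shown here. `wait_for_state`, `FakeConfigDb`, `module_admin_status`, and the `sleep` and `max_polls` parameters are hypothetical names chosen for illustration and are not part of sonic-utilities. Unlike the PR's helper, this sketch returns True on success rather than on timeout.

```python
import time

TIMEOUT_SECS = 10  # same value as the PR; used here as a poll count


def wait_for_state(read_state, expected, max_polls=TIMEOUT_SECS,
                   interval=1.0, sleep=time.sleep):
    """Poll read_state() until it returns `expected`.

    Returns True if the state was reached, False if max_polls sleeps
    elapsed first. Hypothetical helper, not the PR's function.
    """
    polls = 0
    while read_state() != expected:
        if polls >= max_polls:
            return False
        sleep(interval)
        polls += 1
    return True


class FakeConfigDb:
    """In-memory stand-in for CONFIG_DB, assumed only for this example."""

    def __init__(self):
        self.table = {}

    def get_entry(self, table, key):
        return self.table.get((table, key), {})

    def set_entry(self, table, key, fvs):
        # Passing None deletes the entry, as the startup command relies on
        if fvs is None:
            self.table.pop((table, key), None)
        else:
            self.table[(table, key)] = fvs


def module_admin_status(cfgdb, name):
    # Same convention as get_config_module_state: None means "no entry"
    return cfgdb.get_entry('CHASSIS_MODULE', name).get('admin_status')
```

Injecting `sleep` keeps the helper testable without real delays; a caller could pass `sleep=lambda s: None` in a unit test.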
62 changes: 62 additions & 0 deletions config/fabric_module_set_admin_status.py
@@ -0,0 +1,62 @@
#!/usr/bin/env python3
#
import re
import subprocess
import time
from swsscommon.swsscommon import SonicV2Connector
from sonic_py_common.logger import Logger
import utilities_common.cli as clicommon

# Name: fabric_module_set_admin_status.py, version: 1.0
# Syntax: fabric_module_set_admin_status <module_name> <up/down>
def fabric_module_set_admin_status(module, state):
    logger = Logger("fabric_module_set_admin_status.py")
    logger.set_min_log_priority_info()

    if not module.startswith("FABRIC-CARD"):
        logger.log_warning("Failed to set {} state. Admin state can only be set on Fabric module.".format(module))
        return

    if (state != "up" and state != "down"):
        logger.log_warning("Failed to set {}. Admin state can only be set to up or down.".format(state))
        return

    num = int(re.search(r"(\d+)$", module).group())
    chassisdb = SonicV2Connector(host="127.0.0.1")
    chassisdb.connect("CHASSIS_STATE_DB")

    if state == "down":
        asics_keys_list = chassisdb.keys("CHASSIS_STATE_DB", "CHASSIS_FABRIC_ASIC_TABLE*")
        asic_list = []
        for service in asics_keys_list:
            name = chassisdb.get("CHASSIS_STATE_DB",service,"name")
            if name == module:
                asic_id = int(re.search(r"(\d+)$", service).group())
                asic_list.append(asic_id)
        if len(asic_list) == 0:
            logger.log_warning("Failed to get {}'s asic list.".format(module))
            return
        logger.log_info("Shutting down chassis module {}".format(module))
        for asic in asic_list:
            logger.log_info("Stopping swss@{} and syncd@{} ...".format(asic, asic))
            clicommon.run_command('sudo systemctl stop swss@{}.service'.format(asic))
            # wait for the service to go down
            time.sleep(5)
            chassisdb.delete("CHASSIS_STATE_DB","CHASSIS_FABRIC_ASIC_TABLE|asic" + str(asic))
            logger.log_info("Start swss@{} and syncd@{} ...".format(asic, asic))

The design is: stop services -> delete database entry -> start services. It is based on the agreed design shown below:

1) Create a common API

    fabric_module_set_admin_status(module_name, status='up/down'):
        services_list = from CHASSIS_STATE_DB "CHASSIS_FABRIC_ASIC_TABLE"
        for service in services_list:    <===== Example: SFM1: swss0, swss1
            systemctl stop service
            **** for FABRIC card "down case" ****
            sonic-db-cli to delete CHASSIS_STATE_DB "CHASSIS_FABRIC_ASIC_TABLE|asic#" entry
        **** For down case, should we bring up the swss/syncd service after the power off hardware as below? ****
        for service in services_list:    <===== Example: SFM1: swss0, swss1
            systemctl start service
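
The sequence described in this comment can be exercised without systemd or a database by passing the three actions in as callables. The sketch below is only an illustration under that assumption: `restart_fabric_asics` and its `stop`, `delete_entry`, `start`, and `wait` hooks are hypothetical names, and it follows the per-ASIC ordering used in the diff (stop, delete, start for one ASIC before moving to the next) rather than the two separate loops in the pseudocode.

```python
def restart_fabric_asics(asic_ids, stop, delete_entry, start, wait=lambda: None):
    """Apply stop -> delete -> start to each ASIC id, in order.

    All callables are hypothetical hooks so the ordering can be checked
    in isolation; real code would call systemctl and CHASSIS_STATE_DB.
    """
    for asic in asic_ids:
        stop("swss@{}.service".format(asic))
        wait()  # give the service time to go down before deleting its entry
        delete_entry("CHASSIS_FABRIC_ASIC_TABLE|asic{}".format(asic))
        start("swss@{}.service".format(asic))


# Record the calls instead of touching the system
calls = []
restart_fabric_asics(
    [0, 1],
    stop=lambda unit: calls.append(("stop", unit)),
    delete_entry=lambda key: calls.append(("delete", key)),
    start=lambda unit: calls.append(("start", unit)),
)
```

After this runs, `calls` holds the six actions in execution order, which makes the stop/delete/start contract easy to assert in a unit test.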

            clicommon.run_command('sudo systemctl start swss@{}.service'.format(asic))
    else:
        # wait for the SFM hardware init to finish
        time.sleep(5)
        asics_keys_list = chassisdb.keys("CHASSIS_STATE_DB", "CHASSIS_FABRIC_ASIC_TABLE*")
        asic_list = []
        for service in asics_keys_list:
            name = chassisdb.get("CHASSIS_STATE_DB",service,"name")
            if name == module:
                asic_id = int(re.search(r"(\d+)$", service).group())
                asic_list.append(asic_id)
        if len(asic_list) == 0:
            return
        for asic in asic_list:
            logger.log_info("Start swss@{} and syncd@{} ...".format(asic, asic))
            clicommon.run_command('sudo systemctl start swss@{}.service'.format(asic))
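
Both branches above build `asic_list` the same way: scan the `CHASSIS_FABRIC_ASIC_TABLE` keys, keep those whose `name` field matches the module, and take the trailing digits of the key as the ASIC id. A small sketch of that lookup as a single helper follows. `asics_for_module` is a hypothetical name, a plain dict stands in for CHASSIS_STATE_DB, and the sample keys and names are made up for the example.

```python
import re


def asics_for_module(table, module):
    """Return sorted ASIC ids whose "name" field equals `module`.

    `table` maps keys such as "CHASSIS_FABRIC_ASIC_TABLE|asic3" to field
    dicts; it is a stand-in for CHASSIS_STATE_DB in this example.
    """
    asic_ids = []
    for key, fields in table.items():
        if fields.get("name") != module:
            continue
        match = re.search(r"(\d+)$", key)
        if match:  # skip keys without a trailing number instead of raising
            asic_ids.append(int(match.group(1)))
    return sorted(asic_ids)


# Illustrative data only; real entries come from the chassis database
sample = {
    "CHASSIS_FABRIC_ASIC_TABLE|asic0": {"name": "FABRIC-CARD0"},
    "CHASSIS_FABRIC_ASIC_TABLE|asic1": {"name": "FABRIC-CARD0"},
    "CHASSIS_FABRIC_ASIC_TABLE|asic2": {"name": "FABRIC-CARD1"},
}
print(asics_for_module(sample, "FABRIC-CARD0"))
```

Factoring the lookup out this way would let the "down" and "up" branches share one code path; whether that fits the module's conventions is a judgment call for the maintainers.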