Update the procedures for insertion/hot swap of Switch Fabric Module (SFM) to use the "config chassis modules shutdown/startup" commands #475

Closed
JunhongMao wants to merge 2 commits

@JunhongMao JunhongMao commented Apr 23, 2024

Why I did it

On the Nokia SONiC chassis, the procedure for insertion/hot swap of a Switch Fabric Module (SFM)
previously used the following command:

sudo nokia_cmd set shutdown-sfm <SFM-Num/Physical-Slot>

The following four PRs add an equivalent command for the same operations:
#475
sonic-net/sonic-utilities#3283
nokia/sonic-platform#6
sonic-net/sonic-buildimage#18938

The new command is:
sudo config chassis modules shutdown/startup <module name>

The HLD for shutdown and startup of the fabric module is here:
sonic-net/SONiC#1694

The following PR has been replaced:
sonic-net/sonic-buildimage#18578

Work item tracking
  • Microsoft ADO (number only):

How I did it

  1. When the CLI command "sudo config chassis modules startup/shutdown" runs, it directly calls config/fabric_module_set_admin_status.py to perform the corresponding operation.

How to verify it

The following test was run on the supervisor card against the FABRIC-CARD3 module.
1. Shut down the module
sudo config chassis modules shutdown FABRIC-CARD3

2. Check the status to confirm that FABRIC-CARD3 is down.
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A

sudo tail -f /var/log/syslog | grep "pmon#chassisd:"
May  1 00:07:54.192037 ixre-cpm-chassis15 WARNING pmon#chassisd: Module FABRIC-CARD3 went off-line!
...

3. Start up the module
sudo config chassis modules startup FABRIC-CARD3


4. Check the status
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3                    SFM4                4         Online              up  01214400362

sudo tail -f /var/log/syslog | grep "pmon#chassisd:"
May  1 00:26:29.501687 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module FABRIC-CARD3 recovered on-line!


5. Verify that the setting persists across a reboot. For example, shut the module down, save the config,
and reboot; the module should remain shut down.
$ sudo config save
Existing files will be overwritten, continue? [y/N]: y

After the reboot, check the status to confirm that FABRIC-CARD3 is still down.
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A
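
The manual status checks in the steps above could also be scripted. The following is only a minimal sketch, not part of this PR: the helper names parse_module_status and module_is are hypothetical, and it assumes that only the table output of "show chassis modules status" is passed in (no prompt line), that columns are separated by at least two spaces, and that no cell is empty.

```python
import re

# Sample output copied from the verification steps above.
SAMPLE = """\
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A
"""


def parse_module_status(output):
    """Parse the status table into {module name: {column name: value}}."""
    rows = {}
    header = None
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the dashed separator, and elided "..." rows.
        if not stripped or stripped == "..." or set(stripped) <= {"-", " "}:
            continue
        # Assumption: cells are separated by two or more spaces.
        cells = re.split(r"\s{2,}", stripped)
        if header is None:
            header = cells
            continue
        # Ignore rows whose cell count does not match the header.
        if len(cells) == len(header):
            rows[cells[0]] = dict(zip(header, cells))
    return rows


def module_is(rows, name, oper, admin):
    """Return True if the named module has the given Oper/Admin status."""
    row = rows.get(name)
    return (
        row is not None
        and row["Oper-Status"] == oper
        and row["Admin-Status"] == admin
    )


rows = parse_module_status(SAMPLE)
```

After a shutdown, module_is(rows, "FABRIC-CARD3", "Empty", "down") is the condition to check; after a startup, the expected values would be "Online" and "up".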


Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • 202205

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

sonic-chassisd/scripts/chassisd: two outdated review threads (resolved)
@abdosi
Contributor

abdosi commented Apr 30, 2024

Can we add the design doc, as discussed in the Chassis Community meeting?

@abdosi
Contributor

abdosi commented Apr 30, 2024

@bmridul for review.

Contributor

@mlok-nokia mlok-nokia left a comment

It looks good to me

@mlok-nokia
Contributor

@arlakshm @judyjoseph Please help review this PR, thanks.

@bmridul
Collaborator

bmridul commented May 7, 2024

In the unit test, the operational status "Empty" does not seem correct. The operational status could be "Offline" or some other word. If the card has not been taken out of the chassis, it should not be Empty; Empty should indicate that the card has been physically removed from the chassis.

$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A

@JunhongMao
Contributor Author

@mlok-nokia, please help to comment on this.

@mlok-nokia
Contributor

On the Nokia platform, when an SFM is powered off, it cannot be detected even if it is inserted in the chassis. Therefore, it is treated as Empty instead of Offline.
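
To make the Empty vs. Offline point concrete, here is a small illustrative model. It is a sketch under stated assumptions, not the chassisd or platform code: the function names and the can_detect_when_off flag are hypothetical, and the status strings are simply the words used in this thread.

```python
# Hypothetical status strings: "Empty", "Offline", and "Online" are the
# values discussed in this thread, not an API taken from any SONiC library.
EMPTY, OFFLINE, ONLINE = "Empty", "Offline", "Online"


def observe_sfm(inserted, powered, can_detect_when_off):
    """Model what the platform can see: return (presence_detected, is_up).

    can_detect_when_off is a hypothetical capability flag; per the comment
    above, it would be False on the Nokia platform.
    """
    presence_detected = inserted and (powered or can_detect_when_off)
    is_up = inserted and powered
    return presence_detected, is_up


def oper_status(presence_detected, is_up):
    """Derive an Oper-Status from what the platform can actually observe."""
    if not presence_detected:
        # A powered-off SFM that cannot be detected looks like an empty slot.
        return EMPTY
    return ONLINE if is_up else OFFLINE
```

In this model, an inserted but powered-off SFM maps to Empty when can_detect_when_off is False, and to Offline only on a platform that can still detect the card while it is powered off.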

@mlok-nokia
Contributor

Regarding "can we add Design doc as discussed in the Chassis Community meeting?", the HLD link is:
sonic-net/SONiC#1694

@JunhongMao
Contributor Author

@judyjoseph, please review it again and approve it for merging. Thanks.

judyjoseph previously approved these changes May 21, 2024
Contributor

@judyjoseph judyjoseph left a comment

LGTM

@JunhongMao
Contributor Author

This PR has been replaced by #491, because the PR should be raised against master first, merged there, and later cherry-picked to 202205.

@judyjoseph
Contributor

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@mlok-nokia
Contributor

@JunhongMao Please close this PR since it is no longer valid.

@JunhongMao JunhongMao closed this Jun 12, 2024

5 participants