
Update for the procedures for insertion/hot swap of Switch Fabric Module(SFM) by using "config chassis modules shutdown/startup" commands #3283

Merged
27 commits merged into sonic-net:master on May 29, 2024

@JunhongMao (Contributor) commented Apr 23, 2024

Why I did it

For the Nokia SONiC chassis, the previous procedure for insertion/hot swap of a Switch Fabric Module (SFM)
used the following command:

sudo nokia_cmd set shutdown-sfm <SFM-Num/Physical-Slot>

The following 4 PRs add equivalent commands (shown after the list):
sonic-net/sonic-platform-daemons#491
#3283
nokia/sonic-platform#6
sonic-net/sonic-buildimage#18938

sudo config chassis modules shutdown/startup <module name>

The HLD for the shutdown and startup of the fabric module is:
sonic-net/SONiC#1694

The following earlier PR has been replaced by this work:
sonic-net/sonic-buildimage#18578

Work item tracking
  • Microsoft ADO (number only): 28300842

How I did it

  1. When the CLI command "sudo config chassis modules startup/shutdown" runs, it directly calls config/fabric_module_set_admin_status.py to perform the related operations.
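To illustrate the dispatch described above, here is a minimal Python sketch. It is not the PR's code: the plain dict standing in for CONFIG_DB, the `CHASSIS_MODULE|<name>` key with an `admin_status` field, the `FABRIC-CARD` name-prefix test, and the helper name `set_module_admin_status` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical stand-in for CONFIG_DB: maps "TABLE|key" to a field dict.
# The real CLI talks to Redis through SONiC's own DB connectors.
ConfigDb = Dict[str, Dict[str, str]]

# Assumption: fabric modules are named FABRIC-CARD<N>, as in the test output.
FABRIC_PREFIX = "FABRIC-CARD"


def set_module_admin_status(db: ConfigDb, module_name: str, status: str) -> bool:
    """Record the requested admin status for a chassis module.

    Returns True when the module is a fabric module, i.e. when the CLI
    would also need to run the fabric-specific step (in the PR, the logic
    in config/fabric_module_set_admin_status.py).
    """
    if status not in ("up", "down"):
        raise ValueError(f"invalid admin status: {status!r}")

    key = f"CHASSIS_MODULE|{module_name}"
    if status == "down":
        # Shutdown stores an explicit admin_status of "down".
        db[key] = {"admin_status": "down"}
    else:
        # Startup removes the entry, so the module reverts to its default (up).
        db.pop(key, None)

    return module_name.startswith(FABRIC_PREFIX)
```

Keeping the state in CONFIG_DB, as assumed here, is what would let `config save` persist a shutdown across a reboot (step 5 of the verification).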

How to verify it

The following test was carried out on the supervisor card, using the FABRIC-CARD3 module.
1. Shut down the module:
sudo config chassis modules shutdown FABRIC-CARD3

2. Check the status to confirm that FABRIC-CARD3 is down:
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A

sudo tail -f /var/log/syslog | grep "pmon#chassisd:"
May  1 00:07:54.192037 ixre-cpm-chassis15 WARNING pmon#chassisd: Module FABRIC-CARD3 went off-line!
 ...

3. Start up the module:
sudo config chassis modules startup FABRIC-CARD3


4. Check the status
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3                    SFM4                4         Online              up  01214400362

sudo tail -f /var/log/syslog | grep "pmon#chassisd:"
May  1 00:26:29.501687 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module FABRIC-CARD3 recovered on-line!


5. Verify that the setting persists across a reboot: shut the module down, save the config, and reboot.
The module should remain shut down.
$ sudo config save
Existing files will be overwritten, continue? [y/N]: y

After the reboot, check the status to confirm that FABRIC-CARD3 is still down:
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A
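The checks in steps 2, 4 and 5 above are done by eye. A small Python sketch like the following could automate them. The column names and the chassisd messages are taken from the output shown; the helper names, and the assumption that columns are separated by at least two spaces with no empty cells, are made up for this sketch and are not part of the PR.

```python
import re
from typing import Dict, List, Tuple

COLUMNS = ["Name", "Description", "Physical-Slot", "Oper-Status", "Admin-Status", "Serial"]

# chassisd messages as they appear in the syslog excerpts above.
EVENT_RE = re.compile(r"pmon#chassisd: Module (\S+) (went off-line|recovered on-line)!")


def parse_module_status(output: str) -> Dict[str, Dict[str, str]]:
    """Map module name -> row dict for `show chassis modules status` output."""
    modules: Dict[str, Dict[str, str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines, the header, the dashed rule and "..." elisions.
        if not line or line.startswith(("Name", "---", "...")):
            continue
        # Assumption: columns are separated by 2+ spaces and no cell is empty.
        fields = re.split(r"\s{2,}", line)
        if len(fields) == len(COLUMNS):
            row = dict(zip(COLUMNS, fields))
            modules[row["Name"]] = row
    return modules


def chassisd_events(log: str) -> List[Tuple[str, str]]:
    """Return (module, event) pairs found in chassisd syslog lines."""
    return EVENT_RE.findall(log)
```

For example, after step 1 a script could assert that `parse_module_status(out)["FABRIC-CARD3"]["Admin-Status"] == "down"`.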


Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • 202205

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
Use the Redis SAVE command to snapshot the DB and read it back later, instead of looping through each table entry and saving it individually.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the JSON output of rdbtools into the saidump format.
(3) Added a new script, files/scripts/saidump.sh, which performs the steps below for each ASIC (for example, ASIC0):

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  sudo python /usr/local/bin/rdb --command json  /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output, which is written to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
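Steps 1 to 5 amount to a fixed command sequence per ASIC. As a hedged illustration, the Python sketch below rebuilds that sequence from an ASIC index (the script's `$1`). The function name `saidump_commands` is hypothetical, and the sketch only builds the strings; it does not run them. A caller could pass each one to `subprocess.run(cmd, shell=True, check=True)`.

```python
from typing import List


def saidump_commands(asic: str) -> List[str]:
    """Return the shell commands saidump.sh runs for one ASIC index.

    `asic` plays the role of the script's $1 (e.g. "0" for ASIC0).
    """
    redis_dir = f"/var/run/redis{asic}"
    return [
        # 1. Snapshot the ASIC's Redis DB to dump.rdb.
        f"sudo sonic-db-cli -n asic{asic} SAVE > /dev/null",
        # 2. Move the snapshot out of the database container's data directory.
        f'docker exec database{asic} sh -c "mv /var/lib/redis/dump.rdb {redis_dir}/"',
        # 3. Convert the RDB snapshot to JSON with rdbtools.
        f"sudo python /usr/local/bin/rdb --command json {redis_dir}/dump.rdb"
        f" | sudo tee {redis_dir}/dump.json > /dev/null",
        # 4. Reformat the JSON into saidump output on stdout.
        f'docker exec syncd{asic} sh -c "saidump -r {redis_dir}/dump.json"',
        # 5. Clean up the temporary files.
        f"sudo rm -f {redis_dir}/dump.rdb",
        f"sudo rm -f {redis_dir}/dump.json",
    ]
```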
sonic-net#2972
SAI DUMP based on the route table size

* [saidump]
• Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated platform/broadcom/docker-syncd-brcm-dnx/Dockerfile.j2, install Python library rdbtools into the syncd containter.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp, add a new option -r, which updates the rdbtools's output-JSON files' format.
(3) Updated sonic-buildimage/build_debian.sh, to add a new script file: files/scripts/saidump.sh into the host. This shell file does the below steps:
  For each ASIC0, such as ASIC0,

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move dump files to /var/run/redisX/
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run rdb command to convert the dump files into JSON files
  docker exec syncd$1 sh -c "rdb --command json /var/run/redis$1/dump.rdb | tee /var/run/redis$1/dump.json > /dev/null"

  4. Run saidump -r to update the JSON files' format as same as the saidump before. Then we can get the saidump result in standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json -m 100"

  5. clear
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Update sonic-buildimage/src/sonic-utilities/scripts/generate_dump, to check the asic db size and if it is larger than xxx entries, then do with REDIS SAVE, otherwise, to do with old method: looping through each entry of Redis DB.
To add the rdbtools into base docker.
Move saidump.sh from host to syncd docker container.
sonic-net#2972 added two below functions into scripts/generate_dump.
get_route_table_size_by_asic_id_and_ipver
save_saidump_by_route_size
The unittest scripts need to be added.
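The text does not show how get_route_table_size_by_asic_id_and_ipver counts routes, so the following is only a Python sketch of the idea under stated assumptions: it counts route prefixes per IP version, assuming APPL_DB-style `ROUTE_TABLE:<prefix>` keys in the default VRF. The helper name `route_table_size` is hypothetical; the real function lives in scripts/generate_dump and may query a different table.

```python
import ipaddress
from typing import Iterable


def route_table_size(keys: Iterable[str], ipver: int) -> int:
    """Count route prefixes of the given IP version (4 or 6).

    Assumption: keys look like "ROUTE_TABLE:<prefix>" in the default VRF;
    keys from other tables, or whose suffix is not a prefix, are ignored.
    """
    if ipver not in (4, 6):
        raise ValueError("ipver must be 4 or 6")
    count = 0
    for key in keys:
        # Split on the first ':' only, so IPv6 prefixes stay intact.
        table, sep, prefix = key.partition(":")
        if table != "ROUTE_TABLE" or not sep:
            continue
        try:
            net = ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            continue  # e.g. a VRF-qualified key such as "Vrf1:10.0.0.0/24"
        if net.version == ipver:
            count += 1
    return count
```

Summing the IPv4 and IPv6 counts would give the total that a size-based choice between saidump and saidump.sh could be made on.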

Related PRs:
sonic-net#2972
sonic-net/sonic-buildimage#16466
sonic-net/sonic-sairedis#1288
sonic-net/sonic-sairedis#1298

Microsoft ADO (25892277):

Add two scripts:
tests/saidump_test.py
tests/saidump_test.sh

The test cases below verify that get_route_table_size_by_asic_id_and_ipver and save_saidump_by_route_size behave correctly.

```
# saidump test list format: [ASIC number, IPv4 and IPv6 route table size, expected save_cmd arguments]
saidump_test_list = [
    [1, 10000, "docker exec syncd saidump saidump"],
    [1, 12000, "docker exec syncd saidump saidump"],
    [1, 12001, "docker exec syncd saidump.sh saidump"],
    [1, 20000, "docker exec syncd saidump.sh saidump"],
    [2, 10000, "docker exec syncd0 saidump saidump0\ndocker exec syncd1 saidump saidump1"],
    [2, 12000, "docker exec syncd0 saidump saidump0\ndocker exec syncd1 saidump saidump1"],
    [2, 12001, "docker exec syncd0 saidump.sh saidump0\ndocker exec syncd1 saidump.sh saidump1"],
    [2, 20000, "docker exec syncd0 saidump.sh saidump0\ndocker exec syncd1 saidump.sh saidump1"]
]
```
During the build, run the following command and check that the test passes:
jumao@1b1ffba5949a:/sonic/src/sonic-utilities$ time python3 setup.py test
tests/saidump_test.py::test_saidump PASSED
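To make the expected outcomes in the test list concrete, here is a hedged Python model of the selection logic they imply: up to 12000 routes keeps the per-entry `saidump`, above 12000 switches to `saidump.sh`, and a multi-ASIC system gets one command per ASIC. The function name `saidump_save_cmds`, and treating the size column as a single number compared with the threshold, are assumptions; this is not the code of save_saidump_by_route_size.

```python
# Inferred from the test list: 12000 routes still uses saidump, 12001
# switches to saidump.sh. The real threshold in generate_dump may differ.
ROUTE_SIZE_THRESHOLD = 12000


def saidump_save_cmds(num_asics: int, route_size: int,
                      threshold: int = ROUTE_SIZE_THRESHOLD) -> str:
    """Build the save_cmd arguments the test list expects, one line per ASIC."""
    tool = "saidump.sh" if route_size > threshold else "saidump"
    # Single-ASIC systems use unsuffixed names (syncd, saidump); multi-ASIC
    # systems suffix the container and output file with the ASIC index.
    suffixes = [""] if num_asics == 1 else [str(i) for i in range(num_asics)]
    return "\n".join(f"docker exec syncd{s} {tool} saidump{s}" for s in suffixes)
```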
…ule(SFM) by using "config chassis modules shutdown/startup" commands

@arlakshm (Contributor) left a comment:

please add UTs

Resolved review threads (most marked Outdated):
  • config/fabric_module_set_admin_status.py: 13 threads
  • config/chassis_modules.py: 19 threads
  • tests/chassis_modules_test.py: 1 thread
@JunhongMao force-pushed the 202205 branch 2 times, most recently from ba93f2e to 1fce452 on May 13, 2024 18:24

@JunhongMao (Contributor, Author):

@arlakshm @judyjoseph @anamehra, Please review again and approve the merge, thanks.

@rlhui added the p0 label May 15, 2024

@kenneth-arista (Contributor):

@jfeng-arista for awareness

@JunhongMao (Contributor, Author):

@arlakshm @judyjoseph @anamehra, please review again and approve the merge, thanks.

@judyjoseph (Contributor) previously approved these changes May 21, 2024

LGTM

@JunhongMao (Contributor, Author):

@kenneth-arista, please review and approve it, thanks.

@kenneth-arista (Contributor) left a comment:

Looks good to me.

Resolved review thread on config/chassis_modules.py (Outdated)
@judyjoseph merged commit be4689d into sonic-net:master May 29, 2024
7 checks passed
@gechiang added the "Chassis for 202205 Branch" label Jun 14, 2024, indicating that this PR was merged into the "chassis for 202205" branch
arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…ule(SFM) by using "config chassis modules shutdown/startup" commands (sonic-net#3283)

sudo config chassis modules shutdown/startup <module name>

The HLD for Shutdown and Startup of the Fabric Module is below:
sonic-net/SONiC#1694
Labels: Chassis for 202205 Branch, p0
Projects: Status: Done
10 participants