[Nokia][chassis] modify Nokia-IXR7250E-36x400G platform specified reboot to allow SUP to log expected/unexpected midplane/module connectivity msg (sonic-net#18805)

Why I did it
For expected and unexpected linecard reboots, the Supervisor (SUP) needs to log an expected or unexpected lost-connectivity message, respectively. Now that the new mechanism for this has been introduced by earlier PRs, the Nokia-IXR7250E-36x400G linecard must treat a reboot caused by a missing heartbeat as an unexpected reboot from the SUP's point of view. Issue sonic-net#18540

Work item tracking
Microsoft ADO (number only):
How I did it
On the Nokia-IXR7250E-36x400G platform, a missing-heartbeat reboot also calls "sudo reboot", which creates a CHASSIS_MODULE_REBOOT_INFO_TABLE entry on the SUP that marks the reboot as expected. Because a heartbeat reboot is an unexpected reboot, platform_reboot is modified to check whether the reboot was caused by a missing heartbeat and, if so, to remove the CHASSIS_MODULE_REBOOT_INFO_TABLE entry on the SUP. The SUP can then log the unexpected-connectivity message.
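The check described above can be sketched in isolation as follows. This is a minimal illustration, not the platform script itself: the helper names `is_heartbeat_reboot` and `reboot_info_key` are hypothetical, the slot number is passed as an argument instead of being read from the hardware details file, and the key is only printed rather than deleted from CHASSIS_STATE_DB. It assumes, as the script does, that the linecard name index is the slot number minus one.

```shell
#!/bin/bash

# Reboot-cause text that marks a missing-heartbeat reboot (taken from the platform script).
HEARTBEAT_LOST_CAUSE="Heartbeat with the Supervisor card lost"

# Hypothetical helper: succeed if the given reboot-cause file records a lost heartbeat.
# A missing or unreadable file counts as "not a heartbeat reboot".
is_heartbeat_reboot() {
    local cause_file="$1"
    grep -q "$HEARTBEAT_LOST_CAUSE" "$cause_file" 2>/dev/null
}

# Hypothetical helper: print the reboot-info key for a 1-based slot number.
# Assumption: module names are zero-based, so slot 1 maps to LINE-CARD0.
reboot_info_key() {
    local slot_num="$1"
    echo "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD$((slot_num - 1))"
}
```

In the platform script, the resulting key is what gets passed to `sonic-db-cli CHASSIS_STATE_DB del`, and only when the heartbeat check succeeds.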

How to verify it
Simulate a missing-heartbeat reboot on the linecard, then verify that the SUP logs the following messages:
Apr 25 19:50:19.286081 ixre-cpm-chassis7 WARNING pmon#chassisd: Module LINE-CARD0 went off-line!
Apr 25 19:50:22.549416 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 lost midplane connectivity.
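The log check above could be scripted along these lines. This is a sketch under stated assumptions: the helper name `logged_unexpected_loss` is hypothetical, and the log file to search is supplied by the caller because the source does not say where these messages are stored on the SUP. The message text is copied from the sample output above.

```shell
#!/bin/bash

# Hypothetical helper: succeed if the log file contains the "Unexpected" midplane
# connectivity message for the given module name (for example LINE-CARD0).
# -F matches the text literally; -q suppresses output and only sets the exit status.
logged_unexpected_loss() {
    local module="$1"
    local log_file="$2"
    grep -qF "Unexpected: Module ${module} lost midplane connectivity" "$log_file"
}
```

A call such as `logged_unexpected_loss LINE-CARD0 "$LOG_FILE"` returns success only when the unexpected-reboot message for that module is present, which makes it usable as a pass/fail step in a test script.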


Signed-off-by: mlok <marty.lok@nokia.com>
mlok-nokia authored and gechiang committed May 22, 2024
1 parent b976280 commit 6d31a48
Showing 1 changed file with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions device/nokia/x86_64-nokia_ixr7250e_36x400g-r0/platform_reboot
@@ -1,12 +1,27 @@
 #!/bin/bash

+DEVICE_MGR_REBOOT_FILE=/tmp/device_mgr_reboot
+REBOOT_CAUSE_FILE=/host/reboot-cause/reboot-cause.txt
+DEVICE_REBOOT_CAUSE_FILE=/etc/opt/srlinux/reboot-cause.txt
+kHeartbeatLostRebootCause="Heartbeat with the Supervisor card lost"
+DEVICE_DETAILS_FILE="/etc/opt/srlinux/devices/hw_details.json"
+
+ungraceful_reboot_handle()
+{
+    str=$(grep "$kHeartbeatLostRebootCause" $DEVICE_REBOOT_CAUSE_FILE 2> /dev/null)
+    status=$?
+    if [ $status -eq 0 ]; then
+        slot_num=$(jq -r '.slot_num' $DEVICE_DETAILS_FILE 2>/dev/null)
+        slot_num=$((slot_num - 1))
+        sonic-db-cli CHASSIS_STATE_DB del "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD${slot_num}"
+    fi
+}
 update_reboot_cause()
 {
-    DEVICE_MGR_REBOOT_FILE=/tmp/device_mgr_reboot
-    REBOOT_CAUSE_FILE=/host/reboot-cause/reboot-cause.txt
-    DEVICE_REBOOT_CAUSE_FILE=/etc/opt/srlinux/reboot-cause.txt
     if [ -e $DEVICE_MGR_REBOOT_FILE ]; then
         if [ -e $DEVICE_REBOOT_CAUSE_FILE ]; then
+            # remove the REBOOT_INFO_TABLE entry for unexpected reboot
+            ungraceful_reboot_handle
             cp -f $DEVICE_REBOOT_CAUSE_FILE $REBOOT_CAUSE_FILE
         fi
         rm -f $DEVICE_MGR_REBOOT_FILE
@@ -18,7 +33,7 @@ update_reboot_cause()
 }

 echo "Disable all SFPs"
-python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); platform_chassis.tx_disable_all_sfps()'
+python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); platform_chassis.tx_disable_all_sfps()' &
 sleep 3

 # update the reboot_cause file when reboot is triggered by device-mgr
