Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[E1031] add platform specific reboot command support #15889

Merged

Conversation

qnos
Copy link
Contributor

@qnos qnos commented Jul 18, 2023

ADO: 24651987

Why I did it

E1031(HLX): add platform specific cold reboot support, using CPLD board level power cycle functionality for cold reboot.

How I did it

Use the CPLD to trigger board level power cycle when cold reboot

How to verify it

Do reboot stress test and check the reboot cause history

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Why I did it
E1031: add platform specific cold reboot support

How I did it
Use the CPLD to trigger board level power cycle when cold reboot

How to verify it
Do reboot stress test and check the reboot cause history
@qnos
Copy link
Contributor Author

qnos commented Jul 18, 2023

Attach CPLD version info and stress test log.

admin@sonic:~$ show platform firmware status
Chassis                Module    Component    Version    Description
---------------------  --------  -----------  ---------  ----------------------------
Celestica-E1031-T48S4  N/A       SMC_CPLD     0.5        System Management Controller
                                 MMC_CPLD     0.9        Module Management CPLD
                                 BIOS         5.6.5      Basic Input/Output System
admin@sonic:~$ 

hlx-cold-reboot-test.log

@Blueve Blueve requested a review from prgeor July 18, 2023 07:21
CPLD_SETREG_PATH="/sys/bus/platform/devices/e1031.smc/setreg"

# Board level power cycle
echo "0x0113 0xAA" > ${CPLD_SETREG_PATH}
Copy link
Contributor

@prgeor prgeor Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos is the filesystem unmounted at this stage? if not, it can cause FS corruption

Copy link
Contributor Author

@qnos qnos Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos is the filesystem unmounted at this stage? if not, it can cause FS corruption

Understand your concern, I will add fs umount logic before power cycle. Thanks.

@xumia
Copy link
Collaborator

xumia commented Jul 18, 2023

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qnos qnos requested a review from prgeor July 19, 2023 02:28
@jarias-lfx
Copy link

/easycla

@Blueve
Copy link
Contributor

Blueve commented Jul 26, 2023

ADO: 24651987


echo "User issued 'reboot' with platform-specific command [User: ${REBOOT_USER}, Time: ${REBOOT_TIME}]" > ${REBOOT_CAUSE_FILE}
sync
/sbin/fstrim -av
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos why do we need trim at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos why do we need trim at this point?

Cold reboot is a kind of force power cycle, add fstrim to try reducing the disk segment before force reboot. Commonly in non-test environment, cold reboot wouldn't be called frequently, so it wouldn't do harm to SSD health and lifecycle. Even in test environment, hundreds of reboot test would introduce obvious effect to SSD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos reducing the disk segment means? how cold reboot affects SSD health/life?

Copy link
Contributor Author

@qnos qnos Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos reducing the disk segment means? how cold reboot affects SSD health/life?

fstrim could be used for disk defragmentation, i meant using fstrim(called by cold reboot script) too frequently would affect SSD health. But in reboot case, that wouldn't be frequent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qnos I think Prince's questions is on why we need do this fstrim here? Since we already have cronjob to do daily fstrim, it seems like unnecessary to do fstrim before power-cycle. Do you have any other concern to have this operation here? If not, I think we'd better remove it since it is not related to the power-cycle.

@prgeor
Copy link
Contributor

prgeor commented Aug 2, 2023

@yxieca could you merge to master?

@prgeor
Copy link
Contributor

prgeor commented Aug 2, 2023

@qiluo-msft please cherry pick to 202012. ADO# 24651987

@yxieca yxieca merged commit 9a7eb49 into sonic-net:master Aug 3, 2023
17 checks passed
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202012: #16216

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 20, 2023
* [E1031] add platform specific reboot command support

Why I did it
E1031: add platform specific cold reboot support

How I did it
Use the CPLD to trigger board level power cycle when cold reboot

How to verify it
Do reboot stress test and check the reboot cause history

* [E1031] try to umount filesystem before power cycle reboot

* [E1031] remove fstrim in customized reboot script
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
* [E1031] add platform specific reboot command support

Why I did it
E1031: add platform specific cold reboot support

How I did it
Use the CPLD to trigger board level power cycle when cold reboot

How to verify it
Do reboot stress test and check the reboot cause history

* [E1031] try to umount filesystem before power cycle reboot

* [E1031] remove fstrim in customized reboot script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants