Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fastboot] Delay PMON service for better fastboot performance #10567

Merged
merged 6 commits into from
May 2, 2022
Merged

[Fastboot] Delay PMON service for better fastboot performance #10567

merged 6 commits into from
May 2, 2022

Conversation

shlomibitton
Copy link
Contributor

@shlomibitton shlomibitton commented Apr 13, 2022

Signed-off-by: Shlomi Bitton shlomibi@nvidia.com

Why I did it

Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

bootchart_no_optimizations

How I did it

  • Add a timer for PMON service.
  • Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
  • Copy the timer file to the host bin image.

How to verify it

Run fast-reboot on MLNX platform and observe faster create_switch execution time.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
@liat-grozovik liat-grozovik added the Request for 202111 Branch For PRs being requested for 202111 branch label Apr 13, 2022
@liat-grozovik liat-grozovik requested review from stepanblyschak and vaibhavhd and removed request for lguohan April 13, 2022 15:47
Comment on lines 49 to 55
BOOT_TYPE=`getBootType`
if [[ x"$BOOT_TYPE" == x"fast" ]]; then
return
fi
debug "Starting pmon service..."
/bin/systemctl start pmon
debug "Started pmon service"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added by #3505 to fix some issues related to pmon/SDK. Won't that issue happen again with your change?

Copy link
Contributor Author

@shlomibitton shlomibitton Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, notice my change is only on startup flow and not shutdown flow.
Delaying PMON is OK as long SYNCD is running before starting it.
For shutdown flow the behavior is the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change can be made generic to all platforms. CC @yxieca.

Also, if this is not made generic, then the comments on files/build_templates/pmon.timer should be modified to say:
# This delay is for fast-reboot performance on MLNX platform

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, don't you want to do this change for warm-reboot as well? I am curious why only for fast-reboot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a generic change, this exception is only for MLNX platform, so I added a condition for it in order to keep the delay and not start PMON with SYNCD start flow.
Regarding warm-boot, I totally agree we can do it for both types, fixed it.

@@ -46,6 +46,10 @@ function startplatform() {
function waitplatform() {

if [[ x"$sonic_asic_platform" == x"mellanox" ]]; then
BOOT_TYPE=`getBootType`
if [[ x"$BOOT_TYPE" == x"fast" ]]; then
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be meaningful to have a log line here mentioning that for fastboot (or warm), pmon start will be delayed as governed by files/build_templates/pmon.timer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log added.

@@ -0,0 +1,12 @@
[Unit]
# This delay is for fast-reboot performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction - fast-reboot and warm-reboot performance.

debug "Starting pmon service..."
/bin/systemctl start pmon
debug "Started pmon service"
if [[ x"$BOOT_TYPE" != x"cold" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as on lldp PR:
Minor: This is correct, but it may be better to match against whitelist BOOT_TYPEs (fast, warm, fastfast).
This is so that this condition does not incorrectly evaluate to true for any other boottype that gets added in future.

vaibhavhd
vaibhavhd previously approved these changes Apr 18, 2022
Copy link
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. LGTM otherwise.

@shlomibitton
Copy link
Contributor Author

@vaibhavhd @stepanblyschak I fixed your comments, can you please review and approve?
Thanks

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

stepanblyschak
stepanblyschak previously approved these changes Apr 22, 2022
@shlomibitton
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Copy link
Collaborator

@vaibhavhd kindly reminder to review

vaibhavhd
vaibhavhd previously approved these changes Apr 28, 2022
@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

liat-grozovik pushed a commit that referenced this pull request Apr 28, 2022
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
@shlomibitton shlomibitton dismissed stale reviews from vaibhavhd and stepanblyschak via 0b84c45 May 1, 2022 10:57
judyjoseph pushed a commit that referenced this pull request May 2, 2022
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
@liat-grozovik liat-grozovik merged commit 4ec3af8 into sonic-net:master May 2, 2022
@liat-grozovik
Copy link
Collaborator

As prev approved by @vaibhavhd and merge only done to avoid checkers failures, PR is now merged.

@qiluo-msft
Copy link
Collaborator

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

@shlomibitton
Copy link
Contributor Author

@qiluo-msft I created a separate PR for 202012:
#10745

judyjoseph pushed a commit that referenced this pull request May 8, 2022
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
@shlomibitton shlomibitton deleted the shlomi_delay_pmon_service branch May 26, 2022 12:25
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
…net#10568)

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: sonic-net#10567
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
…net#10567)

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Included in 202111 Branch Request for 202111 Branch For PRs being requested for 202111 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants