Express Reboot_HLD #1570

jhli-cisco · 2023-12-22T18:09:38Z

No description provided.

zhangyanzhao · 2024-01-17T08:31:02Z

Is this proposal Cisco 8000 specific? Or can apply to other platforms?

jhli-cisco · 2024-01-17T16:36:18Z

Hi Yanzhao, it is Cisco 8000 platform specific for now, since that is where the SDK support has been implemented.

paulmenzel · 2024-01-30T07:17:20Z

(Please remove the citation part, if you reply via email.)

Have you already implemented this as a prototype, and can you share the timings?

zhangyanzhao · 2024-01-30T16:12:55Z

Community reviewed on 1/30/2024.

venkatmahalingam · 2024-01-30T16:13:53Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+3.	During express boot pre-shutdown, syncd sets SAI_SWITCH_ATTR_FAST_API_ENABLE = true in setPreShutdownOnAllSwitches() to notify SDK for proper cleanup.
+
+```
+sai_status_t yncd::setPreShutdownOnAllSwitches()


Typo. Syncd

Thanks! Will correct it.

zhangyanzhao · 2024-01-30T16:14:04Z

We should define a clear reboot time threshold for express reboot. Please also keep in mind that this feature may impact other platforms, so the threshold should apply to them as well.

rck-innovium · 2024-01-30T16:36:10Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+         * A special type of boot used by Cisco platforms to start in 'express'
+         * boot mode
+         */
+        SAI_START_TYPE_EXPRESS_BOOT = 4,


We need to describe what the SAI implementation needs to do for Express Boot.

How is Express Boot different from FastFast boot?

Will update SAI to add express boot type. The difference is that express boot needs its own pre-shutdown notification to differentiate from warm boot, but fastfast boot uses the same pre-shutdown notification as warm boot.

rck-innovium · 2024-01-30T16:36:54Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+         */
+        SAI_START_TYPE_EXPRESS_BOOT = 4,
+
+        /**


Please update this here:

https://github.com/opencomputeproject/SAI/blob/c7a49cce5646bd6793e36b4cc3418dac4ae81ac9/inc/saiswitch.h#L3226

Will do. Thanks!

rck-innovium · 2024-01-30T16:43:39Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+```
+sai_status_t yncd::setPreShutdownOnAllSwitches()
+{
+	    if (shutdownType == SYNCD_RESTART_TYPE_EXPRESS) 


If I understand correctly, this step is the only difference between Express boot vs FastFast boot.

Why is this step needed? During Express shut, the SAI implementation must not reset the ASIC.

This can be achieved by setting SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL to false.

It needs a pre-shutdown notification so that the platform hardware is aware of the express shutdown and can do what is necessary before and during shutdown. SAI_SWITCH_ATTR_FAST_API_ENABLE is already used after boot, upon receiving SAI_REDIS_NOTIFY_SYNCD_APPLY_VIEW, so we used the same attribute for the pre-shutdown notification as well.

We are overloading SAI_SWITCH_ATTR_FAST_API_ENABLE attribute. We should clearly define what the underlying SAI implementation must do for Express boot/shut.

| SAI_SWITCH_ATTR_FAST_API_ENABLE | Express Boot/shut | Meaning |
| --- | --- | --- |
| True | Boot | Valid only for FastFast and Express?? Signals the SAI adapter to begin the bulk programming |
| False | Boot | Should not be used? |
| True | Shut | ?? |
| False | Shut | ?? |

Hi, this is how SAI_SWITCH_ATTR_FAST_API_ENABLE is used for express/fastfast boot. Will add this info to the doc.

| SAI_SWITCH_ATTR_FAST_API_ENABLE | Express boot | fastfast boot | Meaning |
| --- | --- | --- | --- |
| TRUE | Shut | Not used | Pre-shutdown notification |
| FALSE | Boot | Boot | Signals the SAI adapter to begin the bulk programming post reboot |
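
For illustration only, the attribute usage described in the table above can be sketched as a small lookup. This is not code from the HLD or from sonic-sairedis: `BootType`, `Phase`, and `fastApiEnableValue` are hypothetical names, and treating warm boot as "attribute not set" is an assumption, since the table only covers express and fastfast boot.

```cpp
#include <optional>

// Hypothetical names for illustration; not the sonic-sairedis definitions.
enum class BootType { Warm, FastFast, Express };
enum class Phase { PreShutdown, PostBootApplyView };

// Returns the value to write to SAI_SWITCH_ATTR_FAST_API_ENABLE, or
// std::nullopt when the attribute is not set for that boot type and phase.
std::optional<bool> fastApiEnableValue(BootType boot, Phase phase)
{
    switch (phase) {
    case Phase::PreShutdown:
        // TRUE is the express-only pre-shutdown notification.
        if (boot == BootType::Express) {
            return true;
        }
        return std::nullopt;
    case Phase::PostBootApplyView:
        // FALSE after reboot signals the SAI adapter to begin bulk programming.
        if (boot == BootType::Express || boot == BootType::FastFast) {
            return false;
        }
        // Assumption: warm boot does not touch the attribute.
        return std::nullopt;
    }
    return std::nullopt;
}
```

Returning an optional rather than a plain bool keeps "not used" distinct from FALSE, which is the distinction the review comment asks the HLD to spell out.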

zhangyanzhao · 2024-01-30T16:59:29Z

community review recording https://zoom.us/rec/share/Vp1yF8lprkAryYApvE421KZdeE7M62dB6q0QT9NstwKasFA3jDMMqhBi9nudKaVk.ZTQJQ2IVqoIbl_Cx

jhli-cisco · 2024-02-07T16:55:30Z

> (Please remove the citation part, if you reply via email.)
>
> Have you already implemented this as a prototype, and can you share the timings?

It is sub-second traffic interruption.

eddieruan-alibaba · 2024-02-09T22:36:22Z

Can you explain how you could get "SDK needs a notification once all configurations are restored after express boot."? Would it be configuration only, or would it include BGP routes being re-downloaded?

eddieruan-alibaba · 2024-02-12T21:17:28Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+
+## Overview
+
+The goal of Sonic express reboot is to be able to restart and upgrade SONiC software with sub-second data plane interruption. This is achieved by not initializing external phys, and only initializing NPU after receiving all configuration updates. This document covers updates needed in Sonic and SAI interface. 


Can you create a section to describe the timelines on control plane and data plane version changes during this process, how the control traffic will be handled especially when control plane and data plane are not in the same version of images?

"all configuration updates", do you include operation data such as BGP routes learnt from remote side to be updated as well?

@eddieruan-alibaba added a drawing that illustrates the interactions between v1/v2 sonic/sdk/npu during image upgrade. Please let me know if it is what you were looking for.

yxieca · 2024-02-12T23:22:14Z

doc/express-reboot/Cisco_8000_Express_Reboot_HLD.md

+The existing fast-reboot script ((https://github.com/sonic-net/sonic-utilities/blob/master/scripts/fast-reboot) that is used to trigger fast-reboot/warm-boot will be enhanced to support express-reboot. Major changes are:
+
+1.	Enforce express-reboot is only applicable to Cisco 8000 platforms at the moment.
+2.	During express-reboot shutdown path, syncd_request_shutdown will be called with a new option “-pxe” to inform syncd pre-shutdown for express boot mode. The actual calls will be “docker exec syncd /usr/bin/syncd_request_shutdown –pxe”.


What does -pxe stand for? Why not use -exp as it is the prefix of express reboot?

Hi Ying, it stands for pre-expressboot. It is the notification sent before reboot.

jhli-cisco · 2024-02-15T16:42:55Z

> Can you explain how you could get "SDK needs a notification once all configurations are restored after express boot."? Would it be configuration only, or would it include BGP routes being re-downloaded?

Hi Eddie, this is the same as in the warm-reboot case: it is not only configuration; I believe it includes BGP routes as well.

jhli-cisco · 2024-03-26T04:50:25Z

@yxieca @eddieruan-alibaba @rck-innovium Please let me know if you have any more comments.

eddieruan-alibaba · 2024-03-26T06:43:56Z

When do you enable the punt path from NPU to SONiC: between t1 and t2, or between t3 and t4?
How are routes and neighbor/ARP entries reinstalled in V2's kernel?

jhli-cisco · 2024-03-28T16:24:29Z

> When do you enable the punt path from NPU to SONiC: between t1 and t2, or between t3 and t4? How are routes and neighbor/ARP entries reinstalled in V2's kernel?

@eddieruan-alibaba the punt path is enabled for both periods. Between t1 and t2 the NPU runs the v1 version, and between t3 and t4 it runs the v2 version. Routes/ARP are replayed by v2 swss between t1 and t2 from the saved DB.

eddieruan-alibaba · 2024-03-28T18:30:09Z

> > When do you enable the punt path from NPU to SONiC: between t1 and t2, or between t3 and t4? How are routes and neighbor/ARP entries reinstalled in V2's kernel?
>
> @eddieruan-alibaba the punt path is enabled for both periods. Between t1 and t2 the NPU runs the v1 version, and between t3 and t4 it runs the v2 version. Routes/ARP are replayed by v2 swss between t1 and t2 from the saved DB.

Can you add some detailed information about this punt path handling in your HLD? Between t1 and t2, your v1 hardware will send traffic to v2 software. Since you want to be able to upgrade from any version to any version, this version-mismatched punt would impose a very strong requirement on punt header backward compatibility.

Also, can you add some detailed information on how routes/ARP are replayed between t1 and t2, and what benefits you want to achieve with the replay? From when would BGP relearn the routes, and how do BGP-learnt routes get resynced with your replayed routes? For ARP entries, how do you tell the kernel to install your replayed entry, and how should the kernel handle it alongside ARP packets learnt between t1 and t2? I don't have a clear picture in this area.

jhli-cisco · 2024-04-01T21:23:24Z

> > > When do you enable the punt path from NPU to SONiC: between t1 and t2, or between t3 and t4? How are routes and neighbor/ARP entries reinstalled in V2's kernel?
> >
> > @eddieruan-alibaba the punt path is enabled for both periods. Between t1 and t2 the NPU runs the v1 version, and between t3 and t4 it runs the v2 version. Routes/ARP are replayed by v2 swss between t1 and t2 from the saved DB.
>
> Can you add some detailed information about this punt path handling in your HLD? Between t1 and t2, your v1 hardware will send traffic to v2 software. Since you want to be able to upgrade from any version to any version, this version-mismatched punt would impose a very strong requirement on punt header backward compatibility.
>
> Also, can you add some detailed information on how routes/ARP are replayed between t1 and t2, and what benefits you want to achieve with the replay? From when would BGP relearn the routes, and how do BGP-learnt routes get resynced with your replayed routes? For ARP entries, how do you tell the kernel to install your replayed entry, and how should the kernel handle it alongside ARP packets learnt between t1 and t2? I don't have a clear picture in this area.

Hi @eddieruan-alibaba, there is no special handling in the punt path. Express boot isn't targeted at upgrades from any version to any version, and in reality it can't support that. The control plane/SONiC upgrade still follows warm boot requirements.

Route/ARP replay after reboot also follows the warm-reboot logic, which is documented in https://github.com/sonic-net/SONiC/blob/master/doc/warm-reboot/swss_warm_restart.md. The only difference from warm-reboot is that in syncd the reconciliation is skipped for express boot.

Hope this clarifies things a bit.
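
As a rough illustration of the "reconciliation is skipped" point, the toy sketch below contrasts the two apply-view strategies. It is a simplified model under stated assumptions, not syncd code: `Restart`, `Op`, and `planApplyView` are hypothetical names, objects are reduced to plain strings, and the warm path is reduced to a set difference.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
enum class Restart { Warm, Express };

struct Op {
    enum Kind { Create, Remove } kind;
    std::string object;
};

// Builds the list of operations needed to reach the desired view.
std::vector<Op> planApplyView(Restart restart,
                              const std::set<std::string>& current,
                              const std::set<std::string>& desired)
{
    std::vector<Op> ops;
    if (restart == Restart::Express) {
        // No reconciliation: every object in the restored view is programmed.
        for (const auto& object : desired) {
            ops.push_back({Op::Create, object});
        }
        return ops;
    }
    // Warm: reconcile by removing stale objects and creating missing ones.
    for (const auto& object : current) {
        if (desired.count(object) == 0) {
            ops.push_back({Op::Remove, object});
        }
    }
    for (const auto& object : desired) {
        if (current.count(object) == 0) {
            ops.push_back({Op::Create, object});
        }
    }
    return ops;
}
```

In this model the warm path touches only the differences, while the express path ignores the current view entirely, which matches the overview's statement that the NPU is initialized only after all configuration updates are received.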

eddieruan-alibaba · 2024-04-01T22:21:11Z

> Express boot isn't targeted at upgrades from any version to any version, and in reality it can't support that. The control plane/SONiC upgrade still follows warm boot requirements.

Can you create a table comparing your approach with the existing approach, from boot requirements to implementation? Since you don't support "any version to any version upgrade", I don't have a clear picture of how your approach differs from the existing one. It would be great if you could articulate it clearly.

jhli-cisco · 2024-04-02T18:27:10Z

Hi @eddieruan-alibaba, added a table to compare major steps in warm and express boot. Please let me know if you need more details. I can schedule a meeting with you if needed.

eddieruan-alibaba · 2024-04-02T18:40:24Z

> Hi @eddieruan-alibaba, added a table to compare major steps in warm and express boot. Please let me know if you need more details. I can schedule a meeting with you if needed.

Some comments on your table:

- Use case, "with change in data plane": this statement is very confusing. You have a limitation that there are no punt/inject header changes, and you don't support any-version-to-any-version upgrade. Can you put these two constraints in this table? The current one only allows N to N+1 with no data plane change at all. In your case, you allow normal data plane changes, but no punt path change.
- SONiC action: in your case, why do you need SWSS replay? You will relearn all routes and ARP/MAC between t1 and t2. What's the need for SWSS replay? I assume you need configuration replay only, not routes/ARP etc. in SWSS.
- SDK action: you will reset the NPU between t2 and t3. Can you mention it clearly, or describe the behavior in the different time windows?
- "table configuration" is a slightly confusing term here. I would rather call it "NPU data". You will DMA "NPU data" to the NPU.

Sure. We could have a chat.

jhli-cisco · 2024-04-02T19:45:23Z

Hi @eddieruan-alibaba , what time is good for you? Webex or zoom?

zhangyanzhao · 2024-05-06T01:50:13Z

code PR is not ready, move to backlog for future release

jhli-cisco · 2024-05-09T18:16:48Z

@eddieruan-alibaba , updated to include punt/inject handling.

eddieruan-alibaba · 2024-05-09T22:13:54Z

Thanks. The punt/inject picture looks good.

Can you comment on what the gap between t1 and t2 is in your local testing, and at what scale?

This window would grow as the scale increases. The window between t1 and t2 would be your blackout window. We need to make sure the neighbor devices do not time out.

jhli-cisco · 2024-05-09T23:03:58Z

> Thanks. The punt/inject picture looks good.
>
> Can you comment on what the gap between t1 and t2 is in your local testing, and at what scale?
>
> This window would grow as the scale increases. The window between t1 and t2 would be your blackout window. We need to make sure the neighbor devices do not time out.

@eddieruan-alibaba do you mean t2 to t3, which is the traffic disruption window?

eddieruan-alibaba · 2024-05-09T23:20:54Z

> > Thanks. The punt/inject picture looks good.
> > Can you comment on what the gap between t1 and t2 is in your local testing, and at what scale?
> > This window would grow as the scale increases. The window between t1 and t2 would be your blackout window. We need to make sure the neighbor devices do not time out.
>
> @eddieruan-alibaba do you mean t2 to t3, which is the traffic disruption window?

No, I mean the window between t1 and t2. The window between t2 and t3 is your DMA window, which would be very small, at the msec level. But since you don't punt traffic between t1 and t2, I want to understand how long this window is in your testing at your scale.

jhli-cisco · 2024-05-09T23:34:03Z

> > > Thanks. The punt/inject picture looks good.
> > > Can you comment on what the gap between t1 and t2 is in your local testing, and at what scale?
> > > This window would grow as the scale increases. The window between t1 and t2 would be your blackout window. We need to make sure the neighbor devices do not time out.
> >
> > @eddieruan-alibaba do you mean t2 to t3, which is the traffic disruption window?
>
> No, I mean the window between t1 and t2. The window between t2 and t3 is your DMA window, which would be very small, at the msec level. But since you don't punt traffic between t1 and t2, I want to understand how long this window is in your testing at your scale.

@eddieruan-alibaba OK, I don't have the exact number. Using the standard T0 sonic-management topology, we make sure there is no BGP neighbor or port-channel flapping as part of the sonic-management suite that we add, same as in warm-reboot testing. I can get measurements later, but I hope this is not an issue for approval of this document.

update punt and inject handling

jhli-cisco · 2024-07-03T17:46:51Z

@eddieruan-alibaba , updated the punt and inject change handling after further internal discussion. Please take a look.

jhli-cisco · 2024-07-15T16:16:53Z

Hi @yxieca @rck-innovium @eddieruan-alibaba, updated document per review comments. Please review again and let me know if you have any further comments.

eddieruan-alibaba · 2024-07-26T05:52:28Z

"It can be seen from Figure 1, it is possible that punt-header-v1 reachs SONIC-v2 or inject-header-v2 reachs NPU-v1 during t1 to t2 window. The punt and inject header changes are rare and not commom. Currently punt and inject header data structure differences between V1 and V2 are handled case by case basis in S1 SDK internally."

Should backward compatibility of punt/inject headers be a requirement, instead of assuming that changes are rare? I have concerns about this statement in the routing domain. As the feature set grows, punt/inject headers change release by release. :) For example, IPM may break your assumption unless you strictly maintain backward compatibility.

I would think this could be easier to get supported:

- In DCN only
- In bugfix patch releases

By the way, NV folks presented a warm reboot approach similar to this one in the routing WG, except that they cut the hardware in half, write V2 into hardware, and swap the hardware from V1 to V2 with a flag, i.e. similar to NCS 6K's approach but without control plane redundancy. Your V2 context is cached in software, and DMA is run to swap from V1 to V2. The meeting minutes can be found at https://lists.sonicfoundation.dev/g/sonic-wg-routing/wiki/37114.

jhli-cisco · 2024-08-02T17:17:32Z

@eddieruan-alibaba We don't want to explicitly require that punt/inject headers be backward compatible, though they may be in most cases. We would like to leave the choice to the vendor implementation: either provide backward compatibility for punt headers, or drop version-mismatched punt packets, depending on hardware capability.
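
The two vendor options described here can be sketched as a small decision function. This is a hypothetical illustration, not part of the HLD or any SDK: `MismatchPolicy`, `PuntAction`, and `classifyPunt` are invented names, and modelling the header format as a single version number is an assumption.

```cpp
#include <cstdint>

// Hypothetical policy chosen by the vendor implementation.
enum class MismatchPolicy { BackwardCompatible, DropMismatched };

// What to do with one punted packet.
enum class PuntAction { Deliver, TranslateThenDeliver, Drop };

// Decides how a punted packet is handled when the header version produced by
// the NPU may differ from the version the running software expects.
PuntAction classifyPunt(std::uint32_t headerVersion,
                        std::uint32_t softwareVersion,
                        MismatchPolicy policy)
{
    if (headerVersion == softwareVersion) {
        return PuntAction::Deliver;
    }
    // Versions differ, e.g. punt-header-v1 reaching SONiC v2 between t1 and t2.
    return policy == MismatchPolicy::BackwardCompatible
               ? PuntAction::TranslateThenDeliver
               : PuntAction::Drop;
}
```

Under the drop policy, control packets punted during the mismatch window are lost, so the length of that window matters for neighbor timeouts, as raised earlier in the thread.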

eddieruan-alibaba · 2024-08-02T17:58:29Z

> @eddieruan-alibaba We don't want to explicitly require that punt/inject headers be backward compatible, though they may be in most cases. We would like to leave the choice to the vendor implementation: either provide backward compatibility for punt headers, or drop version-mismatched punt packets, depending on hardware capability.

The success of your approach relies on the assumption that the punt/inject header will remain unchanged or be backward compatible. Also, the current SONiC warm reboot does not rely on this assumption. I recommend explicitly stating this constraint to avoid any confusion or overestimation of this approach.

How this is achieved will depend on the choices of vendors and operators. Some may restrict their deployment to data center networks (DCN), while others may limit the approach to minor releases only.

call out punt-inject header compatibility

jhli-cisco · 2024-08-08T17:12:46Z

@eddieruan-alibaba , updated it to call out backward compatibility explicitly. Please review.

Create Cisco_8000_Express_Reboot_HLD.md

7253d1a

jhli-cisco changed the title ~~Cisco 8000 Express Reboot_HLD~~ Express Reboot_HLD Jan 29, 2024

Update Express_Reboot_HLD.md

45a55fd

venkatmahalingam reviewed Jan 30, 2024

zhangyanzhao requested a review from venkatmahalingam January 30, 2024 16:14

rck-innovium suggested changes Jan 30, 2024

zhangyanzhao requested a review from prsunny January 30, 2024 16:44

Update Express_Reboot_HLD.md per review comments

fec9ee6

eddieruan-alibaba reviewed Feb 12, 2024

yxieca reviewed Feb 12, 2024

jhli-cisco and others added 3 commits March 14, 2024 11:23

Update Cisco_8000_Express_Reboot_HLD.md

ce95905

add express reboot flow image

4bb9c85

Update Cisco_8000_Express_Reboot_HLD.md

27e6d3c

jhli-cisco added 2 commits April 2, 2024 11:18

add warm_vs_express.png

9376ccf

Add warm vs express boot comparison

4310731

jhli-cisco added 2 commits May 9, 2024 11:05

update express boot flow to include punt/inject

133342c

Update Cisco_8000_Express_Reboot_HLD.md

6917cff

jhli-cisco added 2 commits July 3, 2024 10:41

Add files via upload

1f78636

Update Cisco_8000_Express_Reboot_HLD.md

049762a

jhli-cisco requested review from yxieca, rck-innovium and eddieruan-alibaba July 15, 2024 16:15

yxieca requested a review from vaibhavhd July 15, 2024 17:00

Update Cisco_8000_Express_Reboot_HLD.md

8195ec4
