-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Express Reboot_HLD #1570
base: master
Are you sure you want to change the base?
Express Reboot_HLD #1570
Conversation
Is this proposal Cisco 8000 specific? Or can apply to other platforms? |
Hi Yanzhao,
It is restricted as cisco 8000 platform specific as of now, as that is where the SDK support has been implemented.
|
(Please remove the citation part, if you reply via email.) Have you already implemented this as a prototype, and can you share the timings? |
Community reviewed on 1/30/2024. |
3. During express boot pre-shutdown, syncd sets SAI_SWITCH_ATTR_FAST_API_ENABLE = true in setPreShutdownOnAllSwitches() to notify SDK for proper cleanup. | ||
|
||
``` | ||
sai_status_t yncd::setPreShutdownOnAllSwitches() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo. Syncd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!. will correct it.
We should define a clear reboot time threshold for express reboot, also please keep in mind that this feature may impact other platforms and the threshold should apply to other platforms. |
* A special type of boot used by Cisco platforms to start in 'express' | ||
* boot mode | ||
*/ | ||
SAI_START_TYPE_EXPRESS_BOOT = 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need to describe what the SAI implementation needs to do for Express Boot?
- How is Express Boot different from FastFast boot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update SAI to add express boot type. The difference is that express boot needs its own pre-shutdown notification to differentiate from warm boot, but fastfast boot uses the same pre-shutdown notification as warm boot.
* A special type of boot used by Cisco platforms to start in 'express' | ||
* boot mode | ||
*/ | ||
SAI_START_TYPE_EXPRESS_BOOT = 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need to describe what the SAI implementation needs to do for Express Boot?
- How is Express Boot different from FastFast boot?
*/ | ||
SAI_START_TYPE_EXPRESS_BOOT = 4, | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Thanks!
*/ | ||
SAI_START_TYPE_EXPRESS_BOOT = 4, | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``` | ||
sai_status_t yncd::setPreShutdownOnAllSwitches() | ||
{ | ||
if (shutdownType == SYNCD_RESTART_TYPE_EXPRESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this step is the only difference between Express boot vs FastFast boot.
Why is this step needed? During Express shut, the SAI implementation must not reset the ASIC.
This can be achieved by setting SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs a pre-shutdown notification so that platform hardware knows it and does necessary things pre- and during shutdown. SAI_SWITCH_ATTR_FAST_API_ENABLE is used after boot upon receving SAI_REDIS_NOTIFY_SYNCD_APPLY_VIEW. SO we used the same attr for pre-shutdown notification as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are overloading SAI_SWITCH_ATTR_FAST_API_ENABLE attribute. We should clearly define what the underlying SAI implementation must do for Express boot/shut.
SAI_SWITCH_ATTR_FAST_API_ENABLE | Express Boot/shut | Meaning |
True | Boot | Valid only for FastFast and Express?? Signals the SAI adapter to begin the bulk programming
|
False | Boot | Should not be used? |
True | Shut | ?? |
False | Shut | ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this is how the SAI_SWITCH_ATTR_FAST_API_ENABLE being used for express/fastfast boot. Will add this info to the doc.
SAI_SWITCH_ATTR_FAST_API_ENABLE | Express boot | fastfast boot | Meaning |
---|---|---|---|
TRUE | Shut | Not used | Pre-shutdown notification |
FALSE | Boot | Boot | Signals the SAI adapter to begin the bulk programming post reboot |
``` | ||
sai_status_t yncd::setPreShutdownOnAllSwitches() | ||
{ | ||
if (shutdownType == SYNCD_RESTART_TYPE_EXPRESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this step is the only difference between Express boot vs FastFast boot.
Why is this step needed? During Express shut, the SAI implementation must not reset the ASIC.
This can be achieved by setting SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL to false.
community review recording https://zoom.us/rec/share/Vp1yF8lprkAryYApvE421KZdeE7M62dB6q0QT9NstwKasFA3jDMMqhBi9nudKaVk.ZTQJQ2IVqoIbl_Cx |
It is sub-second traffic interruption. |
Can you explain how you could get "SDK needs a notification once all configurations are restored after express boot."? Would it be configuration only, or it would include BGP routes redownloading? |
|
||
## Overview | ||
|
||
The goal of Sonic express reboot is to be able to restart and upgrade SONiC software with sub-second data plane interruption. This is achieved by not initializing external phys, and only initializing NPU after receiving all configuration updates. This document covers updates needed in Sonic and SAI interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create a section to describe the timelines on control plane and data plane version changes during this process, how the control traffic will be handled especially when control plane and data plane are not in the same version of images?
"all configuration updates", do you include operation data such as BGP routes learnt from remote side to be updated as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eddieruan-alibaba add a drawing that illustrates the interactions between v1/v2 sonic/sdk/npu during image upgrade. Please let me know if it is what you were looking for.
The existing fast-reboot script ((https://github.com/sonic-net/sonic-utilities/blob/master/scripts/fast-reboot) that is used to trigger fast-reboot/warm-boot will be enhanced to support express-reboot. Major changes are: | ||
|
||
1. Enforce express-reboot is only applicable to Cisco 8000 platforms at the moment. | ||
2. During express-reboot shutdown path, syncd_request_shutdown will be called with a new option “-pxe” to inform syncd pre-shutdown for express boot mode. The actual calls will be “docker exec syncd /usr/bin/syncd_request_shutdown –pxe”. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does -pxe stand for? Why not use -exp as it is the prefix of express reboot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI Ying, it stands for pre-expressboot. This is the notification before reboot.
Hi Eddie, this is the same as in warm-reboot case, not only configuration, I believe including BGP routes. |
@yxieca @eddieruan-alibaba @rck-innovium Please let me know if you have any more comments. |
When do you enable the punt path from NPU to SONiC? between t1 and t2, or between t3 and t4? |
@eddieruan-alibaba the punt path is enabled for both periods. Between t1 and t2 the npu is v1 version and between t3 and t4 is in v2 version. Routes/ARP are replayed by v2 swss between t1 and t2 from saved db. |
Can you add some detail information about this punt path handling in your HLD? Between t1 and t2, your v1 hardware will send traffic to v2 software. Since you want to be able upgrade from any version to any version, this version mismatch punt would be a very strong requirement on punt header backward compatibility. Also, can you add some detail information on how to replay Routes / ARP between t1 and t2, and what benefits you want to achieve with replay. From when BGP would relearn the routes, and how BGP learnt routes get resync with your replay routes. For ARP entries, how do you tell kernel to install your replay entry and how kernel should handle it with learnt ARP packets between t1 and t2. I don't have a clear picture in this area. |
Hi @eddieruan-alibaba, there isn't special handling in punt path. Express boot isn't targeted for upgrade from any version to any version, and in reality, it can't. The upgrade in control plane/sonic still follows warm boot requirements. Route/ARP replay after reboot also follows warm-reboot logic which is documented in https://github.com/sonic-net/SONiC/blob/master/doc/warm-reboot/swss_warm_restart.md. The only difference from warm-reboot is in syncd the reconciliation is skiped for express boot. Hope this clarrify things a bit. |
Can you create a table to compare your approach with existing approach from boot requirements to implementations. Since you don't hold "any version to any version upgrade", I don't have a clear picture on what your approach is different from existing one. It would be great if you could articulate it clearly. |
Hi @eddieruan-alibaba, added a table to compare major steps in warm and express boot. Please let me know if you need more details. I can schedule a meeting with you if needed. |
Some comments on your table
Sure. We could have a chat. |
Hi @eddieruan-alibaba , what time is good for you? Webex or zoom? |
code PR is not ready, move to backlog for future release |
@eddieruan-alibaba , updated to include punt/inject handling. |
Thanks. The punt/inject picture looks good. Can you comment on that in your local testing, what is the gap between t1 and t2 and under what kind of scale. This window would grow as the scale goes high. The window between t1 and t2 would be your black out window. We need to make sure the neighbor devices would not time out. |
@eddieruan-alibaba do you mean t2 to t3 which is the traffic disruption window? |
No, I mean the window between t1 and t2. The window between t2 and t3 is your DMA window, which would be very small, in msec level. But since you don't punt traffic between t1 and t2, I want to understand this window's length in your testing with your scale. |
@eddieruan-alibaba Ok, I dont have the exact number. Using standard T0 sonic-management topology, we make sure there is no bgp neighbor and port-channel flapping as part of the sonic-management suite that we add, same as warm-reboot testing. I can get measurement later, but hope this is not an issue for this document approval. |
update punt and inject handling
@eddieruan-alibaba , updated the punt and inject change handling after further internal discussion. Please take a look. |
Hi @yxieca @rck-innovium @eddieruan-alibaba, updated document per review comments. Please review again and let me know if you have any further comments. |
"It can be seen from Figure 1, it is possible that punt-header-v1 reachs SONIC-v2 or inject-header-v2 reachs NPU-v1 during t1 to t2 window. The punt and inject header changes are rare and not commom. Currently punt and inject header data structure differences between V1 and V2 are handled case by case basis in S1 SDK internally." Should it be a requirement to ask punt/inject headers should be backward compatible instead of assuming the change is rare? I would have concerns on this statement in routing domain. As feature set grows, punt/inject headers are changing release by release. :) For example, IPM may break your assumption unless you restrictedly follow backward compatible. I would think this could be more easier to get supported if
By the way, NV folks presented a warm reboot approach similar to this one in routing WG, except they cut hardware in half and write V2 in hardware and swap hardware from V1 to V2 with some flag, a.k.a similar to NCS 6K's approach but without control plane redundancy. Your V2 context is cached in software and run DMA to swap from V1 to V2. The meeting minutes could be found at https://lists.sonicfoundation.dev/g/sonic-wg-routing/wiki/37114. |
@eddieruan-alibaba We don’t want to explicitly put a restriction on punt/inject headers should be backward compatible, though it may be in most cases. But we would like to give a choice to vendor implementation whether they want to |
The success of your approach relies on the assumption that the punt/inject header will remain unchanged or be backward compatible. Also, the current SONiC warm reboot does not rely on this assumption. I recommend explicitly stating this constraint to avoid any confusion or overestimation of this approach. How this is achieved will depend on the choices of vendors and operators. Some may restrict their deployment to data center networks (DCN), while others may limit the approach to minor releases only. |
call out punt-inject header compatibility
@eddieruan-alibaba , updated it to call out backward compatibility explicitly. Please review. |
No description provided.