From 9f58c56cb6a905355a4271452e3da398836bbd7f Mon Sep 17 00:00:00 2001 From: Prince Sunny Date: Thu, 9 Sep 2021 19:08:11 -0700 Subject: [PATCH 1/6] Overlay ECMP with BFD support HLD Vxlan Overlay ECMP BFD orchagent and HW support --- doc/vxlan/Overlay ECMP with BFD.md | 216 +++++++++++++++++++++++++++++ 1 file changed, 216 insertions(+) create mode 100644 doc/vxlan/Overlay ECMP with BFD.md diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md new file mode 100644 index 00000000000..bebdfa9ae92 --- /dev/null +++ b/doc/vxlan/Overlay ECMP with BFD.md @@ -0,0 +1,216 @@ +# Overlay ECMP with BFD monitoring +## High Level Design Document +### Rev 1.0 + +# Table of Contents + + * [Revision](#revision) + + * [About this Manual](#about-this-manual) + + * [Scope](#scope) + + * [Definitions/Abbreviation](#definitionsabbreviation) + + * [1 Requirements Overview](#1-requirements-overview) + * [1.1 Usecase](#11-Usecase) + * [1.2 Functional requirements](#12-functional-requirements) + * [1.3 CLI requirements](#13-cli-requirements) + * [1.4 Warm Restart requirements ](#14-warm-restart-requirements) + * [2 Modules Design](#2-modules-design) + * [2.1 Config DB](#21-config-db) + * [2.2 App DB](#22-app-db) + * [2.3 Orchestration Agent](#23-orchestration-agent) + * [2.4 CLI](#24-cli) + +###### Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 09/09/2021 | Prince Sunny | Initial version | + +# About this Manual +This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support. This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md) +# Scope +This document describes the high level design of the Overlay ECMP feature and associated BFD support. 
General BFD support and configurations are beyond the scope of this document. + +# Definitions/Abbreviation +###### Table 1: Abbreviations +| | | +|--------------------------|--------------------------------| +| BFD | Bidirectional Forwarding | +| VNI | Vxlan Network Identifier | +| VTEP | Vxlan Tunnel End Point | +| VNet | Virtual Network | + +# 1 Requirements Overview + +## 1.1 Usecase + +![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_UseCase.png) + +## 1.1 Functional requirements + +At a high level the following should be supported: + +- Configure ECMP with Tunnel Nexthops (IPv4 and IPv6) +- Tunnel Endpoint monitoring via BFD +- Add/Withdraw Nexthop based on Tunnel or Endpoint health + +## 1.2 CLI requirements +- User should be able to show the BFD session +- User should be able to show the Vnet routes + +## 1.3 Warm Restart requirements +No special handling for Warm restart support. + +# 2 Modules Design + +The following are the schema changes. + +## 2.1 Config DB + +Existing Vxlan and Vnet tables. 
+ +### 2.1.1 VXLAN Table +``` +VXLAN_TUNNEL|{{tunnel_name}} + "src_ip": {{ip_address}} + "dst_ip": {{ip_address}} (OPTIONAL) +``` +### 2.1.2 VNET/Interface Table +``` +VNET|{{vnet_name}} + "vxlan_tunnel": {{tunnel_name}} + "vni": {{vni}} + "scope": {{"default"}} (OPTIONAL) + "peer_list": {{vnet_name_list}} (OPTIONAL) +``` + +## 2.2 APP DB + +### VNET + +The following are the changes for Vnet Route table + +Existing: + +``` +VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}} + "endpoint": {{ip_address}} + "mac_address":{{mac_address}} (OPTIONAL) + "vni": {{vni}}(OPTIONAL) +``` + +Proposed: +``` +VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}} + "endpoint": {{ip_address1},{ip_address2},...} + "endpoint_monitor": {{ip_address1},{ip_address2},...} (OPTIONAL) + "mac_address":{{mac_address1},{mac_address2},...} (OPTIONAL) + "vni": {{vni1},{vni2},...} (OPTIONAL) + "weight": {{w1},{w2},...} (OPTIONAL) + “profile”: {{profile_name}} (OPTIONAL) +``` + +``` +key = VNET_ROUTE_TUNNEL_TABLE:vnet_name:prefix ; Vnet route tunnel table with prefix +; field = value +ENDPOINT = list of ipv4 addresses ; comma separated list of endpoints +ENDPOINT_MONITOR = list of ipv4 addresses ; comma separated list of endpoints +MAC_ADDRESS = 12HEXDIG ; Inner dst mac in encapsulated packet +VNI = DIGITS ; VNI value in encapsulated packet +WEIGHT = DIGITS ; Weights for the nexthops, comma separated (Optional) +PROFILE = STRING ; profile name to be applied for this route, for community + string etc (Optional) +``` + +### BFD + +``` +BFD_SESSION:{{ifname}}:{{prefix}} + "tx_interval": {{interval}} (OPTIONAL) + "rx_interval": {{interval}} (OPTIONAL) + "multiplier": {{detection multiplier}} (OPTIONAL) + "shutdown": {{false}} + "multihop": {{false}} + "local_addr": {{ipv4/v6}} (OPTIONAL) + "type": {{string}} (active/passive..) +; Defines APP DB schema to initiate BFD session. 
+``` + +## 2.3 Module Interaction + +Overlay routes can be programmed via RestAPI or gNMI/gRPC interface which is not described in this document. A highlevel module interaction is shown below + +![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_ModuleInteraction.png) + +## 2.3 Orchestration Agent +Following orchagents shall be modified. + +### VnetOrch + +#### Requirements + +- Vnetorch to add support to handle multiple endpoints for APP_VNET_RT_TUNNEL_TABLE_NAME based route task. +- Reuse Nexthop tunnel based on the endpoint configuration. +- If there is already the same endpoint exists, use that as member for Nexthop group. +- Similar to above, reuse nexthop group, if multiple routes are programmed with the same set of nexthops. +- Provide support for endpoint modification for a route prefix. Require SAI support for SET operation of routes. +- Provide support for endpoint deletion for a route prefix. Orchagent shall check the existing entries and delete any tunnel/nexthop based on the new route update +- Ensure backward compatibility with single endpoint routes +- Use SAI_NEXT_HOP_GROUP_MEMBER_ATTR_WEIGHT for specifying weights to nexthop member +- Desirable to have per tunnel stats via sai_tunnel_stat_t + +#### Detailed flow + +VnetOrch is one of the critical module for supporting overlay ecmp. VnetOrch subscribes to VNET and ROUTE updates from APP_DB. + +When a new route update is processed by the add operation, + +1. VnetOrch checks the nexthop group and if it exists, reuse the group +2. For a new nexthop group member, add the ECMP member and identify the corresponding monitoring IP address. Create a mapping between the monitoring IP and nexthop tunnel endpoint. +3. Initiate a BFD session for the monitoring IP if it does not exist +4. 
Based on the BFD implementation (BfdOrch vs Control plane BFD), subscribe to BFD state change, either directly as subject observer (similar to port oper state notifications in orchagent) or via STATEDB update. +5. Based on the VNET global configuration to advertise prefixes, indicate to STATEDB if the prefix must be advertised by BGP/FRR only if there is atleast one active nexthop. Remove this entry if there are no active nexthops indicated by BFD session down so that the network pfx is no longer advertised. + +#### Monitoring Endpoint Mapping + +VNET_ROUTE_TUNNEL_TABLE can provide monitoring endpoint IPs which can be different from the tunnel termination endpoints. VnetOrch creates a mapping for such endpoints and based on the monitoring endpoint (MonEP1) health, proceed with adding/removing nexthop tunnel endpoint (EP1) from the ECMP group for the respective prefix. It is assumed that for one tunnel termination endpoint (EP1), there shall be only one corresponding monitoring endpoint (MonEP1). + +#### Pros of SWSS to handle route update based on tunnel nexthop health: + +- No significant changes, if BFD session management is HW offload via SAI notifications or Control Plane assisted. +- Similar to NHFLAGS handling for existing route ECMP group +- Better performance in re-programming routes in ASIC instead of separate process to monitor and modify each route prefix by updating DB entries + +### BfdOrch +Sonic may offload the BFD session handling to hardware that has BFD capabilities. A new module, BfdOrch shall be introduced to handle BFD session to monitoring endpoints and check the health of remote endpoints. BfdOrch shall offload the session initiation/sustenance to hardware via SAI APIs and gets the notifications of session state from SAI. The session state shall be updated in STATE_DB and to any other observer orchestration agents. 
+ +![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD.png) + +For offloading, the following shall be the SAI attributes programmed by BfdOrch. + +| Attribute Type | Value | +|--------------------------|--------------------------------| +| SAI_BFD_SESSION_ATTR_TYPE | SAI_BFD_SESSION_TYPE_ASYNC_ACTIVE | +| SAI_BFD_SESSION_ATTR_OFFLOAD_TYPE | SAI_BFD_SESSION_OFFLOAD_TYPE_FULL | +| SAI_BFD_SESSION_ATTR_BFD_ENCAPSULATION_TYPE | SAI_BFD_ENCAPSULATION_TYPE_NONE | +| SAI_BFD_SESSION_ATTR_SRC_IP_ADDRESS | Loopback0 IPv4 or v6 address | +| SAI_BFD_SESSION_ATTR_DST_IP_ADDRESS | Remote IPv4 or v6 address | +| SAI_BFD_SESSION_ATTR_MULTIHOP | True | + +Sai shall notify via notification channel on the session state as one of sai_bfd_session_state_t. BfdOrch can listen on these notifications and update the StateDB for the session state. + +The flow of BfdOrch is presented in the following figure. BfdOrch subscribes to the BFD_SESSION_TABLE of APPL_DB and send the corresponding request to program the BFD sessions to syncd accordingly. The BfdOrch also creates the STATE_DB entry of the BFD session which includes the BFD parameters and an initial state. Upon receiving bfd session state change notifications from syncd, BfdOrch update the STATE_DB field to update the BFD session state. 
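The BfdOrch flow described above (create a STATE_DB entry with an initial state when a session appears in APPL_DB, then update it on each SAI notification and inform observer orchagents) can be sketched as below. This is a minimal in-memory illustration, not the orchagent implementation: the class `BfdOrchSketch`, its method names, the dictionary standing in for STATE_DB, the initial state `"Down"`, and the display strings are all assumptions made for the example. Only the four `sai_bfd_session_state_t` enumerator names come from SAI.

```python
# SAI session states (sai_bfd_session_state_t) mapped to display strings.
# The display strings are illustrative; this document does not define them.
SAI_STATE_TO_STR = {
    "SAI_BFD_SESSION_STATE_ADMIN_DOWN": "Admin_Down",
    "SAI_BFD_SESSION_STATE_DOWN": "Down",
    "SAI_BFD_SESSION_STATE_INIT": "Init",
    "SAI_BFD_SESSION_STATE_UP": "Up",
}


class BfdOrchSketch:
    """Hypothetical stand-in for BfdOrch; STATE_DB is modelled as a dict."""

    def __init__(self):
        self.state_db = {}    # session key -> dict of fields
        self.observers = []   # callbacks of other orchagents (e.g. VnetOrch)

    def add_observer(self, callback):
        """Register a callback invoked as callback(session_key, state)."""
        self.observers.append(callback)

    def on_appl_db_set(self, key, params):
        """A session was written to APPL_DB: record parameters and an initial state."""
        entry = dict(params)
        entry["state"] = "Down"  # assumed initial state
        self.state_db[key] = entry

    def on_sai_notification(self, key, sai_state):
        """A state-change notification arrived from syncd."""
        if key not in self.state_db:
            return  # ignore notifications for sessions we do not track
        new_state = SAI_STATE_TO_STR[sai_state]
        self.state_db[key]["state"] = new_state
        for callback in self.observers:
            callback(key, new_state)
```

An observer such as VnetOrch would register through `add_observer` and react to `"Up"`/`"Down"` transitions by adding or removing the corresponding nexthop.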
+ +![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD_Notification.png) + +## 2.5 CLI + +The following commands shall be modified/added : + +``` + - show vnet routes all + - show vnet routes tunnel + - show bfd session +``` From b9f1e94235553c825de67d244c9e8836f369b965 Mon Sep 17 00:00:00 2001 From: Prince Sunny Date: Mon, 13 Sep 2021 22:19:44 -0700 Subject: [PATCH 2/6] Update Overlay ECMP with BFD.md --- doc/vxlan/Overlay ECMP with BFD.md | 56 +++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 12 deletions(-) diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md index bebdfa9ae92..3c85a6f8059 100644 --- a/doc/vxlan/Overlay ECMP with BFD.md +++ b/doc/vxlan/Overlay ECMP with BFD.md @@ -13,20 +13,24 @@ * [Definitions/Abbreviation](#definitionsabbreviation) * [1 Requirements Overview](#1-requirements-overview) - * [1.1 Usecase](#11-Usecase) + * [1.1 Usecase](#11-usecase) * [1.2 Functional requirements](#12-functional-requirements) * [1.3 CLI requirements](#13-cli-requirements) * [1.4 Warm Restart requirements ](#14-warm-restart-requirements) * [2 Modules Design](#2-modules-design) * [2.1 Config DB](#21-config-db) * [2.2 App DB](#22-app-db) - * [2.3 Orchestration Agent](#23-orchestration-agent) - * [2.4 CLI](#24-cli) + * [2.3 Module Interaction](#23-module-interaction) + * [2.4 Orchestration Agent](#24-orchestration-agent) + * [2.5 Monitoring and Health](#25-monitoring-and-health) + * [2.6 BGP](#26-bgp) + * [2.7 CLI](#27-cli) ###### Revision | Rev | Date | Author | Change Description | |:---:|:-----------:|:------------------:|-----------------------------------| | 0.1 | 09/09/2021 | Prince Sunny | Initial version | +| 1.0 | 09/13/2021 | Prince Sunny | Revised | # About this Manual This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support. 
This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md) @@ -37,7 +41,7 @@ This document describes the high level design of the Overlay ECMP feature and as ###### Table 1: Abbreviations | | | |--------------------------|--------------------------------| -| BFD | Bidirectional Forwarding | +| BFD | Bidirectional Forwarding Detection | | VNI | Vxlan Network Identifier | | VTEP | Vxlan Tunnel End Point | | VNet | Virtual Network | @@ -48,7 +52,7 @@ This document describes the high level design of the Overlay ECMP feature and as ![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_UseCase.png) -## 1.1 Functional requirements +## 1.2 Functional requirements At a high level the following should be supported: @@ -56,11 +60,11 @@ At a high level the following should be supported: - Tunnel Endpoint monitoring via BFD - Add/Withdraw Nexthop based on Tunnel or Endpoint health -## 1.2 CLI requirements +## 1.3 CLI requirements - User should be able to show the BFD session - User should be able to show the Vnet routes -## 1.3 Warm Restart requirements +## 1.4 Warm Restart requirements No special handling for Warm restart support. 
# 2 Modules Design @@ -84,6 +88,7 @@ VNET|{{vnet_name}} "vni": {{vni}} "scope": {{"default"}} (OPTIONAL) "peer_list": {{vnet_name_list}} (OPTIONAL) + "advertise_prefix": {{false}} (OPTIONAL) ``` ## 2.2 APP DB @@ -115,8 +120,8 @@ VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}} ``` key = VNET_ROUTE_TUNNEL_TABLE:vnet_name:prefix ; Vnet route tunnel table with prefix ; field = value -ENDPOINT = list of ipv4 addresses ; comma separated list of endpoints -ENDPOINT_MONITOR = list of ipv4 addresses ; comma separated list of endpoints +ENDPOINT = list of ipv4 addresses ; comma separated list of endpoints +ENDPOINT_MONITOR = list of ipv4 addresses ; comma separated list of endpoints, space for empty/no monitoring MAC_ADDRESS = 12HEXDIG ; Inner dst mac in encapsulated packet VNI = DIGITS ; VNI value in encapsulated packet WEIGHT = DIGITS ; Weights for the nexthops, comma separated (Optional) @@ -144,7 +149,7 @@ Overlay routes can be programmed via RestAPI or gNMI/gRPC interface which is not ![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_ModuleInteraction.png) -## 2.3 Orchestration Agent +## 2.4 Orchestration Agent Following orchagents shall be modified. ### VnetOrch @@ -184,7 +189,7 @@ VNET_ROUTE_TUNNEL_TABLE can provide monitoring endpoint IPs which can be differe - Better performance in re-programming routes in ASIC instead of separate process to monitor and modify each route prefix by updating DB entries ### BfdOrch -Sonic may offload the BFD session handling to hardware that has BFD capabilities. A new module, BfdOrch shall be introduced to handle BFD session to monitoring endpoints and check the health of remote endpoints. BfdOrch shall offload the session initiation/sustenance to hardware via SAI APIs and gets the notifications of session state from SAI. The session state shall be updated in STATE_DB and to any other observer orchestration agents. +Sonic shall offload the BFD session handling to hardware that has BFD capabilities. 
A new module, BfdOrch shall be introduced to handle BFD session to monitoring endpoints and check the health of remote endpoints. BfdOrch shall offload the session initiation/sustenance to hardware via SAI APIs and gets the notifications of session state from SAI. The session state shall be updated in STATE_DB and to any other observer orchestration agents. ![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD.png) @@ -205,7 +210,32 @@ The flow of BfdOrch is presented in the following figure. BfdOrch subscribes to ![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD_Notification.png) -## 2.5 CLI +A Control Plane BFD approach is to use FRR BFD and enable bfdctl module. This shall be part of the Sonic BGP container. For the current usecase, the BFD hw offload is being considered and control plane BFD using FRR is not scoped in this document. + +## 2.5 Monitoring and Health + +The routes are programmed based on the health of tunnel endpoints. It is possible that a tunnel endpoint health is monitored via another dedicated “monitoring” endpoint. It is required to have a “keep-alive” mechanism to monitor the health of end point and withdraw or reinstall the route when the endpoint is inactive or active respectively. +When an endpoint is deemed unhealthy, router shall perform the following actions: +1. Remove the nexthop from the ECMP path. If all endpoints are down, the route shall be withdrawn. +2. If 50% of the nexthops are down, an alert shall be generated. + +## 2.6 BGP + +Advertise VNET routes +The overlay routes programmed on the device must be advertised to BGP peers. This can be achieved by the “network” command. + +For example: +``` +router bgp 1 + address-family ipv4 unicast + network 10.0.0.0/8 + exit-address-family + ``` + +This configuration example says that network 10.0.0.0/8 will be announced to all neighbors. FRR bgpd doesn’t care about IGP routes when announcing its routes. 
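As a rough illustration of how the health rules in section 2.5 tie into the `network` statements of section 2.6, the sketch below evaluates one prefix and renders an FRR stanza only for prefixes that still have an active nexthop. It is not part of the design: `RouteHealth`, `evaluate_route` and `render_bgp_networks` are hypothetical names, the "50% down" rule is read as "at least half of the nexthops are down", and only IPv4 prefixes are rendered, mirroring the example above.

```python
from dataclasses import dataclass


@dataclass
class RouteHealth:
    """Result of evaluating one overlay prefix (hypothetical helper type)."""
    active_nexthops: list
    withdraw: bool
    alert: bool


def evaluate_route(endpoints, endpoint_up):
    """Apply the section 2.5 rules to one prefix.

    endpoints   -- tunnel endpoint IPs configured for the prefix
    endpoint_up -- dict mapping endpoint IP to True (healthy) or False
    """
    # Endpoints with no recorded health are treated as down.
    active = [ep for ep in endpoints if endpoint_up.get(ep, False)]
    down = len(endpoints) - len(active)
    return RouteHealth(
        active_nexthops=active,
        # Rule 1: withdraw the route when no endpoint is left.
        withdraw=len(active) == 0,
        # Rule 2 (assumed reading): alert when at least half are down.
        alert=len(endpoints) > 0 and 2 * down >= len(endpoints),
    )


def render_bgp_networks(asn, routes, endpoint_up):
    """Render FRR 'network' lines for prefixes that still have a nexthop."""
    lines = [f"router bgp {asn}", " address-family ipv4 unicast"]
    for prefix, endpoints in sorted(routes.items()):
        if not evaluate_route(endpoints, endpoint_up).withdraw:
            lines.append(f"  network {prefix}")
    lines.append(" exit-address-family")
    return "\n".join(lines)
```

With two endpoints of which one is down, the prefix stays advertised but `alert` is set; with every endpoint down, the prefix is left out of the rendered stanza.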
+ + +## 2.7 CLI The following commands shall be modified/added : @@ -214,3 +244,5 @@ The following commands shall be modified/added : - show vnet routes tunnel - show bfd session ``` + +Config commands for VNET, VNET Routes and BFD session is not considered in this design. This shall be added later based on requirement. It is taken into consideration of future BFD enhancement to have the sessions created via config_db. From 1230d23e83560ca79db0701af9f1d20020eda086 Mon Sep 17 00:00:00 2001 From: Prince Sunny Date: Fri, 8 Oct 2021 17:51:59 -0700 Subject: [PATCH 3/6] Update Overlay ECMP with BFD.md --- doc/vxlan/Overlay ECMP with BFD.md | 63 +++++++++--------------------- 1 file changed, 19 insertions(+), 44 deletions(-) diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md index 3c85a6f8059..da950c66511 100644 --- a/doc/vxlan/Overlay ECMP with BFD.md +++ b/doc/vxlan/Overlay ECMP with BFD.md @@ -1,6 +1,6 @@ # Overlay ECMP with BFD monitoring ## High Level Design Document -### Rev 1.0 +### Rev 1.1 # Table of Contents @@ -8,8 +8,6 @@ * [About this Manual](#about-this-manual) - * [Scope](#scope) - * [Definitions/Abbreviation](#definitionsabbreviation) * [1 Requirements Overview](#1-requirements-overview) @@ -27,15 +25,16 @@ * [2.7 CLI](#27-cli) ###### Revision + | Rev | Date | Author | Change Description | |:---:|:-----------:|:------------------:|-----------------------------------| | 0.1 | 09/09/2021 | Prince Sunny | Initial version | | 1.0 | 09/13/2021 | Prince Sunny | Revised | +| 1.1 | 10/08/2021 | Prince Sunny | BFD section seperated | # About this Manual This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support. This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md) -# Scope -This document describes the high level design of the Overlay ECMP feature and associated BFD support. 
General BFD support and configurations are beyond the scope of this document. + # Definitions/Abbreviation ###### Table 1: Abbreviations @@ -46,12 +45,22 @@ This document describes the high level design of the Overlay ECMP feature and as | VTEP | Vxlan Tunnel End Point | | VNet | Virtual Network | + # 1 Requirements Overview ## 1.1 Usecase +Below diagram captures the use-case. In this, ToR is a Tier0 device and Leaf is a Tier1 device. Vxlan tunnel is established from Leaf (Tier1) to a VTEP endpoint. ToR (Tier0), Spine (Tier3) are transit devices. + + ![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_UseCase.png) +### Packet flow + +- The packets destined to the Tunnel Enpoint shall be Vxlan encapsulated by the Leaf (Tier1). +- Return packet from the Tunnel Endpoint (LBs) back to Leaf may or may not be Vxlan encapsualted. +- Some flows e.g. BFD over Vxlan shall require decapsulating Vxlan packets at Leaf. + ## 1.2 Functional requirements At a high level the following should be supported: @@ -61,8 +70,8 @@ At a high level the following should be supported: - Add/Withdraw Nexthop based on Tunnel or Endpoint health ## 1.3 CLI requirements -- User should be able to show the BFD session - User should be able to show the Vnet routes +- This is an enhancement to existing show command ## 1.4 Warm Restart requirements No special handling for Warm restart support. @@ -129,20 +138,6 @@ PROFILE = STRING ; profile name to be applie string etc (Optional) ``` -### BFD - -``` -BFD_SESSION:{{ifname}}:{{prefix}} - "tx_interval": {{interval}} (OPTIONAL) - "rx_interval": {{interval}} (OPTIONAL) - "multiplier": {{detection multiplier}} (OPTIONAL) - "shutdown": {{false}} - "multihop": {{false}} - "local_addr": {{ipv4/v6}} (OPTIONAL) - "type": {{string}} (active/passive..) -; Defines APP DB schema to initiate BFD session. -``` - ## 2.3 Module Interaction Overlay routes can be programmed via RestAPI or gNMI/gRPC interface which is not described in this document. 
A highlevel module interaction is shown below @@ -188,33 +183,14 @@ VNET_ROUTE_TUNNEL_TABLE can provide monitoring endpoint IPs which can be differe - Similar to NHFLAGS handling for existing route ECMP group - Better performance in re-programming routes in ASIC instead of separate process to monitor and modify each route prefix by updating DB entries -### BfdOrch -Sonic shall offload the BFD session handling to hardware that has BFD capabilities. A new module, BfdOrch shall be introduced to handle BFD session to monitoring endpoints and check the health of remote endpoints. BfdOrch shall offload the session initiation/sustenance to hardware via SAI APIs and gets the notifications of session state from SAI. The session state shall be updated in STATE_DB and to any other observer orchestration agents. - -![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD.png) - -For offloading, the following shall be the SAI attributes programmed by BfdOrch. - -| Attribute Type | Value | -|--------------------------|--------------------------------| -| SAI_BFD_SESSION_ATTR_TYPE | SAI_BFD_SESSION_TYPE_ASYNC_ACTIVE | -| SAI_BFD_SESSION_ATTR_OFFLOAD_TYPE | SAI_BFD_SESSION_OFFLOAD_TYPE_FULL | -| SAI_BFD_SESSION_ATTR_BFD_ENCAPSULATION_TYPE | SAI_BFD_ENCAPSULATION_TYPE_NONE | -| SAI_BFD_SESSION_ATTR_SRC_IP_ADDRESS | Loopback0 IPv4 or v6 address | -| SAI_BFD_SESSION_ATTR_DST_IP_ADDRESS | Remote IPv4 or v6 address | -| SAI_BFD_SESSION_ATTR_MULTIHOP | True | - -Sai shall notify via notification channel on the session state as one of sai_bfd_session_state_t. BfdOrch can listen on these notifications and update the StateDB for the session state. - -The flow of BfdOrch is presented in the following figure. BfdOrch subscribes to the BFD_SESSION_TABLE of APPL_DB and send the corresponding request to program the BFD sessions to syncd accordingly. The BfdOrch also creates the STATE_DB entry of the BFD session which includes the BFD parameters and an initial state. 
Upon receiving bfd session state change notifications from syncd, BfdOrch update the STATE_DB field to update the BFD session state. +### Bfd HW offload -![](https://github.com/Azure/SONiC/blob/master/images/vxlan_hld/OverlayEcmp_BFD_Notification.png) +This design requires endpoint health monitoring by setting BFD sessions via HW offload. Details of BFD orchagent and HW offloading is captured in this [document](https://github.com/Azure/SONiC/blob/master/doc/bfd/BFD%20HW%20Offload%20HLD.md) -A Control Plane BFD approach is to use FRR BFD and enable bfdctl module. This shall be part of the Sonic BGP container. For the current usecase, the BFD hw offload is being considered and control plane BFD using FRR is not scoped in this document. ## 2.5 Monitoring and Health -The routes are programmed based on the health of tunnel endpoints. It is possible that a tunnel endpoint health is monitored via another dedicated “monitoring” endpoint. It is required to have a “keep-alive” mechanism to monitor the health of end point and withdraw or reinstall the route when the endpoint is inactive or active respectively. +The routes are programmed based on the health of tunnel endpoints. It is possible that a tunnel endpoint health is monitored via another dedicated “monitoring” endpoint. Implementation shall enforce a “keep-alive” mechanism to monitor the health of end point and withdraw or reinstall the route when the endpoint is inactive or active respectively. When an endpoint is deemed unhealthy, router shall perform the following actions: 1. Remove the nexthop from the ECMP path. If all endpoints are down, the route shall be withdrawn. 2. If 50% of the nexthops are down, an alert shall be generated. @@ -242,7 +218,6 @@ The following commands shall be modified/added : ``` - show vnet routes all - show vnet routes tunnel - - show bfd session ``` -Config commands for VNET, VNET Routes and BFD session is not considered in this design. This shall be added later based on requirement. 
It is taken into consideration of future BFD enhancement to have the sessions created via config_db. +Config commands for VNET, VNET Routes and BFD session is not considered in this design. This shall be added later based on requirement. From 8ca1ac93c8912fda7b09de9bfd51498e5038c292 Mon Sep 17 00:00:00 2001 From: Prince Sunny Date: Mon, 18 Oct 2021 19:04:30 -0700 Subject: [PATCH 4/6] Update Overlay ECMP with BFD.md --- doc/vxlan/Overlay ECMP with BFD.md | 71 +++++++++++++++++++++++++++++- 1 file changed, 70 insertions(+), 1 deletion(-) diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md index da950c66511..24b66f1946a 100644 --- a/doc/vxlan/Overlay ECMP with BFD.md +++ b/doc/vxlan/Overlay ECMP with BFD.md @@ -23,14 +23,16 @@ * [2.5 Monitoring and Health](#25-monitoring-and-health) * [2.6 BGP](#26-bgp) * [2.7 CLI](#27-cli) + * [2.8 Test Plan](#28-test-plan) ###### Revision | Rev | Date | Author | Change Description | |:---:|:-----------:|:------------------:|-----------------------------------| | 0.1 | 09/09/2021 | Prince Sunny | Initial version | -| 1.0 | 09/13/2021 | Prince Sunny | Revised | +| 1.0 | 09/13/2021 | Prince Sunny | Revised based on review comments | | 1.1 | 10/08/2021 | Prince Sunny | BFD section seperated | +| 1.2 | 10/18/2021 | Prince Sunny/Shi Su | Test Plan added | # About this Manual This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support. This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md) @@ -221,3 +223,70 @@ The following commands shall be modified/added : ``` Config commands for VNET, VNET Routes and BFD session is not considered in this design. This shall be added later based on requirement. 
+ +## 2.8 Test Plan + +Pre-requisite: + +Create VNET and Vxlan tunnel as an below: + +``` +{  + "VXLAN_TUNNEL": { + "tunnel1": { + "src_ip": "10.1.0.32" + } + }, + + "VNET": { + "Vnet_3000": { + "vxlan_tunnel": "tunnel1", + "vni": "3000", + "scope": "default" + } +    } +``` + +For ```default``` scope, no need to associate interfaces to a VNET + +VNET tunnel routes must be created as shown in the example below + +``` +[ +    "VNET_ROUTE_TUNNEL_TABLE:Vnet_3000:100.100.2.1/32": {  +        "endpoint": "1.1.1.2",  + "endpoint_monitor": "1.1.2.2" +    }  +] +``` + +### Test Cases + +#### Overlay ECMP + +It is assumed that the endpoint IPs may not have exact match underlay route but may have an LPM underlay route or a default route. + +| Step | Goal | Expected results | +|-|-|-| +|Create a tunnel route to a single endpoint a. Send packets to the route prefix dst| Tunnel route create | Packets are received only at endpoint a | +|Set the tunnel route to another endpoint b. Send packets to the route prefix dst | Tunnel route set | Packets are received only at endpoint b | +|Remove the tunnel route. Send packets to the route prefix dst | Tunnel route remove | Packets are not received at any ports with dst IP of b | +|Create tunnel route 1 with two endpoints A = {a1, a2}. Send packets to the route 1's prefix dst | ECMP route create | Packets are received at either a1 or a2 | +|Create tunnel route 2 to endpoint group A Send packets to route 2’s prefix dst | ECMP route create | Packets are received at either a1 or a2 | +|Set tunnel route 2 to endpoint group B = {b1, b2}. Send packets to route 2’s prefix dst | ECMP route set | Packets are received at either b1 or b2 | +|Send packets to route 1’s prefix dst. By removing route 2 from group A, no change expected to route 1 | NHG modify | Packets are received at either a1 or a2 | +|Set tunnel route 2 to single endpoint b1. 
Send packets to route 2’s prefix dst | NHG modify | Packets are received at b1 only | +|Set tunnel route 2 to shared endpoints a1 and b1. Send packets to route 2’s prefix dst | NHG modify | Packets are received at a1 or b1 | +|Remove tunnel route 2. Send packets to route 2’s prefix dst | ECMP route remove | Packets are not received at any ports with dst IP of a1 or b1 | +|Set tunnel route 3 to endpoint group C = {c1, c2, c3}. Ensure c1, c2, and c3 match the underlay default route. Send 10000 pkt with random hash to route 3's prefix dst | NHG distribution | Packets are distributed equally across c1, c2 and c3 | +|Modify the underlay default route nexthop/s. Send packets to route 3's prefix dst | Underlay ECMP | No change to packet distribution. Packets are distributed equally across c1, c2 and c3 | +|Remove the underlay default route. | Underlay ECMP | Packets are not received at c1, c2 or c3 | +|Re-add the underlay default route. | Underlay ECMP | Packets are equally received at c1, c2 or c3 | + +#### BFD and health monitoring + +TBD + +#### BGP advertising + +TBD From a2c68f2b6c40d4b5da164713a988146612ce5cf4 Mon Sep 17 00:00:00 2001 From: Prince Sunny Date: Mon, 1 Nov 2021 18:37:13 -0700 Subject: [PATCH 5/6] Updated for IPv6 test cases --- doc/vxlan/Overlay ECMP with BFD.md | 43 +++++++++++++++++++++++++++--- 1 file changed, 40 insertions(+), 3 deletions(-) diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md index 24b66f1946a..6ba6ed0c5f8 100644 --- a/doc/vxlan/Overlay ECMP with BFD.md +++ b/doc/vxlan/Overlay ECMP with BFD.md @@ -33,6 +33,7 @@ | 1.0 | 09/13/2021 | Prince Sunny | Revised based on review comments | | 1.1 | 10/08/2021 | Prince Sunny | BFD section seperated | | 1.2 | 10/18/2021 | Prince Sunny/Shi Su | Test Plan added | +| 1.3 | 11/01/2021 | Prince Sunny | IPv6 test cases added | # About this Manual This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support.
This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md) @@ -68,6 +69,7 @@ Below diagram captures the use-case. In this, ToR is a Tier0 device and Leaf is At a high level the following should be supported: - Configure ECMP with Tunnel Nexthops (IPv4 and IPv6) +- Support IPv6 tunnel that can support both IPv4 and IPv6 traffic - Tunnel Endpoint monitoring via BFD - Add/Withdraw Nexthop based on Tunnel or Endpoint health @@ -233,19 +235,39 @@ Create VNET and Vxlan tunnel as an below: ``` {  "VXLAN_TUNNEL": { - "tunnel1": { + "tunnel_v4": { "src_ip": "10.1.0.32" } }, "VNET": { "Vnet_3000": { - "vxlan_tunnel": "tunnel1", + "vxlan_tunnel": "tunnel_v4", "vni": "3000", "scope": "default" }     } ``` +Similarly for IPv6 tunnels + +``` +{  + "VXLAN_TUNNEL": { + "tunnel_v6": { + "src_ip": "fc00:1::32" + } + }, + + "VNET": { + "Vnet_3001": { + "vxlan_tunnel": "tunnel_v6", + "vni": "3001", + "scope": "default" + } +    } +``` + +Note: It can be safely assumed that only one type of tunnel exists - i.e, either IPv4 or IPv6 for this use-case For ```default``` scope, no need to associate interfaces to a VNET @@ -260,11 +282,26 @@ VNET tunnel routes must be created as shown in the example below ] ``` +With IPv6 tunnels, prefixes can be either IPv4 or IPv6 + +``` +[ +    "VNET_ROUTE_TUNNEL_TABLE:Vnet_3001:100.100.2.1/32": {  +        "endpoint": "fc02:1000::1",  + "endpoint_monitor": "fc02:1000::2" +    }, + "VNET_ROUTE_TUNNEL_TABLE:Vnet_3001:20c0:a820:0:80::/64": {  +        "endpoint": "fc02:1001::1",  + "endpoint_monitor": "fc02:1001::2" +    } +] +``` + ### Test Cases #### Overlay ECMP -It is assumed that the endpoint IPs may not have exact match underlay route but may have an LPM underlay route or a default route. +It is assumed that the endpoint IPs may not have exact match underlay route but may have an LPM underlay route or a default route. 
Tests must cover both IPv4 and IPv6 traffic for routes configured as in the examples shown above
 
 | Step | Goal | Expected results |
 |-|-|-|
From 48f4cd0d171454e4a383110c84a827ee766e724a Mon Sep 17 00:00:00 2001
From: Prince Sunny
Date: Fri, 3 Dec 2021 12:37:22 -0800
Subject: [PATCH 6/6] Added testcases and scaling requirements

---
 doc/vxlan/Overlay ECMP with BFD.md | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/doc/vxlan/Overlay ECMP with BFD.md b/doc/vxlan/Overlay ECMP with BFD.md
index 6ba6ed0c5f8..ef5a4af5e0a 100644
--- a/doc/vxlan/Overlay ECMP with BFD.md
+++ b/doc/vxlan/Overlay ECMP with BFD.md
@@ -15,6 +15,7 @@
     * [1.2 Functional requirements](#12-functional-requirements)
     * [1.3 CLI requirements](#13-cli-requirements)
     * [1.4 Warm Restart requirements ](#14-warm-restart-requirements)
+    * [1.5 Scaling requirements ](#15-scaling-requirements)
   * [2 Modules Design](#2-modules-design)
     * [2.1 Config DB](#21-config-db)
     * [2.2 App DB](#22-app-db)
@@ -30,10 +31,11 @@
 | Rev | Date | Author | Change Description |
 |:---:|:-----------:|:------------------:|-----------------------------------|
 | 0.1 | 09/09/2021 | Prince Sunny | Initial version |
-| 1.0 | 09/13/2021 | Prince Sunny | Revised based on review comments |
+| 1.0 | 09/13/2021 | Prince Sunny | Revised based on review comments |
 | 1.1 | 10/08/2021 | Prince Sunny | BFD section separated |
 | 1.2 | 10/18/2021 | Prince Sunny/Shi Su | Test Plan added |
-| 1.3 | 11/01/2021 | Prince Sunny | IPv6 test cases added |
+| 1.3 | 11/01/2021 | Prince Sunny | IPv6 test cases added |
+| 1.4 | 12/03/2021 | Prince Sunny | Added scaling section, extra test cases |
 
 # About this Manual
 This document provides general information about the Vxlan Overlay ECMP feature implementation in SONiC with BFD support.
This is an extension to the existing VNET Vxlan support as defined in the [Vxlan HLD](https://github.com/Azure/SONiC/blob/master/doc/vxlan/Vxlan_hld.md)
@@ -80,6 +82,17 @@ At a high level the following should be supported:
 ## 1.4 Warm Restart requirements
 No special handling for Warm restart support.
 
+## 1.5 Scaling requirements
+At a minimum, the following estimated scale numbers should be supported:
+
+| Item | Expected value |
+|--------------------------|-----------------------------|
+| ECMP groups | 512 |
+| ECMP group member | 128 |
+| Tunnel (Overlay) routes | 16k |
+| Tunnel endpoints | 4k |
+| BFD monitoring | 4k |
+
 # 2 Modules Design
 
 The following are the schema changes.
@@ -308,8 +321,8 @@ It is assumed that the endpoint IPs may not have exact match underlay route but
 |Create a tunnel route to a single endpoint a. Send packets to the route prefix dst| Tunnel route create | Packets are received only at endpoint a |
 |Set the tunnel route to another endpoint b. Send packets to the route prefix dst | Tunnel route set | Packets are received only at endpoint b |
 |Remove the tunnel route. Send packets to the route prefix dst | Tunnel route remove | Packets are not received at any ports with dst IP of b |
-|Create tunnel route 1 with two endpoints A = {a1, a2}. Send packets to the route 1's prefix dst | ECMP route create | Packets are received at either a1 or a2 |
-|Create tunnel route 2 to endpoint group A Send packets to route 2’s prefix dst | ECMP route create | Packets are received at either a1 or a2 |
+|Create tunnel route 1 with two endpoints A = {a1, a2}. Send multiple packets (varying tuple) to the route 1's prefix dst. | ECMP route create | Packets are received at both a1 and a2 |
+|Create tunnel route 2 to endpoint group A. Send multiple packets (varying tuple) to route 2’s prefix dst | ECMP route create | Packets are received at both a1 and a2 |
 |Set tunnel route 2 to endpoint group B = {b1, b2}.
Send packets to route 2’s prefix dst | ECMP route set | Packets are received at either b1 or b2 |
 |Send packets to route 1’s prefix dst. By removing route 2 from group A, no change expected to route 1 | NHG modify | Packets are received at either a1 or a2 |
 |Set tunnel route 2 to single endpoint b1. Send packets to route 2’s prefix dst | NHG modify | Packets are received at b1 only |
@@ -319,6 +332,15 @@ It is assumed that the endpoint IPs may not have exact match underlay route but
 |Modify the underlay default route nexthop/s. Send packets to route 3's prefix dst | Underlay ECMP | No change to packet distribution. Packets are distributed equally across c1, c2 and c3 |
 |Remove the underlay default route. | Underlay ECMP | Packets are not received at c1, c2 or c3 |
 |Re-add the underlay default route. | Underlay ECMP | Packets are equally received at c1, c2 or c3 |
+|Bring down one of the port-channels. | Underlay ECMP | Packets are equally received at c1, c2 or c3 |
+|Create a more specific underlay route to c1. | Underlay ECMP | Verify c1 packets are received only on c1's nexthop interface |
+|Create tunnel route 4 to endpoint group A. Send packets (fixed tuple) to route 4’s prefix dst | Vxlan Entropy | Verify Vxlan entropy |
+|Change the UDP src port of original packet to route 4’s prefix dst | Vxlan Entropy | Verify Vxlan entropy is changed |
+|Change the UDP dst port of original packet to route 4’s prefix dst | Vxlan Entropy | Verify Vxlan entropy is changed |
+|Change the src IP of original packet to route 4’s prefix dst | Vxlan Entropy | Verify Vxlan entropy is changed |
+|Create/Delete overlay routes to 16k with unique endpoints up to 4k | CRM | Verify crm resource for route (ipv4/ipv6) and nexthop (ipv4/ipv6) |
+|Create/Delete overlay nexthop groups up to 512 | CRM | Verify crm resource for nexthop_group |
+|Create/Delete overlay nexthop group members up to 128 | CRM | Verify crm resource for nexthop_group_member |
 
 #### BFD and health monitoring