From 040434330d8697a6acd17828bc9df7300d1ff5e1 Mon Sep 17 00:00:00 2001 From: Qiao Xiang Date: Fri, 25 Feb 2022 00:27:00 +0800 Subject: [PATCH 1/2] checkin small edits --- .gitignore | 5 +++ .../coral.md | 34 ++++++++++--------- 2 files changed, 23 insertions(+), 16 deletions(-) diff --git a/.gitignore b/.gitignore index 2543955bc67..b0c902d5764 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,8 @@ *~ *.swp *.un~ + + + +.DS_Store + diff --git a/doc/distributed-data-plane-verification/coral.md b/doc/distributed-data-plane-verification/coral.md index d8bc7e163ba..188afcadbe7 100644 --- a/doc/distributed-data-plane-verification/coral.md +++ b/doc/distributed-data-plane-verification/coral.md @@ -3,21 +3,20 @@ #### Rev 0.1 # Table of Contents -todo +TBD in Markdown with links # Revision | Rev | Date | Author | Change Description | -| :---: | :---: | :----- | :----------------- | -| | | | | +| ---- | ---------- | -----------| ------------------| +| v0.1 | 02/24/2022 | Qiao Xiang, Chenyang Huang, Ridi Wen, Jiwu Shu| Initial version | -# About this Manual -This document describes the design details of distributed data plane verification feature. - -TODO # Scope -This document describes the high level design details about how distributed data plane verification works. +This document describes the high-level design of the distributed data plane verification feature. + + + # Definitions/Abbreviations ###### Table 1: Abbreviations @@ -28,21 +27,24 @@ This document describes the high level design details about how distributed data | LEC | Local Equivalence Class | | DPV | Data plane verification | -# 1 Motivation +# Overview Network errors such as forwarding loops, undesired blackholes and waypoint violations are the outcome of various issues (e.g., software bugs, faulty hardware, protocol misconfigurations, and oversight negligence). They can happen in all types of networks (e.g., enterprise networks, wide area networks and data center networks), and have both disastrous financial and social consequences. Data plane verification (DPV) is important for finding such network errors. Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. -Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable. +Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable because (1) it requires a highly available management network, which is hard build itself; and (2) the server becomes the performance bottleneck and a single point of failure. + +In this HLD, to tackle the scalability challenge of DPV, we propose a distributed +data plane verification framework, which circumvents the scalability bottleneck +of centralized design and performs data plane checking on commodity network +devices. Our key insight is that DPV can be transformed into a counting problem +on a directed acyclic graph, which can be naturally decomposed into lightweight +tasks executed at network devices, enabling scalability. -In order to tackle the scalability challenge of DPV, distributed data plane verification is designed, a generic, distributed, -on-device DPV framework, which circumvents the scalability bottleneck of centralized design. The key insight is as follows. A -directed acyclic graph (DAG), which represents all valid paths in -the network, is called DVNet. The problem of DPV can be transformed into a counting problem in DVNet; the latter can then be decomposed into small tasks at nodes on the DVNet, which can be distributively executed at corresponding network devices, enabling scalability. -The picture below demonstrates the architechture and workflow of distributed data plane verification. +The picture below demonstrates the architecture and workflow of distributed data plane verification. ![architecture](img/architecture.png) Firstly, DVNet is generated based on specified verification requirement and actual network topology. Then, the counting problem is distributed to individual switches. On each switch, counting result is computed depending on received verification messages and delivered to corresponding upstream node on DVNet. Finally, the source switch would be able to determine whether there is an error on data plane according to received verification messages. @@ -173,4 +175,4 @@ to allow distributed data plane verification planners on the device to verify th Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew Helwer, Mark Kasten, Ivan Lee, et al. 2019. Validating Datacenters at Scale. In Proceedings of the ACM Special Interest Group on Data -Communication. 200–213. \ No newline at end of file +Communication. 200–213. From 50122b7a7723c4e8df42422fb7ae7cac0011cc48 Mon Sep 17 00:00:00 2001 From: Qiao Xiang Date: Fri, 25 Feb 2022 00:48:32 +0800 Subject: [PATCH 2/2] small edits --- .../coral.md | 40 ++++++++++++++----- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/doc/distributed-data-plane-verification/coral.md b/doc/distributed-data-plane-verification/coral.md index 188afcadbe7..9078d34433a 100644 --- a/doc/distributed-data-plane-verification/coral.md +++ b/doc/distributed-data-plane-verification/coral.md @@ -9,7 +9,7 @@ TBD in Markdown with links | Rev | Date | Author | Change Description | | ---- | ---------- | -----------| ------------------| -| v0.1 | 02/24/2022 | Qiao Xiang, Chenyang Huang, Ridi Wen, Jiwu Shu| Initial version | +| v0.1 | 02/24/2022 | Qiao Xiang, Chenyang Huang, Ridi Wen, Jiwu Shu@Xiamen University, China| Initial version | # Scope @@ -37,29 +37,49 @@ Current DPV tools employ a centralized architecture, where a server collects the Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable because (1) it requires a highly available management network, which is hard build itself; and (2) the server becomes the performance bottleneck and a single point of failure. In this HLD, to tackle the scalability challenge of DPV, we propose a distributed -data plane verification framework, which circumvents the scalability bottleneck +data plane verification feature, which circumvents the scalability bottleneck of centralized design and performs data plane checking on commodity network devices. Our key insight is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight -tasks executed at network devices, enabling scalability. +tasks executed at network devices, enabling scalability. To be concrete, this +feature provides: + +* A declarative requirement specification language that allows operators to + flexibly specify common requirements studied by existing DPV tools (e.g., +reachability, blackhole free, waypoint and isolation), and more advanced, yet +understudied requirements (e.g., multicast, anycast, no-redundant-delivery and +all-shortest-path availability). +* A verification planner that takes the operator specified requirement as input, + and systematically decomposes the global verification into lightweight, +on-device computation tasks. +* A distributed verification (DV) protocol that specifies how on-device + verifiers communicate task results efficiently to collaboratively verify the +requirements. + The picture below demonstrates the architecture and workflow of distributed data plane verification. + + ![architecture](img/architecture.png) Firstly, DVNet is generated based on specified verification requirement and actual network topology. Then, the counting problem is distributed to individual switches. On each switch, counting result is computed depending on received verification messages and delivered to corresponding upstream node on DVNet. Finally, the source switch would be able to determine whether there is an error on data plane according to received verification messages. -The demo of distributed data plane verification can be found at [distributeddpvdemo.tech](distributeddpvdemo.tech). + +A series of demos of the proposed feature can be found at [distributeddpvdemo.tech](distributeddpvdemo.tech). All demos are conducted on a small testbed of commodity switches installed with SONiC or ONL. + +# Requirements +* The ddpv container needs to have access to the device data plane (i.e., FIB and ACL) stored in the database container. +* The ddpv container at neighboring switches needs to use sockets to exchange verification messages. +* The ddpv container will be developed in Java, and will need a Java Runtime Environment (JRE). +* New CLI commands need to be added to allow the ddpv container to receive the counting tasks from the verification planner and show related information, e.g., verification results, counting numbers, and status. + # 2 Overview -## 2.1 Functionality Overview +# 2 Functionality Overview 1. Distributed data plane verification allows user to verify a wide range of requirements, e.g., reachability, isolation, loop-freeness, black hole freeness, waypoint reachability and all shortest-path availability requirement in Azure RCDC [1]. 2. Distributed data plane verification is able to verify data plane in the scenario of both burst update and incremental update. -## 2.2 Requirements Overview -1. In order to compute Local Equivalence Class (LEC) table, distributed data plane verification needs to have access to FIB stored in database container. -2. Sockets can be built to receive and deliver verification messages containing counting result. -3. Java Runtime Environment (JRE) is needed in operation system. -4. New CLI commands need to be added to specify data plane verification requirements and show related information, e.g., verification results, counting numbers, and status. + # 3 Functionality ## 3.1 Functionality Description