diff --git a/.gitignore b/.gitignore index 2543955bc67..b0c902d5764 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,8 @@ *~ *.swp *.un~ + + + +.DS_Store + diff --git a/doc/distributed-data-plane-verification/coral.md b/doc/distributed-data-plane-verification/coral.md index 0d016c3be23..3f2c308a90e 100644 --- a/doc/distributed-data-plane-verification/coral.md +++ b/doc/distributed-data-plane-verification/coral.md @@ -3,21 +3,20 @@ #### Rev 0.1 # Table of Contents -todo +TBD in Markdown with links # Revision | Rev | Date | Author | Change Description | -| :---: | :---: | :----- | :----------------- | -| | | | | +| ---- | ---------- | -----------| ------------------| +| v0.1 | 02/24/2022 | Qiao Xiang, Chenyang Huang, Ridi Wen, Jiwu Shu@Xiamen University, China| Initial version | -# About this Manual -This document describes the design details of distributed data plane verification feature. - -TODO # Scope -This document describes the high level design details about how distributed data plane verification works. +This document describes the high-level design of the distributed data plane verification feature. + + + # Definitions/Abbreviations ###### Table 1: Abbreviations @@ -28,36 +27,59 @@ This document describes the high level design details about how distributed data | LEC | Local Equivalence Class | | DPV | Data plane verification | -# 1 Motivation +# Overview Network errors such as forwarding loops, undesired blackholes and waypoint violations are the outcome of various issues (e.g., software bugs, faulty hardware, protocol misconfigurations, and oversight negligence). They can happen in all types of networks (e.g., enterprise networks, wide area networks and data center networks), and have both disastrous financial and social consequences. Data plane verification (DPV) is important for finding such network errors. Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. 
-Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable. +Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable because (1) it requires a highly available management network, which is itself hard to build; and (2) the server becomes the performance bottleneck and a single point of failure. + +In this HLD, to tackle the scalability challenge of DPV, we propose a distributed +data plane verification feature, which circumvents the scalability bottleneck +of the centralized design and performs data plane checking on commodity network +devices. Our key insight is that DPV can be transformed into a counting problem +on a directed acyclic graph, which can be naturally decomposed into lightweight +tasks executed at network devices, enabling scalability. To be concrete, this +feature provides: + +* A declarative requirement specification language that allows operators to + flexibly specify common requirements studied by existing DPV tools (e.g., +reachability, blackhole freeness, waypoint and isolation), and more advanced, yet +understudied requirements (e.g., multicast, anycast, no-redundant-delivery and +all-shortest-path availability). +* A verification planner that takes the operator-specified requirement as input, + and systematically decomposes the global verification into lightweight, +on-device computation tasks. +* A distributed verification (DV) protocol that specifies how on-device + verifiers communicate task results efficiently to collaboratively verify the +requirements. -In order to tackle the scalability challenge of DPV, distributed data plane verification is designed, a generic, distributed, -on-device DPV framework, which circumvents the scalability bottleneck of centralized design. The key insight is as follows. A -directed acyclic graph (DAG), which represents all valid paths in -the network, is called DVNet.
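The counting transformation described above can be illustrated with a small sketch. This is illustrative only and not part of the proposed feature: the class and method names are assumptions, and the real DVNet counting operates on LECs rather than plain node labels. The core idea is that in a DAG of all valid paths, the number of paths from a node to the destination equals the sum of the counts of its downstream neighbors, computable in reverse topological order.

```java
import java.util.*;

// Illustrative sketch (hypothetical names): counting paths to the destination
// in a DAG such as DVNet. Each node's count is the sum of the counts of its
// downstream neighbors, so the whole computation decomposes into per-node tasks.
public class DagPathCount {
    public static Map<String, Long> countPaths(Map<String, List<String>> dag, String dest) {
        Map<String, Long> count = new HashMap<>();
        count.put(dest, 1L); // one (empty) path from the destination to itself
        for (String node : reverseTopologicalOrder(dag)) {
            if (node.equals(dest)) continue;
            long c = 0;
            for (String next : dag.getOrDefault(node, List.of())) {
                c += count.getOrDefault(next, 0L);
            }
            count.put(node, c);
        }
        return count;
    }

    // Post-order DFS: every node is emitted after all of its successors,
    // so iterating the result processes downstream nodes first.
    static List<String> reverseTopologicalOrder(Map<String, List<String>> dag) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String n : dag.keySet()) dfs(n, dag, seen, order);
        return order;
    }

    static void dfs(String n, Map<String, List<String>> dag, Set<String> seen, List<String> order) {
        if (!seen.add(n)) return;
        for (String next : dag.getOrDefault(n, List.of())) dfs(next, dag, seen, order);
        order.add(n);
    }

    public static void main(String[] args) {
        // S -> A -> D and S -> B -> D: two valid paths from S to D.
        Map<String, List<String>> dvnet = Map.of(
            "S", List.of("A", "B"),
            "A", List.of("D"),
            "B", List.of("D"));
        System.out.println(countPaths(dvnet, "D").get("S")); // prints 2
    }
}
```

Because each node only needs its downstream neighbors' counts, the per-node step can run on the corresponding switch, which is what makes the decomposition natural.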
The problem of DPV can be transformed into a counting problem in DVNet; the latter can then be decomposed into small tasks at nodes on the DVNet, which can be distributively executed at corresponding network devices, enabling scalability. -The picture below demonstrates the architechture and workflow of distributed data plane verification. + +The picture below demonstrates the architecture and workflow of distributed data plane verification. + + ![architecture](img/architecture.png) First, DVNet is generated based on the specified verification requirement and the actual network topology. Then, the counting problem is distributed to the individual switches. On each switch, the counting result is computed from the received verification messages and delivered to the corresponding upstream node in DVNet. Finally, the source switch is able to determine whether there is an error in the data plane according to the verification messages it receives. -The demo of distributed data plane verification can be found at [distributeddpvdemo.tech](distributeddpvdemo.tech). + +A series of demos of the proposed feature can be found at [distributeddpvdemo.tech](http://distributeddpvdemo.tech). All demos are conducted on a small testbed of commodity switches installed with SONiC or ONL. + +# Requirements +* The ddpv container needs to have access to the device data plane (i.e., FIB and ACL) stored in the database container. +* The ddpv containers at neighboring switches need to use sockets to exchange verification messages. +* The ddpv container will be developed in Java, and will need a Java Runtime Environment (JRE). +* New CLI commands need to be added to allow the ddpv container to receive the counting tasks from the verification planner and show related information, e.g., verification results, counting numbers, and status. + # 2 Overview -## 2.1 Functionality Overview +## Functionality Overview 1.
Distributed data plane verification allows users to verify a wide range of requirements, e.g., reachability, isolation, loop-freeness, blackhole freeness, waypoint reachability and the all-shortest-path availability requirement in Azure RCDC [1]. 2. Distributed data plane verification is able to verify the data plane in both the burst-update and incremental-update scenarios. -## 2.2 Requirements Overview -1. In order to compute Local Equivalence Class (LEC) table, distributed data plane verification needs to have access to FIB stored in database container. -2. Sockets can be built to receive and deliver verification messages containing counting result. -3. Java Runtime Environment (JRE) is needed in operation system. -4. New CLI commands need to be added to specify data plane verification requirements and show related information, e.g., verification results, counting numbers, and status. + # 3 Functionality ## 3.1 Functionality Description @@ -200,4 +222,4 @@ The process is similar to green start. Finally, update is complete until each de Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew Helwer, Mark Kasten, Ivan Lee, et al. 2019. Validating Datacenters at Scale. In Proceedings of the ACM Special Interest Group on Data -Communication. 200–213. \ No newline at end of file +Communication. 200–213.
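The per-switch step of the workflow described earlier (compute a counting result from received verification messages and deliver it upstream) can be sketched as follows. This is a simplified illustration under assumed names, not the actual DV protocol: real verification messages are keyed by LECs and carry richer state, and the expected count at the source depends on the requirement being verified.

```java
import java.util.*;

// Simplified sketch (hypothetical names) of one on-device counting step.
// A switch keeps only the counts reported by downstream DVNet neighbors that
// its local FIB actually forwards to, sums them, and reports the result
// upstream. At the source, a count of 0 would indicate an error such as a
// blackhole for the verified packets.
public class OnDeviceCounter {
    // downstreamCounts: verification messages received from downstream nodes.
    // fibNextHops: next hops the local FIB installs for the verified packets.
    public static long combine(Map<String, Long> downstreamCounts, Set<String> fibNextHops) {
        long count = 0;
        for (Map.Entry<String, Long> msg : downstreamCounts.entrySet()) {
            if (fibNextHops.contains(msg.getKey())) {
                count += msg.getValue();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Downstream nodes A and B each report 1 path to the destination,
        // but the local FIB only forwards to A: report 1 upstream.
        Map<String, Long> msgs = Map.of("A", 1L, "B", 1L);
        System.out.println(combine(msgs, Set.of("A"))); // prints 1
    }
}
```

Because each switch only inspects its own FIB and the messages from its DVNet neighbors, no device ever needs the global data plane, which is the source of the scalability claimed above.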