Distributed Data Plane Verification HLD #948

Open. Wants to merge 43 commits into master (changes shown from 41 commits).

Commits (43):
- 833e0a1 add coral doc (Mauue, Feb 24, 2022)
- 51a3b77 add motivation&overview (yuxxinwang, Feb 24, 2022)
- e8ea8b8 func (RiDiWe, Feb 24, 2022)
- 6ce9857 update func (RiDiWe, Feb 24, 2022)
- c160761 update (RiDiWe, Feb 24, 2022)
- ee61c22 update (RiDiWe, Feb 24, 2022)
- 34fdf05 update 1&2 (yuxxinwang, Feb 24, 2022)
- 7c8dea6 update (RiDiWe, Feb 24, 2022)
- 8398471 Merge remote-tracking branch 'origin/master' (RiDiWe, Feb 24, 2022)
- 084061a update (RiDiWe, Feb 24, 2022)
- 332855e update (RiDiWe, Feb 24, 2022)
- 9d5251f update (RiDiWe, Feb 24, 2022)
- 35b8b9d update design (Mauue, Feb 24, 2022)
- d235f7d Merge remote-tracking branch 'origin/master' (Mauue, Feb 24, 2022)
- cdc2488 update (RiDiWe, Feb 24, 2022)
- 88f2774 Merge remote-tracking branch 'origin/master' (RiDiWe, Feb 24, 2022)
- 1901bdd rename folder (xiangq27, Feb 24, 2022)
- 0404343 checkin small edits (xiangq27, Feb 24, 2022)
- 50122b7 small edits (xiangq27, Feb 24, 2022)
- b61eda6 update design (Mauue, Feb 24, 2022)
- 6488ad2 Merge remote-tracking branch 'origin/master' (Mauue, Feb 24, 2022)
- 39ff984 chkpt (xiangq27, Feb 24, 2022)
- 2456c9a Merge branch 'master' of github.com:sngroup-xmu/SONiC into master (xiangq27, Feb 24, 2022)
- 8dcad89 update (Mauue, Feb 24, 2022)
- 6690f38 Merge remote-tracking branch 'origin/master' (Mauue, Feb 24, 2022)
- 29f1f50 chkin (xiangq27, Feb 24, 2022)
- ae9549c fix conflict (xiangq27, Feb 24, 2022)
- d3224b9 chkin (xiangq27, Feb 24, 2022)
- 5c3fa11 update image (Mauue, Feb 24, 2022)
- 822d510 fix image (Mauue, Feb 24, 2022)
- a958422 small update on developer list (xiangq27, Feb 24, 2022)
- 5b56741 Merge branch 'master' of github.com:sngroup-xmu/SONiC into master (xiangq27, Feb 24, 2022)
- 82038e4 update design (Mauue, Feb 27, 2022)
- 4b85092 Merge remote-tracking branch 'origin/master' (Mauue, Feb 27, 2022)
- 91fb97a fix (Mauue, Feb 27, 2022)
- bceebdb update lecbuilderd (xiangq27, Feb 27, 2022)
- 790c037 fix (Mauue, Feb 27, 2022)
- 48396bf update the design details (xiangq27, Feb 27, 2022)
- 34918d8 fix (Mauue, Feb 27, 2022)
- 5b02c81 update file name (xiangq27, Feb 27, 2022)
- 45a23e6 add technical report link (xiangq27, Jun 10, 2022)
- 029d404 v0.2 of ddpv feature (yuxxinwang, Sep 9, 2022)
- 72e99d2 Merge branch 'master' into master (xiangq27, Sep 9, 2022)
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,3 +1,8 @@
*~
*.swp
*.un~



.DS_Store
Reviewer comment (Contributor): This file should not be included.


276 changes: 276 additions & 0 deletions doc/distributed-data-plane-verification/design.md
@@ -0,0 +1,276 @@
# Distributed Data Plane Verification
# High Level Design Document
#### Rev 0.1

# Table of Contents
TBD in Markdown with links

# Revision

| Rev | Date | Author | Change Description |
| ---- | ---------- | -----------| ------------------|
| v0.1 | 02/24/2022 | Qiao Xiang, Chenyang Huang, Ridi Wen, Yuxin Wang, Jiwu Shu@Xiamen University, China| Initial version |


# Scope
This document describes the high-level design of the distributed data plane verification feature.




# Definitions/Abbreviations
###### Table 1: Abbreviations
| Abbreviation | Full form |
| ------------ | --------------------------- |
| FIB | Forwarding Information Base |
| CLI | Command Line Interface |
| LEC | Local Equivalence Class |
| DPV | Data Plane Verification |

# Overview

Network errors such as forwarding loops, undesired blackholes and waypoint violations are the outcome of various
issues (e.g., software bugs, faulty hardware, protocol misconfigurations, and operator negligence). They can happen in
all types of networks (e.g., enterprise networks, wide area networks and data center networks), and can have disastrous
financial and social consequences. Data plane verification (DPV) is important for finding such network errors.
Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them.
Reviewer comment (Contributor): Give some references to "Current DPV tools".

Despite substantial efforts on accelerating DPV, this centralized architecture is inherently unscalable because (1) it requires a highly available management network, which is itself hard to build; and (2) the server becomes a performance bottleneck and a single point of failure.

In this HLD, to tackle the scalability challenge of DPV, we propose a distributed
data plane verification feature, which circumvents the scalability bottleneck
of the centralized design and performs data plane checking on commodity network
devices. Our key insight is that DPV can be transformed into a counting problem
on a directed acyclic graph called DVNet, which can be naturally decomposed into lightweight
tasks executed at network devices, enabling scalability. Concretely, this
feature provides:

* A declarative requirement specification language that allows operators to
flexibly specify common requirements studied by existing DPV tools (e.g.,
reachability, blackhole freedom, waypoint and isolation), as well as more advanced, yet
understudied requirements (e.g., multicast, anycast, no-redundant-delivery and
all-shortest-path availability).
* A verification planner that takes the operator-specified requirement as input
and systematically decomposes the global verification into lightweight,
on-device computation tasks.
* A distributed verification (DV) protocol that specifies how on-device
verifiers communicate task results efficiently to collaboratively verify the
requirements.



The picture below demonstrates the architecture and workflow of distributed data plane verification.


![system](img/system-diagram.jpg)
Reviewer comment (Contributor): The system-diagram.jpg should give more details. For example, what container to add? How do new containers interact with existing components?



A series of demos of the proposed feature can be found at [distributeddpvdemo.tech](https://distributeddpvdemo.tech). All demos are conducted on a small testbed of commodity switches running SONiC or ONL. A technical report of the detailed design of DDPV can be found on [arXiv](https://arxiv.org/abs/2205.07808).

# Requirements
* The ddpv container needs to have access to the device data plane (i.e., FIB and ACL) stored in the database container (see the illustrative sketch after this list).
  - Reviewer comment (Contributor): It is better to mention the names of the concrete tables that store this information.

* The ddpv containers at neighboring switches need to use sockets to exchange verification messages.
  - Reviewer comment (Contributor): Do you assume that all the switches run SONiC with the ddpv container?

* The ddpv container will be developed in Java, and will need a Java Runtime Environment (JRE).
* New CLI commands need to be added to allow the ddpv container to receive the counting tasks from the verification planner and show related information, e.g., verification results, counting numbers, and status.
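
The following purely illustrative sketch relates to the first requirement above: it reads route entries from the ROUTE_TABLE of APPL_DB in the database container over Redis. The table name, database index, field names, and the use of the Jedis client are assumptions for illustration, not something mandated by this design.

```java
import java.util.Map;
import java.util.Set;
import redis.clients.jedis.Jedis;

// Illustrative only: reads ROUTE_TABLE entries from APPL_DB (DB index 0) in
// the database container, assuming the ddpv container can reach Redis on
// localhost:6379. The field names ("nexthop", "ifname") follow the usual
// SONiC APPL_DB schema but should be confirmed for the target release.
public class FibReader {
    private static final int APPL_DB = 0;

    public static void dumpRoutes(String redisHost) {
        try (Jedis jedis = new Jedis(redisHost, 6379)) {
            jedis.select(APPL_DB);
            Set<String> keys = jedis.keys("ROUTE_TABLE:*");
            for (String key : keys) {
                String prefix = key.substring("ROUTE_TABLE:".length());
                Map<String, String> entry = jedis.hgetAll(key);
                System.out.printf("%s -> nexthop=%s ifname=%s%n",
                        prefix, entry.get("nexthop"), entry.get("ifname"));
            }
        }
    }

    public static void main(String[] args) {
        dumpRoutes(args.length > 0 ? args[0] : "localhost");
    }
}
```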





# Functionality
## Functionality Description
Distributed data plane verification detects a wide range of network errors (e.g., switch operating system errors) by checking the actual
data plane on the network devices, so that the operator can detect network errors in time, take
measures to correct them, and reduce the resulting loss as much as possible. Distributed data plane verification can efficiently verify a
wide range of network forwarding requirements of operators in different scenarios. If the current network
state does not satisfy the operator's forwarding requirements, the operator is alerted to the error.
Distributed data plane verification generates a directed acyclic graph called DVNet based on the network topology and requirements,
performs a reverse counting process on DVNet, and finally determines whether the data plane is erroneous.

## Use Case Examples

We use two examples to demonstrate how the DDPV feature works. More
illustrations can be found at [distributeddpvdemo.tech](https://distributeddpvdemo.tech). The first
example is in the network in Figure 2.

<img src="img/tore.png" width="50%" alt="tore" />

Figure 2. An example topology and requirement.

After the operator specifies the requirement in Figure 2, the verification
planner decides the on-device tasks for each device in the network by
constructing a data structure called DVNet. Informally, DVNet is a DAG that
compactly represents all valid paths in the topology that satisfy an
operator-specified requirement, and is independent of the actual data plane of
the network. Figure 3 gives the computed DVNet of the example in Figure 2.

<img src="./img/dvnet.png" width="50%" alt="dvnet" />


Figure 3. The DVNet and the counting process.


Note that the devices in the network and the nodes in DVNet have a
one-to-many mapping. For each node u in DVNet, we assign a unique identifier, which is a concatenation of u.dev and an integer.
For example, device B in the network is mapped to two nodes B1 and B2 in DVNet, because the regular expression
allows packets to reach D via [B,W,D] or [W,B,D].

### Example 1-1: Green Start

<img src="./img/dataplane.png" width="30%" alt="dataplane" />


Figure 4. The network data plane.


We first show how DDPV works in the scenario of green start, i.e., all forwarding rules are
installed on the corresponding switches all at once. Consider the network data plane
in Figure 4. For simplicity, we use P1, P2, P3 to represent the packet spaces with destination IP
prefixes 10.0.0.0/23, 10.0.0.0/24, and 10.0.1.0/24, respectively. Note that P2 ∩ P3 = ∅ and P1 = P2 ∪ P3. Each node u in DVNet
initializes a packet space → count mapping of (P1, 0), except for D1, which initializes the mapping as (P1, 1) (i.e., one copy of
any packet in P1 will be sent to the correct external ports). Afterwards, we traverse all the nodes in DVNet in reverse topological
order to update their mappings. Each node u checks the data plane of u.dev to find the set of next-hop devices to which
u.dev forwards P1. If the action of forwarding to this next-hop set is of ALL-type, the mapping at u is updated by adding up the
counts of all downstream neighbors of u whose corresponding devices belong to the set of next hops of u.dev for P1. For example,
node W1 updates its mapping to (P1, 1) and node W2 updates its mapping to (P1, 1) because device W forwards P1 to D, but node B1's mapping
remains (P1, 0) because B does not forward P1 to W. Similarly, although W1 has two downstream neighbors B2 and D1, each with an updated
mapping of (P1, 1), at its turn we update W1's mapping to (P1, 1) instead of (P1, 2), because device W only forwards P1 to D, not to B.

Consider the mapping update at A1. A may forward P2 to either B or W. If A forwards P2 to B, the mapping at A1 is (P2, 0), because
B1's updated mapping is (P1, 0) and P2 ⊂ P1. If A forwards P2 to W, the mapping at A1 is (P2, 1), because W1's updated mapping is (P1, 1).
Therefore, the updated mapping for P2 at A1 is (P2, [0, 1]). In the end, the updated mapping of S1, [(P2, [0, 1]), (P3, 1)], reflects the final
counting results, indicating that the data plane in Figure 4 does not satisfy the requirement in Figure 2. In other words, the network
data plane is erroneous.
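
The following minimal Java sketch illustrates the green-start counting pass described above for a single packet space, assuming ALL-type forwarding actions only. The real feature operates on LECs encoded as BDDs and tracks count intervals (e.g., the [0, 1] result at A1) for ANY-type actions; both are omitted here for brevity.

```java
import java.util.*;

// Minimal sketch of the reverse-topological counting over DVNet for one
// packet space with ALL-type actions only (not the actual implementation).
public class GreenStartCounting {

    static class Node {
        final String id;          // DVNet node, e.g., "W1"
        final String dev;         // device the node maps to, e.g., "W"
        final List<Node> downstream = new ArrayList<>();
        int count;                // copies of the packet delivered correctly
        Node(String id, String dev) { this.id = id; this.dev = dev; }
    }

    /**
     * nodesInReverseTopoOrder: DVNet nodes ordered so that every downstream
     * neighbor is visited before its upstream nodes (destination first).
     * nextHops: device -> set of next-hop devices for the packet space.
     */
    static void count(List<Node> nodesInReverseTopoOrder,
                      Map<String, Set<String>> nextHops,
                      Node destination) {
        for (Node u : nodesInReverseTopoOrder) {
            if (u == destination) { u.count = 1; continue; }  // e.g., D1
            int sum = 0;
            for (Node v : u.downstream) {
                // Only downstream neighbors that u.dev actually forwards to
                // contribute to u's count.
                if (nextHops.getOrDefault(u.dev, Set.of()).contains(v.dev)) {
                    sum += v.count;
                }
            }
            u.count = sum;
        }
    }
}
```
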
### Example 1-2: Incremental Update
Consider another scenario for the example in Figure 2, where B updates its data plane to forward P1 to W instead of to D. The changed mappings of different
nodes are circled with boxes in Figure 3. In this case, device B locally updates the task results of B1 and B2 to [(P1, 1)] and [(P1, 0)],
respectively, and sends the corresponding updates to the devices of their upstream neighbors, i.e., [(P1, 1)] is sent to A following the opposite
direction of edge (A1, B1) and [(P1, 0)] is sent to W following the opposite direction of edge (W1, B2).

Upon receiving the update, W does not need to update its mapping for node W1,
because W does not forward any packet to B. As such, W does not need to send any update to A along the opposite direction of edge (A1, W1). In contrast,
A needs to update its task result for node A1 to [(P1, 1)] because (1) no matter whether A forwards packets in P2 to B or to W, one copy of
each packet will be sent to D, and (2) P2 ∪ P3 = P1. After
updating its local result, A sends the update to S along the opposite direction of edge (S1, A1). Finally, S updates its local result for S1 to [(P1, 1)],
i.e., the requirement is satisfied after the update.
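
The sketch below illustrates the incremental update rule of this example, again for a single packet space with integer counts: a device recomputes a local node's result when a downstream result changes and propagates it upstream only if its own result changed (so W stays silent for W1, while A updates A1 and notifies S). The class, field, and method names are placeholders, not the actual DV protocol.

```java
import java.util.*;

// Sketch of the incremental step in Example 1-2; illustrative names only.
public class IncrementalUpdate {

    static class NodeState {
        String id;                              // e.g., "A1"
        int count;                              // last published result
        Map<String, Integer> downstreamCounts = new HashMap<>(); // node id -> count
        Set<String> downstreamOnNextHop = new HashSet<>();       // downstream nodes reached via actual next hops
        List<String> upstreamDevices = new ArrayList<>();        // devices to notify on change
    }

    interface Transport { void send(String device, String nodeId, int count); }

    // Called when a verification message arrives from a downstream device.
    static void onDownstreamUpdate(NodeState u, String downstreamId,
                                   int newCount, Transport transport) {
        u.downstreamCounts.put(downstreamId, newCount);
        int recomputed = u.downstreamOnNextHop.stream()
                .mapToInt(id -> u.downstreamCounts.getOrDefault(id, 0))
                .sum();
        if (recomputed != u.count) {            // unchanged results are not propagated
            u.count = recomputed;
            for (String dev : u.upstreamDevices) {
                transport.send(dev, u.id, recomputed);
            }
        }
    }
}
```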

### Example 2: Verifying RCDC Local Contracts
In the second example, we show how DDPV verifies the local contracts of
all-shortest-path availability in Azure RCDC [1]. All-shortest-path availability
requires that all pairs of ToR devices in a Clos-based data center be able to reach each
other along a shortest path, and that all ToR-to-ToR shortest paths be
available in the data plane.

![system](img/dc.png)

Figure 5: An example datacenter.

We first explain what ToR contracts are using the example in Figure 5, and
show that RCDC is a special case of DDPV. Each ToR has a default contract with next
hops set to its neighboring leaf devices. For example, the default
contract for ToR1 specifies {A1, A2, A3, A4} as the next hops.
Each ToR also has a specific contract for every prefix hosted in the
datacenter besides the prefix that it is configured to announce, with
the next hops again set to its neighboring leaf devices. For example,
ToR1 has specific contracts for PrefixB, PrefixC, and PrefixD with next hops set to {A1, A2, A3, A4}.
Aggregation contracts and core contracts are defined similarly to ToR contracts.
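
The following sketch shows one possible, purely hypothetical, in-memory representation of such a local contract and its local check: the next hops installed in the device's data plane for a prefix must match the contract's expected next-hop set.

```java
import java.util.*;

// Illustrative representation of an RCDC-style local contract. Class and
// field names are assumptions, not part of this design.
public class LocalContract {
    final String prefix;                 // e.g., "PrefixB", or the default contract's prefix
    final Set<String> expectedNextHops;  // e.g., {"A1", "A2", "A3", "A4"} for ToR1

    LocalContract(String prefix, Set<String> expectedNextHops) {
        this.prefix = prefix;
        this.expectedNextHops = expectedNextHops;
    }

    // fib maps prefix -> set of next-hop devices installed in the data plane.
    boolean isSatisfiedBy(Map<String, Set<String>> fib) {
        return expectedNextHops.equals(fib.getOrDefault(prefix, Set.of()));
    }
}
```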


![system](img/rcdc_contracts.png)

Figure 6: Example illustrating local contracts.

We select three devices (one edge switch like ToR1, one aggregation switch like A1 and one
core switch like D1) in a 48-ary fat-tree and in another operational Clos-based topology
called NGClos, respectively, and verify their local contracts on three
commodity switches. Figure 7 shows that all local contracts are verified on
commodity switches in less than 320 ms, with a CPU load (i.e., CPU time / (total
time × number of cores)) ≤ 0.47 and a maximal memory ≤ 15.2 MB. These results
show that it is feasible to run DDPV on commodity devices to verify the local
contracts of data centers.

<img src="./img/dc_total_time.png" width="484px" alt="dc_total_time" />


(a) Total time.

<img src="./img/dc_memory.png" width="484px" alt="dc_memory" />


(b) Maximal memory.

<img src="./img/dc_load.png" width="484px" alt="dc_load" />


(c) CPU load.

Figure 7: Time and overhead of verifying all-shortest-path availability in DC networks from green start on commodity network devices.




# Design

The ddpv container runs two daemon processes: lecbuilderd and vagentd. We give a
brief overview of the key classes and the main functionalities of each process.

## lecbuilderd
The core of lecbuilderd is the LECBuilder class.

- LECBuilder

This class is responsible for converting the data plane (e.g., FIB and ACL) of its residing device into local equivalence classes (LECs).
Given a device X, a local equivalence class is a set of packets whose forwarding actions are identical at X.
LECBuilder stores the LECs of its residing device using a data structure called a binary decision diagram (BDD).
The main methods in LECBuilder are listed below (an illustrative sketch follows the list):

- `buildLEC()`: read the database container to get the data plane of the device,
and build the LECs.
- `updateLEC()`: get the updates of data plane from the database container and update the LECs incrementally.
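
As an illustration of the LEC idea only, the sketch below encodes destination prefixes as BDDs over the 32 destination-IP bits and groups rules with the same forwarding action, assuming a JavaBDD-style library (net.sf.javabdd). It deliberately ignores longest-prefix-match priority, which the real `buildLEC()` and `updateLEC()` must handle.

```java
import java.util.*;
import net.sf.javabdd.BDD;
import net.sf.javabdd.BDDFactory;
import net.sf.javabdd.JFactory;

// Minimal LEC sketch: prefix -> BDD encoding plus grouping by action.
public class LecSketch {
    static final BDDFactory FACTORY = JFactory.init(1_000_000, 100_000);
    static { FACTORY.setVarNum(32); }   // one BDD variable per destination-IP bit

    // BDD for "destination IP matches prefix/len".
    static BDD prefixToBdd(int prefix, int len) {
        BDD result = FACTORY.one();
        for (int i = 0; i < len; i++) {
            int bit = (prefix >>> (31 - i)) & 1;
            result.andWith(bit == 1 ? FACTORY.ithVar(i) : FACTORY.nithVar(i));
        }
        return result;
    }

    // Union the prefix BDDs of all rules that share the same next-hop action.
    static Map<String, BDD> groupByAction(Map<Integer, Integer> prefixLens /* prefix -> length */,
                                          Map<Integer, String> actions    /* prefix -> next hop */) {
        Map<String, BDD> lecs = new HashMap<>();
        prefixLens.forEach((prefix, len) -> lecs.merge(
                actions.get(prefix), prefixToBdd(prefix, len), BDD::or));
        return lecs;
    }
}
```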

## vagentd
The vagentd process uses a dispatch-worker thread pool design.

### Dispatcher

The Dispatcher class receives the computation task configurations from the planner and spawns
Worker threads correspondingly. It also establishes socket connections with neighboring devices, dispatches received messages to corresponding Worker threads and sends the messages from Worker threads to corresponding neighbor devices.
The main methods in Dispatcher include the following (an illustrative skeleton follows the list):

- `receiveInstruction()`: receive instructions on computation tasks from the planner, and spawn corresponding Worker threads. This method is only invoked when the system starts or the planner updates the computation tasks based on operators' instructions.
- `receiveMessage()`: receive the verification messages from neighboring devices and dispatch them to the corresponding workers.
- `receiveLECUpdate()`: receive the updates of LECs from lecbuilderd and dispatch them to corresponding workers.
- `sendMessage()`: receive the sendResult requests
from workers and send them to corresponding devices.
- `sendAlert()`: if a worker specifies in its sendResult request that the result indicates a violation of an
operator-specified requirement, the dispatcher sends an alert to the operators.
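
The skeleton below illustrates the dispatch side of this design with a standard Java thread pool; the task, message, and class names are placeholders rather than the actual vagentd implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Skeleton of the Dispatcher side of vagentd's dispatch-worker design.
public class DispatcherSketch {

    /** Minimal view of a worker from the dispatcher's perspective. */
    public interface WorkerHandle extends Runnable {
        String nodeId();
        void enqueue(String message);    // verification message or LEC update
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, WorkerHandle> workers = new ConcurrentHashMap<>();

    // receiveInstruction(): spawn one worker thread per DVNet node assigned
    // to this device by the verification planner.
    public void receiveInstruction(Iterable<WorkerHandle> tasks) {
        for (WorkerHandle w : tasks) {
            workers.put(w.nodeId(), w);
            pool.submit(w);
        }
    }

    // receiveMessage()/receiveLECUpdate(): hand incoming updates to the
    // worker that owns the addressed DVNet node.
    public void dispatch(String nodeId, String message) {
        WorkerHandle w = workers.get(nodeId);
        if (w != null) w.enqueue(message);
    }

    // sendMessage()/sendAlert() would push worker results to neighbor devices
    // over sockets, or raise an operator alert on a violation; omitted here.
}
```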


### Worker

This class executes the lightweight computation tasks specified by the planner.
Each node in DVNet corresponds to a worker thread.
The running state of workers is controlled by the thread pool.
The main methods in Worker include the following (an illustrative run loop follows the list):
- `receiveMessage()`: receive the verification message from the dispatcher, and execute the computation task incrementally.
- `receiveLECUpdate()`: receive the updates of LECs from the dispatcher, and execute the computation task incrementally.
- `sendResult()`: send the result of the computation task to the dispatcher, which either forwards it to a corresponding neighbor device or sends an alert to the operators, depending on whether a violation of an operator-specified requirement is found by the worker.
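
The sketch below illustrates a possible worker run loop: block on incoming updates, recompute the node's counting task incrementally, and hand the result back to the dispatcher. The message type and the recompute logic are placeholders.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a Worker thread; names and message format are illustrative.
public class WorkerSketch implements Runnable {
    private final String nodeId;                       // DVNet node, e.g., "A1"
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

    public WorkerSketch(String nodeId) { this.nodeId = nodeId; }

    public void enqueue(String message) { inbox.add(message); }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String update = inbox.take();          // verification message or LEC update
                String result = recompute(update);     // incremental counting task
                sendResult(result);                    // back through the dispatcher
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();        // stop when the pool shuts down
        }
    }

    private String recompute(String update) { return update; }  // placeholder
    private void sendResult(String result)  { /* dispatcher.sendMessage(...) */ }
}
```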




# References
[1] Karthick Jayaraman, Nikolaj Bjørner, Jitu Padhye, Amar Agrawal,
Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew
Helwer, Mark Kasten, Ivan Lee, et al. 2019. Validating Datacenters
at Scale. In Proceedings of the ACM Special Interest Group on Data
Communication. 200–213.

[2] Qiao Xiang, Ridi Wen, Chenyang Huang, Jiwu Shu, and Franck Le. 2022. Switch as a
Verifier: Toward Scalable Data Plane Checking via Distributed, On-Device
Verification. https://arxiv.org/abs/2205.07808




