Skip to content

Latest commit

 

History

History
304 lines (216 loc) · 12.1 KB

xdp-project.org

File metadata and controls

304 lines (216 loc) · 12.1 KB

Top-level XDP project management

This the top-level XDP project management file that contains tasks via org-mode TODO, NEXT and DONE. The areas directory also contains tasks. It is recommended to use emacs when viewing and editing these .org files, as the github rendering view removes the TODO and DONE marking on the tasks.

Together with the emacs setup in ../org-setup.el, the extended org-agenda view can be used for project management, and the setup automatically detect Projects and Stuck Projects. For org-agenda view to pickup these tasks, make sure to configure org-agenda-files to include this directory and areas/ directory.

Consistency for statistics with XDP

The short of it is that we need consistency in the counters across NIC drivers and virtual devices. Right now stats are specific to a driver with no clear accounting for the packets and bytes handled in XDP.

Progress: @dsahern (David Ahern) have started an email thread on the subject: consistency for statistics with XDP mode

Missing update of ifconfig counters

Some drivers are not updating the “ifconfig” stats counters, when in XDP mode. This makes receive or send via XDP invisible to sysadm/management tools. This for-sure is going to cause confusion.

Closer look at other drivers.

  • ixgbe driver is doing the right thing.
  • i40e had a bug, where RX/TX stats are swapped (fixed in commit cdec2141c24e(v4.20-rc1) (“i40e: report correct statistics when XDP is enabled”)).
  • mlx5 driver is not updating the regular RX/TX counters, but A LOT of other ethtool stats counters (which are the ones I usually monitor when testing).

NEXT Figure out the counter semantics upstream

Need to have an upstream discussion, on what is the semantic. IHMO the regular RX/TX counters must be updated even for XDP frames.

Statistics per XDP-action

Accounting per XDP-action is also inconsistent across drivers. Some driver account nothing, while others have elaborate counters exposed as ethtool stats.

The common pattern is that XDP programs do their own accounting of action as they see fit, and export this as BPF maps to their associated userspace application, which a very per application specific approach.

David Ahern (@dsahern) argues that sysadm’s need something more generic and consistent, as they need a way to diagnose the system, regardless of the XDP program that some developer asked to get installed. This argument that a 3rd person should be able to diagnose a running XDP program was generally accepted upstream.

There were a lot of performance concerns around introducing XDP-action counters. Generally upstream voices wanted this to be opt-in, e.g. something that the sysadm explicitly enable when needed, and not default on.

NEXT Implement simple XDP-actions counter and measure

As Saeed suggested do something really simple and central like:

+++ b/include/linux/filter.h
@@ -651,7 +651,9 @@ static __always_inline u32 bpf_prog_run_xdp(const
struct bpf_prog *prog,
         * already takes rcu_read_lock() when fetching the program, so
         * it's not necessary here anymore.
         */
-       return BPF_PROG_RUN(prog, xdp);
+       u32 ret = BPF_PROG_RUN(prog, xdp);
+       xdp->xdp_rxq_info.stats[ret]++
+       return ret;
 }

WARNING: Realised above code is subject to speculative execution side-channel leaking.

And measure if the performance concerns are real or not. Ilias or Jesper can measure this on a slow ARM64 platform (espressobin), as only testing this on Intel might lead to the wrong conclusion.

Exporting XDP-actions counter to userspace

Collecting XDP-actions counter is only the first step. We need to figure out the best solution for exporting this to userspace.

One option is to piggyback on ethtool stats, which the drivers already use, and it would also standardise the driver related XDP ethool stats.

Expanding XDP_REDIRECT support to more drivers

Very few drivers support XDP_REDIRECT.

HW drivers: ixgbe, i40e, mlx5

SW drivers: veth, tun (tuntap), virtio_net

NEXT Map out redirect support in drivers and make a plan

It would be useful with a complete list drivers that are missing some or all parts of REDIRECT support. Based on this we can make a plan for what to do about each of these.

Better ndo_xdp_xmit resource management

Driver resources needed to handle a ndo_xdp_xmit() is currently tied to the driver having loaded an RX XDP program. This is strange, as allocating these Driver TX HW resources is independent.

This can quickly lead to exhausting HW resources, like IRQs lines or NIC TX HW queues, given it is assumed a TX queue is alloc/dedicated for each CPU core.

NEXT Change non-map xdp_redirect helper to use a hidden map

To be able to tie resource allocation to the interface maps (devmap), we first need to change the non-map redirect variant so it uses a map under the hood. Since xdp_redirect_map() is also significantly faster than the non-map variant, this change should be a win in itself.

Ethtool interface for enabling TX resources

Turns out the initial idea of using insertion into devmap as a trigger for resource allocation doesn’t work because of generic XDP. So we’ll need an ethtool interface; look into the existing channel configuration interface on the kernel side and figure out how to express XDP resource allocation in a good way.

Add automatic TX resource allocation to libbpf

Because we can’t tie resource allocation to map insertion on the kernel side, we need to solve the UI interface in userspace. So add a hook/wrapper to libbpf that will automatically allocate TX resources when inserting into a map.

Usability of programs in samples/bpf

The samples/bpf programs xdp_redirect + xdp_redirect_map are very user unfriendly. #1 they use raw ifindex’es as input + output. #2 the pkt/s number count RX packets, not TX’ed packets which can be dropped silently. Red Hat QA, got very confused by #2.

NEXT Change sample programs to accept ifnames as well as indexes

NEXT Add TX counters to redirect samples/bpf programs

Simply include/sample the net_device TX stats.

Fix unloading wrong XDP on xdp-sample exit

The XDP sample programs unconditionally unload the current running XDP program (via -1) on exit. If users are not careful with the order in-which they start and stop XDP programs, then they get confused.

Almost done, but followup to make sure this gets merged upstream: Upstream proposal V1 (by Maciej Fijalkowski) is to check if the BPF-prog-ID numbers match, before removing the current XDP-prog.

Change XDP-samples to enforce native-XDP and report if not avail

The default behaviour when attaching an XDP program on a driver that doesn’t have native-XDP is to fallback to generic-XDP, without notifying the user of the end-state.

This behaviour is also used by xdp-samples, which unfortunately have lead end-users to falsely think a given driver supports native-XDP. (QA are using these xdp-samples and create cases due to this confusion).

Proposal is to change xdp-samples to enforce native-XDP, and report if this was not possible, together with help text that display cmdline option for enabling generic-XDP/SKB-mode.

Add xdpsock option to allow XDP_PASS for AF_XDP zero-copy mode

In AF_XDP zero-copy mode, sending frame to the network stack via XDP_PASS results in an expense code path, e.g new page_alloc for copy of payload and SKB alloc. We need this test how slow this code path is.

Also consider testing XDP-level redirect out another net_device with AF_XDP-ZC enabled. (I think this will just drop the packets due to mem_type).

xdp_monitor: record and show errno

It would be a big help diagnosing XDP issues if the xdp_monitor program also reported the errno.

xdp_monitor: convert to use raw-tracepoints

The raw-tracepoints are suppose to be much faster, and XDP monitor want to have as little impact on the system as possible. Thus, convert to use raw-tracepoints.

BPF-selftests - top-level TODOs

The kernel git-tree contains a lot of selftests for BPF located in: tools/testing/selftests/bpf/.

XDP (and its performance gain) is tied closely to NIC driver code, which makes it hard to implement selftests for (including benchmark selftests). Still we should have a goal of doing functional testing of the XDP core-code components (via selftests).

Since driver veth got native-XDP support, we have an opportunity for writing selftests that cover both generic-XDP and native-XDP.

bpf-selftest: improve XDP VLAN selftests

Assignment is to improve the selftest shell-script to test both generic-XDP and native-XDP (for veth driver).

XDP add/remove VLAN headers have a selftest in tools/testing/selftests/bpf/ in files test_xdp_vlan.c and test_xdp_vlan.sh. This test was developed in conjunction with fixing a bug in generic-XDP (see kernel commit 297249569932 (“net: fix generic XDP to handle if eth header was mangled”)).

Since driver veth got native-XDP support, the selftest no-longer tests generic-XDP code path.

The ip utility (from iproute2) already support specifying, that an XDP prog must use generic XDP when loading an XDP prog (option xdpgeneric).

bpf-selftest: find XDP-selftests affected by veth native-XDP

When driver veth got native-XDP support, then the XDP-selftests that were based on veth changed from testing generic-XDP into testing native-XDP.

Assignments:

  1. Determine how many and which veth based XDP-selftests are affected
  2. Convert these selftests to test both generic-XDP and native-XDP

WAIT BTF-based metadata for XDP

Waiting for tracing people to work out the details of BTF.

WAIT XDP latency jit-vs-no jit, tuning etc

[2019-01-18 Fri 13:55] How do we ensure consistently low latency packet processing is possible with XDP?

This paper: Performance Implications of Packet Filtering with Linux eBPF conclude that turning on the jit increases the number of outliers (though not quite clear if this is actually supported by their data). This should be investigated.

Maybe write a tuning doc as well?

WAIT status as this is low priority for now.

XDP tutorial at Netdev 0x13

XDP is an increasingly popular topic and technology. XDP builds on top of eBPF.

This hands-on tutorial will provide guidance on getting started using XDP+eBPF technology with intention to let attendees for later leveraging it for your specific use-case.

More details to be posted later.

NetDev-conf presentation/Tutorial got accepted

  • State “DONE” from “TODO” [2019-01-28 Mon 13:00]
Title: “XDP hands-on tutorial” Subject: [Indico] Abstract Acceptance notification (#48)

NEXT Prepare materials for XDP tutorial

[2019-02-04 Mon 12:33] Email from Jamal Hadi Salim: XDP tutorial

NEXT NetDev-conf: add abstract under XDP-project

NEXT NetDev-conf: create overall plan for tutorial

Create github repo for NetDev-conf XDP-tutorial

We need a github repo that are easier to build than prototype-kernel github repo, as it requires the kernel source tree to build. The hands-on tutorial need to have an easier and more confined build-environment.

Related: How do we ensure participants can successfully run XDP at the event?

NEXT Needs document of best practices

E.g.:

  • Use metadata field for carrying per-packet data
  • Don’t replicate kernel state