This is the top-level XDP project management file that contains tasks tracked via the org-mode =TODO=, =NEXT= and =DONE= states. The =areas/= directory also contains tasks. It is recommended to use emacs when viewing and editing these =.org= files, as the GitHub rendering removes the =TODO= and =DONE= markings on the tasks.
Together with the emacs setup in =../org-setup.el=, the extended =org-agenda= view can be used for project management, and the setup automatically detects Projects and Stuck Projects. For the =org-agenda= view to pick up these tasks, make sure to configure =org-agenda-files= to include this directory and the =areas/= directory.
The short of it is that we need consistency in the counters across NIC drivers and virtual devices. Right now, stats are driver-specific, with no clear accounting of the packets and bytes handled in XDP.
Progress: @dsahern (David Ahern) has started an email thread on the subject: consistency for statistics with XDP mode
Some drivers are not updating the “ifconfig” stats counters when in XDP mode. This makes traffic received or sent via XDP invisible to sysadmin/management tools, which is sure to cause confusion.
Closer look at other drivers.
- ixgbe driver is doing the right thing.
- i40e had a bug where RX/TX stats were swapped (fixed in commit cdec2141c24e (v4.20-rc1) (“i40e: report correct statistics when XDP is enabled”)).
- mlx5 driver is not updating the regular RX/TX counters, but A LOT of other ethtool stats counters (which are the ones I usually monitor when testing).
Need to have an upstream discussion on what the semantics should be. IMHO the regular RX/TX counters must be updated even for XDP frames.
Accounting per XDP-action is also inconsistent across drivers. Some drivers account for nothing, while others have elaborate counters exposed as ethtool stats.
The common pattern is that XDP programs do their own accounting of actions as they see fit, and export this via BPF maps to their associated userspace application, which is a very application-specific approach (sketched below).
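For illustration, a minimal version of that per-program accounting pattern (the map name, the local =XDP_ACTION_MAX= define and the include path are our choices here, modeled on the kernel samples of the time):

#+begin_src c
#include <linux/bpf.h>
#include "bpf_helpers.h"

#define XDP_ACTION_MAX (XDP_REDIRECT + 1)

/* Per-CPU counters, one slot per XDP action code; the program's own
 * loader reads this map and presents the stats to its userspace app.
 */
struct bpf_map_def SEC("maps") action_cnt = {
	.type        = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u64),
	.max_entries = XDP_ACTION_MAX,
};

static __always_inline __u32 record_action(__u32 action)
{
	__u64 *cnt = bpf_map_lookup_elem(&action_cnt, &action);

	if (cnt)
		(*cnt)++;
	return action;
}

SEC("xdp")
int xdp_stats_prog(struct xdp_md *ctx)
{
	return record_action(XDP_PASS);
}

char _license[] SEC("license") = "GPL";
#+end_src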
David Ahern (@dsahern) argues that sysadmins need something more generic and consistent, as they need a way to diagnose the system regardless of which XDP program some developer asked to have installed. This argument, that a 3rd person should be able to diagnose a running XDP program, was generally accepted upstream.
There were a lot of performance concerns around introducing XDP-action counters. Generally, upstream voices wanted this to be opt-in, i.e. something that the sysadmin explicitly enables when needed, and not on by default.
As Saeed suggested, do something really simple and central, like:
#+begin_src diff
+++ b/include/linux/filter.h
@@ -651,7 +651,9 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return BPF_PROG_RUN(prog, xdp);
+	u32 ret = BPF_PROG_RUN(prog, xdp);
+	xdp->xdp_rxq_info.stats[ret]++;
+	return ret;
 }
#+end_src
WARNING: Realised the above code is subject to speculative-execution side-channel leaking, since the unbounded =ret= is used directly as an array index.
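A minimal sketch of how that could be mitigated, using the kernel's =array_index_nospec()= helper from =<linux/nospec.h>= plus a locally defined =XDP_ACTION_MAX= bound (both chosen here for illustration; this is not an agreed-upon upstream fix). The added lines of the diff above would become:

#+begin_src c
#include <linux/nospec.h>

#define XDP_ACTION_MAX (XDP_REDIRECT + 1)	/* local bound, as in samples/bpf */

	u32 ret = BPF_PROG_RUN(prog, xdp);
	/* Map out-of-range return codes to XDP_ABORTED, then clamp the
	 * index under speculation, so a mis-speculated out-of-bounds
	 * value cannot be used to leak memory (Spectre v1 style).
	 */
	u32 act = ret < XDP_ACTION_MAX ? ret : XDP_ABORTED;

	act = array_index_nospec(act, XDP_ACTION_MAX);
	xdp->xdp_rxq_info.stats[act]++;
	return ret;
#+end_src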
And measure whether the performance concerns are real or not. Ilias or Jesper can measure this on a slow ARM64 platform (espressobin), as only testing this on Intel might lead to the wrong conclusion.
Collecting XDP-action counters is only the first step. We need to figure out the best solution for exporting this to userspace.
One option is to piggyback on ethtool stats, which the drivers already use; this would also standardise the driver-related XDP ethtool stats.
Very few drivers support XDP_REDIRECT.
HW drivers: ixgbe, i40e, mlx5
SW drivers: veth, tun (tuntap), virtio_net
It would be useful to have a complete list of drivers that are missing some or all parts of REDIRECT support. Based on this, we can make a plan for what to do about each of them.
Driver resources needed to handle =ndo_xdp_xmit()= are currently tied to the driver having loaded an RX XDP program. This is strange, as allocating these driver TX HW resources is independent of RX.
This can quickly lead to exhausting HW resources, like IRQ lines or NIC TX HW queues, given it is assumed a TX queue is allocated/dedicated for each CPU core.
NEXT Change non-map xdp_redirect helper to use a hidden map
To be able to tie resource allocation to the interface maps (=devmap=), we first need to change the non-map redirect variant so it uses a map under the hood. Since =xdp_redirect_map()= is also significantly faster than the non-map variant, this change should be a win in itself (see the sketch below).
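For reference, a minimal sketch of the map-based variant (map name, function name and the include path are our choices, following the kernel samples of the time):

#+begin_src c
#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Devmap holding the target ifindex; the loader populates slot 0 */
struct bpf_map_def SEC("maps") tx_port = {
	.type        = BPF_MAP_TYPE_DEVMAP,
	.key_size    = sizeof(int),
	.value_size  = sizeof(int),
	.max_entries = 1,
};

SEC("xdp")
int xdp_redirect_map_prog(struct xdp_md *ctx)
{
	/* Map-based redirect: the target lives in the devmap, which
	 * also enables TX bulking. The non-map bpf_redirect() variant
	 * instead takes a raw ifindex and skips the map (and bulking).
	 */
	return bpf_redirect_map(&tx_port, 0 /* key */, 0 /* flags */);
}

char _license[] SEC("license") = "GPL";
#+end_src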
Turns out the initial idea of using insertion into devmap as a trigger for resource allocation doesn’t work because of generic XDP. So we’ll need an ethtool interface; look into the existing channel configuration interface on the kernel side and figure out how to express XDP resource allocation in a good way.
Because we can’t tie resource allocation to map insertion on the kernel side, we need to solve the UI interface in userspace. So add a hook/wrapper to libbpf that will automatically allocate TX resources when inserting into a map (sketched below).
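A rough sketch of what such a wrapper could look like; =ensure_xdp_tx_resources()= is hypothetical (the ethtool/netlink interface it would use does not exist yet), while =bpf_map_update_elem()= is the existing libbpf call:

#+begin_src c
#include <bpf/bpf.h>

/* Hypothetical: would poke the to-be-designed ethtool/netlink knob to
 * allocate XDP TX queues on the target device, before the device can
 * be used as a redirect destination.
 */
int ensure_xdp_tx_resources(int ifindex);

/* Wrapper sketch: allocate TX resources, then do the devmap insert */
int xdp_devmap_insert(int map_fd, __u32 key, int ifindex)
{
	int err = ensure_xdp_tx_resources(ifindex);

	if (err)
		return err;
	return bpf_map_update_elem(map_fd, &key, &ifindex, 0);
}
#+end_src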
The samples/bpf programs xdp_redirect + xdp_redirect_map are very user-unfriendly: #1 they use raw ifindexes as input + output; #2 the pkt/s number counts RX packets, not TX’ed packets, which can be dropped silently. Red Hat QA got very confused by #2.
Simply include/sample the net_device TX stats (e.g. as sketched below).
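One simple way for the sample programs to do this, assuming reading the sysfs statistics files is acceptable (the path is real; the function name is our choice):

#+begin_src c
#include <stdio.h>

/* Read the device's TX packet counter from sysfs, so the tool can
 * report actually-transmitted packets instead of only RX packets.
 */
static unsigned long long get_tx_packets(const char *ifname)
{
	char path[256];
	unsigned long long val = 0;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/net/%s/statistics/tx_packets", ifname);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%llu", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}
#+end_src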
The XDP sample programs unconditionally unload the currently running XDP program (via -1) on exit. If users are not careful with the order in which they start and stop XDP programs, then they get confused.
Almost done, but needs follow-up to make sure this gets merged upstream: upstream proposal V1 (by Maciej Fijalkowski) is to check if the BPF-prog-ID numbers match before removing the current XDP prog (see sketch below).
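Roughly, using libbpf's =bpf_get_link_xdp_id()= and =bpf_set_link_xdp_fd()= (variable and function names here are our choices, not the proposal's):

#+begin_src c
#include <bpf/libbpf.h>

/* Only detach if the program currently attached to the device is the
 * one this tool loaded earlier (identified by its BPF prog ID).
 */
static void safe_detach(int ifindex, __u32 my_prog_id, __u32 xdp_flags)
{
	__u32 cur_id = 0;

	if (bpf_get_link_xdp_id(ifindex, &cur_id, xdp_flags))
		return;		/* could not query; leave it alone */
	if (cur_id == my_prog_id)
		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
}
#+end_src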
The default behaviour when attaching an XDP program on a driver that doesn’t have native-XDP is to fall back to generic-XDP, without notifying the user of the end-state.
This behaviour is also used by the xdp-samples, which unfortunately has led end-users to falsely think a given driver supports native-XDP. (QA are using these xdp-samples and create cases due to this confusion.)
The proposal is to change the xdp-samples to enforce native-XDP, and report if this was not possible, together with help text that displays the cmdline option for enabling generic-XDP/SKB-mode (see sketch below).
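In terms of the netlink attach flags, this could look roughly like the following (the error text and the =-S/--skb-mode= option name follow the existing samples convention; treat the details as illustrative):

#+begin_src c
#include <stdio.h>
#include <stdlib.h>
#include <linux/if_link.h>	/* XDP_FLAGS_* */
#include <bpf/libbpf.h>

/* Request native (driver) XDP explicitly instead of silently falling
 * back to generic-XDP, and tell the user how to opt into SKB-mode.
 */
static int attach_native_xdp(int ifindex, int prog_fd)
{
	__u32 xdp_flags = XDP_FLAGS_DRV_MODE;

	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
		fprintf(stderr,
			"ERR: driver does not support native-XDP;"
			" retry with -S/--skb-mode for generic-XDP\n");
		return EXIT_FAILURE;
	}
	return 0;
}
#+end_src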
In AF_XDP zero-copy mode, sending a frame to the network stack via XDP_PASS results in an expensive code path, e.g. a new page allocation for a copy of the payload plus an SKB allocation. We need to test how slow this code path is.
Also consider testing XDP-level redirect out another net_device with AF_XDP-ZC enabled. (I think this will just drop the packets due to mem_type).
It would be a big help when diagnosing XDP issues if the xdp_monitor program also reported the errno.
The raw tracepoints are supposed to be much faster, and the XDP monitor wants to have as little impact on the system as possible. Thus, convert it to use raw tracepoints (sketch below).
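A sketch of one xdp_monitor probe converted to a raw tracepoint, which also gives direct access to the =err= argument discussed above. Raw tracepoints expose the tracepoint's raw arguments instead of the stable TP_STRUCT__entry layout; the argument order here is assumed from the xdp_redirect_template TP_PROTO in include/trace/events/xdp.h, i.e. (dev, xdp, to_ifindex, err, map, map_index), and should be verified against the running kernel:

#+begin_src c
#include <linux/bpf.h>
#include "bpf_helpers.h"

#define ERR_SLOTS 16

struct bpf_map_def SEC("maps") redirect_err_cnt = {
	.type        = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u64),
	.max_entries = ERR_SLOTS,
};

SEC("raw_tracepoint/xdp_redirect_err")
int trace_xdp_redirect_err(struct bpf_raw_tracepoint_args *ctx)
{
	int err = (int)ctx->args[3];		/* see layout note above */
	__u32 key = ((__u32)-err) % ERR_SLOTS;	/* bucket per -errno */
	__u64 *cnt = bpf_map_lookup_elem(&redirect_err_cnt, &key);

	if (cnt)
		(*cnt)++;
	return 0;
}

char _license[] SEC("license") = "GPL";
#+end_src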
The kernel git-tree contains a lot of selftests for BPF, located in =tools/testing/selftests/bpf/=.
XDP (and its performance gain) is tied closely to NIC driver code, which makes it hard to implement selftests for (including benchmark selftests). Still we should have a goal of doing functional testing of the XDP core-code components (via selftests).
Since the =veth= driver got native-XDP support, we have an opportunity for writing selftests that cover both generic-XDP and native-XDP.
The assignment is to improve the selftest shell-script to test both generic-XDP and native-XDP (for the =veth= driver).
XDP add/remove VLAN headers have a selftest in =tools/testing/selftests/bpf/=, in the files =test_xdp_vlan.c= and =test_xdp_vlan.sh=. This test was developed in conjunction with fixing a bug in generic-XDP (see kernel commit 297249569932 (“net: fix generic XDP to handle if eth header was mangled”)).
Since the =veth= driver got native-XDP support, the selftest no longer tests the generic-XDP code path.
The ip utility (from iproute2) already supports specifying that an XDP prog must use generic XDP when loading it (option =xdpgeneric=).
When the =veth= driver got native-XDP support, the XDP-selftests that were based on =veth= changed from testing generic-XDP into testing native-XDP.
Assignments:
- Determine how many and which =veth=-based XDP-selftests are affected
- Convert these selftests to test both generic-XDP and native-XDP
Waiting for tracing people to work out the details of BTF.
[2019-01-18 Fri 13:55] How do we ensure consistently low latency packet processing is possible with XDP?
This paper, Performance Implications of Packet Filtering with Linux eBPF, concludes that turning on the JIT increases the number of outliers (though it is not quite clear whether this is actually supported by their data). This should be investigated.
Maybe write a tuning doc as well?
WAIT status as this is low priority for now.
XDP is an increasingly popular topic and technology. XDP builds on top of eBPF.
This hands-on tutorial will provide guidance on getting started with the XDP+eBPF technology, with the intention of letting attendees later leverage it for their specific use-cases.
More details to be posted later.
- State “DONE” from “TODO” [2019-01-28 Mon 13:00]
We need a github repo that is easier to build than the prototype-kernel github repo, which requires the kernel source tree to build. The hands-on tutorial needs an easier and more confined build-environment.
Related: How do we ensure participants can successfully run XDP at the event?
E.g.:
- Use metadata field for carrying per-packet data
- Don’t replicate kernel state