This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

ignite-spawn container has multiple interfaces with same MAC, networking (randomly) broken #633

Closed
twelho opened this issue Jul 3, 2020 · 2 comments · Fixed by #638
Assignees
Labels
area/networking Issues related to networking kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments


twelho commented Jul 3, 2020

We've now identified the issue: the ignite-spawn container has multiple interfaces used for bridging that share the same MAC address. The cause is still unknown. A quick fix is to ping (unsuccessfully) the IP address that CNI previously gave the VM and then ping the current IP address. This opens communication both ways: you can then ping the VM, and the VM can access the outside world.

This has also been seen, rarely, with the docker-bridge backend, where the VMs have no access to the outside world.

cc @stealthybox, @chanwit

Originally posted by @twelho in #616 (comment)

@twelho twelho added area/networking Issues related to networking kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jul 3, 2020
@twelho twelho self-assigned this Jul 3, 2020

twelho commented Jul 7, 2020

Duplicate MAC address issue figured out (fixing it does not fix networking on its own). We're calling netlink.LinkSetMaster here (and again a few lines below):

if err = handle.LinkSetMaster(tuntap, bridge); err != nil {

This binds both the network interface given to the container and the virtual VM interface to the bridge. If netlink.LinkSetMaster is called on a "dangling" bridge (nothing attached to it) and a virtual (TUNTAP) device, both devices change their MAC addresses to something arbitrary, and most often both end up with the same MAC. This does not happen if a "real" adapter (such as the Ethernet card on my PC) is attached to the bridge first using the same function call. If both devices are virtual (as in Ignite's case), this might also affect only the first virtual device attached to the bridge. In any case, this is very likely a bug in netlink, though I'll investigate a little further before opening an issue: even when manually persisting the MAC addresses, networking still doesn't work out of the box with Flannel, and dmesg shows the following messages:

br_eth0: received packet on eth0 with own address as source address (addr:a6:e4:3d:ce:c4:34, vlan:0)


twelho commented Jul 9, 2020

The "own address as source address" issue is figured out: Linux bridge devices have an internal MAC address lookup table to optimize forwarding, but the bridge is not smart enough to realize it is inside a container, which seems to prevent it from looking up MACs from the outside world (or from the VM, for that matter). This causes the lookups to relay back for both the VM TAP device and the container's eth0 device, so Ethernet frames are sent nowhere and both interfaces are added to the lookup table twice. That in turn causes the "received packet on eth0 with own address as source address" errors in dmesg. As soon as I set the ageing timer for that bridge to 0 (so it doesn't "cheat" with the lookup table and just acts like a regular relaying bridge), everything starts working.

The lookup table is designed to keep track of which interface to forward packets to, instead of copying and relaying them to all interfaces attached to the bridge (so it's essentially an optimization). Since we get no benefit from that behavior with only two devices attached to each bridge (there is only one way for the traffic to flow), it's safe to disable without a performance hit.

PR for ageing timer support in netlink: vishvananda/netlink#554

Thanks to the following helpful resources:
