
Add additional IPVS modules like ip_vs_lc #2409

Closed
etungsten opened this issue Sep 12, 2022 · 21 comments
Labels
area/core Issues core to the OS (variant independent) status/in-progress This issue is currently being worked on

Comments

@etungsten
Contributor

etungsten commented Sep 12, 2022

Not that it's part of this thread specifically - but it would be nice to add a few more of the recommended IPVS modules like ip_vs_sh.

Originally posted by @diranged in #1726 (comment)

@etungsten etungsten added priority/p2 area/core Issues core to the OS (variant independent) labels Sep 12, 2022
@etungsten etungsten changed the title from "Add additional IPVS module like ip_vs_sh" to "Add additional IPVS modules like ip_vs_sh" Sep 12, 2022
@etungsten
Contributor Author

etungsten commented Sep 12, 2022

Hi @diranged, I just checked our 5.10 kernel config and it does seem like we have most of the IPVS modules enabled (including ip_vs_sh):

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_MH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

Please feel free to re-open this issue if you see we're missing anything!

@diranged

@etungsten You're right ... the one that fails for me is lc:

E0913 00:17:27.243643       1 server_others.go:107] "Can't use the IPVS proxier" err="IPVS proxier will not be used because the following required kernel modules are not loaded: [ip_vs_lc]"

@etungsten etungsten reopened this Sep 13, 2022
@etungsten etungsten changed the title from "Add additional IPVS modules like ip_vs_sh" to "Add additional IPVS modules like ip_vs_lc" Sep 13, 2022
@stmcginnis stmcginnis added status/needs-triage Pending triage or re-evaluation and removed priority/p1 labels Dec 1, 2022
@rubroboletus

The main problem is that on Bottlerocket OS only these modules are loaded:

ip_vs_sh
ip_vs_rr
ip_vs_wrr

I cannot find a way to load the ip_vs_sed / ip_vs_lc modules during boot. Setting:

[settings.kernel.modules.ip_vs_sed]
allowed = true

does not help.

@Raboo

Raboo commented Jan 28, 2023

I would love to see ip_vs_lc added to the image.

@bcressey
Contributor

@foersleo what do you think?

@foersleo
Contributor

I have had a first look at this, and I can see that all of the ip_vs modules are present in the image:

bash-5.1# pwd
/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/modules/5.15.79
bash-5.1# find . -iname ip_vs*
./kernel/net/netfilter/ipvs/ip_vs_sh.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_mh.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_rr.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_nq.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_wlc.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_ftp.ko.xz
./kernel/net/netfilter/ipvs/ip_vs.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_lblc.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_pe_sip.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_ovf.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_dh.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_fo.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_wrr.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_lblcr.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_lc.ko.xz
./kernel/net/netfilter/ipvs/ip_vs_sed.ko.xz

I am also able to load the modules manually on a dev instance:

bash-5.1# modprobe ip_vs_lc
[  923.253344] IPVS: [lc] scheduler registered.
bash-5.1# modprobe ip_vs_sed
[  927.415066] IPVS: [sed] scheduler registered.
bash-5.1# lsmod | grep ip_vs
ip_vs_sed              16384  0
ip_vs_lc               16384  0
ip_vs                 192512  4 ip_vs_lc,ip_vs_sed
nf_conntrack          176128  5 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs

So from an OS perspective they are there and healthy. Can someone provide pointers on how you are configuring your nodes, so I can have a closer look at why it is failing to load the appropriate modules?

@foersleo foersleo added status/needs-info Further information is requested and removed status/needs-triage Pending triage or re-evaluation labels Feb 16, 2023
@foersleo
Contributor

Looking further into our packages, we seem to load only the rr, wrr, and sh variants of the modules as a precursor to launching kube-proxy, as defined in load-ipvs-modules.conf (here for the kubernetes-1.25 variant).

The commit introducing it in the kubernetes-1.22 variant has some more info on why we need to do it this way: 7919d89

tl;dr: The insmod binary used by kube-proxy is not able to load our compressed modules.

So it seems the fix here would be to load all available ipvs modules before starting kube-proxy, so that every user can go on and configure things as needed.
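For illustration only: assuming the precursor file is a plain modules-load.d style list of module names (the exact path and loading mechanism in Bottlerocket's packaging are assumptions here, not confirmed), loading every scheduler shown in the kernel config above would look roughly like this:

# hypothetical extended load-ipvs-modules.conf: one module name per line
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
ip_vs_lc
ip_vs_wlc
ip_vs_fo
ip_vs_ovf
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_mh
ip_vs_sed
ip_vs_nq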

@foersleo foersleo added status/in-progress This issue is currently being worked on and removed status/needs-info Further information is requested labels Feb 16, 2023
@foersleo
Contributor

Digging around a bit in the kubernetes proxier code, there are some interesting bits in there:

Proxier will only ever attempt to load the following ip_vs modules: ip_vs, ip_vs_rr, ip_vs_wrr, and ip_vs_sh. These are considered "required" modules as of pkg/util/ipvs/ipvs.go, through GetModules (see the code around line 678) and GetRequiredIPVSModules. All other ip_vs modules are merely checked to see whether they are loaded, but no load will be attempted (see CanUseIPVSProxier in pkg/proxy/ipvs/proxier.go, specifically lines 745 and following).

That also explains why our workaround only covers ip_vs_rr, ip_vs_wrr, and ip_vs_sh. Those are the modules considered essential/required by kubernetes ipvs utils.
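As a rough shell analogue of the behaviour described above (illustration only; the real code is Go and, per the commit linked earlier, shells out to a module-loading binary that cannot handle our compressed modules):

# "required" modules: kube-proxy attempts to load these itself
for mod in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh; do modprobe "$mod"; done
# any other scheduler (e.g. ip_vs_lc): only checked, never loaded
lsmod | grep -q '^ip_vs_lc ' || echo "ip_vs_lc not loaded, IPVS proxier refused"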

It seems that lately (past 1.26) the handling of ipvs has been changed fundamentally, but let's cross that bridge when we get there.

All of the above is mostly background and does not help to resolve the issue at hand, which is to give users the ability to load the ipvs modules they want/need. A simple workaround would be to just load all ipvs modules the same way we load the current ones; however, I am not convinced that is a solution that resonates well with some of the core concepts of Bottlerocket (keep the footprint small and reduce unused code). Even the current setup loads more code than needed into the kernel when kube-proxy is used in a mode that does not rely on ipvs (i.e. iptables).

I am looking at how we might be able to load these on demand or how to make this configurable.

@arnaldo2792
Contributor

[settings.kernel.modules.ip_vs_sed]
allowed = true

does not help.

@rubroboletus the reason this didn't work is that the API setting doesn't load kernel modules; instead, it prevents modules from being loaded.
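For clarity, a minimal sketch of what that setting is actually meant for (denying a module), based on the description above; the sctp example module name is just an illustration:

# blocks the module from being loaded; the setting cannot force a load at boot
[settings.kernel.modules.sctp]
allowed = false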

@foersleo FWIW, after #2772 bootstrap containers can be used to load kernel modules, like this:

chroot /.bottlerocket/rootfs modprobe vfio_pci

Or

insmod /.bottlerocket/rootfs/lib/modules/<KERNEL_VERSION>/kernel/drivers/vfio/pci/vfio-pci.ko.xz

@z0rc
Contributor

z0rc commented Feb 23, 2023

As an end user, I would really appreciate it if IPVS support for kube-proxy would "just work" without the need to set up additional configuration via a bootstrap container or user data.

@foersleo
Contributor

I understand that yet another configuration layer might not be the best user experience. Ideally the appropriate scheduler module would be loaded automatically, and as far as I understand that is what the ipvs handling in kubernetes is currently moving towards. There have been some efforts to restructure how ipvs is handled in the proxy code, and as far as I understand the new variant would attempt to load the appropriate scheduler by going through the kernel's own mechanism of adding a service with the specified scheduler (see kubernetes/kubernetes#114669 and related issues/PRs for the cleanup done in ipvs handling). (I have yet to test whether that actually works in Bottlerocket, and it would be a while before it lands, given that these changes seem to be landing in Kubernetes 1.27.)
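For anyone who wants to try that kernel mechanism by hand, a hedged sketch (it assumes ipvsadm is installed on the host and that kernel module auto-loading via request_module works from that context; the address and port are arbitrary):

# Adding a virtual service with the "lc" scheduler should make the kernel
# request ip_vs_lc on demand, without an explicit modprobe.
ipvsadm -A -t 10.0.0.1:80 -s lc
lsmod | grep ip_vs_lc
# clean up the test service afterwards
ipvsadm -D -t 10.0.0.1:80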

Which leaves us with the question of how to handle the issue for current versions of Bottlerocket. As far as I can tell, these are the options we currently have:

  1. Start building the schedulers into the kernel instead of as modules. I'd rather not do that, as one of the goals is to keep the packages small and reduce them to vital code.

  2. Load all scheduler modules through `load-ipvs-modules.conf`, as we currently do with the smaller set of modules. As we do not differentiate module loading based on which mode kube-proxy will be used in, this loads unneeded code into the kernel for cases where we do not use ipvs at all. At the very least, some of the 15 schedulers currently supported by the kernel will not be used in all configurations.

  3. Check out the solution proposed by @arnaldo2792 (thank you for the hint, Arnaldo!) to see whether that would be a stop-gap solution for the older variants. While this proposal puts the onus on the user, it overall reduces the amount of unneeded code accessible in the host OS, which aligns well with what we try to achieve with Bottlerocket. (As far as I can tell from the kube-proxy documentation, this would currently be the Bottlerocket way of ensuring the modules are loaded per https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy/ipvs#prerequisite.)

  4. Explore other ways of configuring this.

I think all of these solutions have their pros and cons, depending on which side we are looking at it from.

I will do some testing with some of these options, have a look at whether this is going to work without much headache in future kubernetes releases, and see if there is a sweet spot we can get to.

@Raboo

Raboo commented Feb 24, 2023

So solution no. 3 would be this?:

Settings:

[settings.bootstrap-containers.bootstrap]
source = "MY-CONTAINER-URI"
mode = "always"
essential = true

Where "MY-CONTAINER-URI" is a custom container image hosted publicly that looks like this?

FROM fedora
ENTRYPOINT ["chroot", "/.bottlerocket/rootfs", "modprobe", "ip_vs_lc"]

Or whichever module you want to load.

Correct?

@Raboo

Raboo commented Feb 24, 2023

BTW, can I use a musl image like alpine to execute the chroot command?

@foersleo
Contributor

@Raboo Yes, that would be the way to use solution 3. However, if I see this correctly, the patch linked by @arnaldo2792 has not yet made it into an official release, so bootstrap containers on Bottlerocket up to v1.12 do not have the right capabilities to load modules. You would have to use a self-built Bottlerocket from the latest develop branch.

The choice of base image for that bootstrap container should not really matter, as long as it supports the commands you want to use (modprobe and chroot seem to be non-problematic there).

@arnaldo2792
Contributor

@foersleo correct, the 1.13 release is where the patch will land ⛵!

@Raboo, that's right, you can use whatever image you want provided that chroot is available. modprobe is provided in the base Bottlerocket image.
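So an Alpine-based image along these lines should work too (untested sketch: busybox provides chroot, and modprobe comes from the host root filesystem once chrooted into /.bottlerocket/rootfs):

FROM alpine
ENTRYPOINT ["chroot", "/.bottlerocket/rootfs", "modprobe", "ip_vs_lc"]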

@Raboo

Raboo commented Feb 27, 2023

OK, I've prepared some bootstrap containers that load specific ip_vs modules that are not loaded by default, ready for when v1.13 is released. Check out https://github.com/Raboo/bottlerocket-modprobe.

Everything is described in the README.

@foersleo
Contributor

I have had a look at this one again. @Raboo, I have tried one of your container images in an EKS cluster and it unfortunately did not work as expected:

          Welcome to Bottlerocket's admin container!
    ╱╲
   ╱┄┄╲   This container provides access to the Bottlerocket host
   │▗▖│   filesystems (see /.bottlerocket/rootfs) and contains common
  ╱│  │╲  tools for inspection and troubleshooting.  It is based on
  │╰╮╭╯│  Amazon Linux 2, and most things are in the same places you
    ╹╹    would find them on an AL2 host.

To permit more intrusive troubleshooting, including actions that mutate the
running state of the Bottlerocket host, we provide a tool called "sheltie"
(`sudo sheltie`).  When run, this tool drops you into a root shell in the
Bottlerocket host's root filesystem.
[ec2-user@admin]$ sudo sheltie
bash-5.1# systemctl status bootstrap-containers@bootstrap.service
× bootstrap-containers@bootstrap.service - bootstrap container bootstrap
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/bootstrap-containers@.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/bootstrap-containers@bootstrap.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Tue 2023-04-25 09:46:41 UTC; 28min ago
  Condition: start condition failed at Tue 2023-04-25 09:46:41 UTC; 28min ago
             └─ ConditionPathExists=!/run/bootstrap-containers/bootstrap.ran was not met
   Main PID: 1099 (code=exited, status=1/FAILURE)

Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=info msg="pulled image successfully" img="ghcr.io/raboo/bottlerocket-modprobe:ip_vs_lc"
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=info msg="unpacking image..." img="ghcr.io/raboo/bottlerocket-modprobe:ip_vs_lc"
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=info msg="Container does not exist, proceeding to create it" ctr-id=boot.bootstrap
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=info msg="container task does not exist, proceeding to create it" container-id=boot.bootstrap
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=error msg="failed to create container task" error="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: \"chroot /.bottlerocket/rootfs modprobe ip_vs_lc\": stat chroot /.bottlerocket/rootfs modprobe ip_vs_lc: no such file or directory: unknown"
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal host-ctr[1099]: time="2023-04-25T09:46:41Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: \"chroot /.bottlerocket/rootfs modprobe ip_vs_lc\": stat chroot /.bottlerocket/rootfs modprobe ip_vs_lc: no such file or directory: unknown"
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal systemd[1]: bootstrap-containers@bootstrap.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal systemd[1]: bootstrap-containers@bootstrap.service: Failed with result 'exit-code'.
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal systemd[1]: Failed to start bootstrap container bootstrap.
Apr 25 09:46:41 ip-192-168-55-220.eu-central-1.compute.internal systemd[1]: bootstrap container bootstrap was skipped because of a failed condition check (ConditionPathExists=!/run/bootstrap-containers/bootstrap.ran).

I am no container expert myself, so I am not quite sure what the issue is here.

To verify that the mechanism itself works, I created an image with the following Dockerfile, and it worked as expected:

FROM fedora
ENTRYPOINT ["chroot", "/.bottlerocket/rootfs", "modprobe", "ip_vs_lc"]

@Raboo

Raboo commented Apr 25, 2023

@foersleo

stat chroot /.bottlerocket/rootfs modprobe ip_vs_lc: no such file or directory

There was a problem with my entrypoint, try now.
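(For context: an error of the form exec: "chroot /.bottlerocket/rootfs modprobe ip_vs_lc": stat ...: no such file or directory typically means the runtime looked up the entire command string as a single executable path, as happens with a one-element exec-form ENTRYPOINT. A sketch of the difference, which is a guess rather than a confirmed diagnosis of the linked image:)

# fails: the whole string is treated as one executable name
# ENTRYPOINT ["chroot /.bottlerocket/rootfs modprobe ip_vs_lc"]

# works: each argument is its own array element
ENTRYPOINT ["chroot", "/.bottlerocket/rootfs", "modprobe", "ip_vs_lc"]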

@foersleo
Contributor

Thank you! I have tried the fixed variant of the containers provided by @Raboo and confirmed that this works as expected. The module is loaded and the bootstrap container runs successfully.

To configure an EKS node group through eksctl with a bootstrap container that loads the ip_vs_lc module, add the following to your configuration file:

[...]
nodeGroups:
  - name: <your_nodegroup_name_here>
[...]
    bottlerocket:
      settings:
        bootstrap-containers:
          bootstrap:
            source: "ghcr.io/raboo/bottlerocket-modprobe:ip_vs_lc"
            mode: "always"

If you want booting to stall when your bootstrap container fails, just add essential: true to the bootstrap block (where bootstrap is the name you chose for that particular container), as in the snippet below.
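For completeness, the same block as above with the container marked essential:

    bottlerocket:
      settings:
        bootstrap-containers:
          bootstrap:
            source: "ghcr.io/raboo/bottlerocket-modprobe:ip_vs_lc"
            mode: "always"
            essential: true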

With that, I think we can close this issue. There is an easy way to load the necessary modules on Bottlerocket.
However, I will set up a discussion to test the waters on potentially making this even easier.

@foersleo
Contributor

Followup discussion: #3050

@z0rc
Contributor

z0rc commented May 27, 2023

I will do some testing with some of these options, have a look at whether this is going to work without much headache in future kubernetes releases, and see if there is a sweet spot we can get to.

Now that EKS has released 1.27, which seems to include improvements around IPVS module detection/usage, can this be prioritized? I haven't personally tested it yet, though; maybe it works fine now without messing with modprobe in bootstrap containers.
