Building/deploying pods on 4.8.0-0.okd-2021-10-24-061736 via git fails with memory cgroup limits error #998
cgroupsv2 has been enabled, which is supported only from OKD 4.9. Please attach (or upload to a public file sharing service) a must-gather archive.
Thank you for the quick response. We had snapshots of the virtual machines before upgrading to OKD 4.8, so we reverted the nodes back to 4.7 and checked whether the issue was still present. Even under 4.7, cgroups v2 was active. We upgraded from 4.7 to 4.8, instead of installing 4.8 directly, because we currently have two OpenShift 4.7 clusters we are planning upgrades for; we wanted to test out the process and explore what is new in 4.8. My colleague tracked down 4.8.0-0.okd-2021-10-24-061736 and 4.7.0-0.okd-2021-09-19-013247 as the underlying images.

Is there a way to provide you the must-gather in private? Since we are using this in the enterprise space, I want to make sure nothing gets leaked.

Do you have any idea how cgroups v2 may have been enabled for the OKD 4.7/4.8 images? Or, if cgroups v2 being enabled was intentional, why is it not working properly?

Just as a note, the reason we are using OKD instead of OpenShift for testing is that Red Hat said they were not willing to provide free-to-use licenses for a test cluster. In our case the test cluster is for administrators, not for our developers to test applications. With most of our on-premises products we stand up at least three instances: test, development, and production. We then use the test environment to validate that there are no issues with upgrades or major changes before taking the same steps in development and production. It was suggested that we use cloud resources to spin up a test cluster on demand, but that wouldn't align with our on-premises installation and configuration, so it would not make for a relevant test environment.

Do you know if it is possible to run OpenShift unlicensed? My impression is it still works fine, but we would not get support for the unlicensed cluster, which is not required for our use case. I am asking because we have found OKD to be less stable than OpenShift.
Is this reproducible in OKD 4.9? Regarding OCP: please take a look at https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it
Hello @LorbusChris, I am not sure if this is reproducible in 4.9. We could possibly upgrade OKD to 4.9 as a test, but we really want to keep OKD in lockstep with the version our other OpenShift clusters are on. As I mentioned, the intention of the OKD cluster is to catch any potential breaking changes with upgrades, or to test high-risk cluster configuration changes, in an environment where we don't have to worry about impacting users or developers. The other OpenShift clusters are actually on 4.7 at the moment, but we will be bumping them up to 4.8 after the holidays. I am guessing that cgroups v2 is better supported in 4.9 and would work properly. What I don't understand is how we ended up with this problem under both 4.7 and 4.8 of OKD. I understand that k8s and OpenShift are transitioning to cgroups v2, but this is not an issue we have had while using the OpenShift images. We also had to struggle to get the OKD cluster up and running (#897). Thank you for the information about licensing; it's a bit unfortunate though, as most of the products we use give us a couple of extra licenses to run a test/backup instance.
Have you tried to manually set cgroupsv1 by adding a MachineConfig that sets the respective karg? |
We haven't tried this yet, but we can give it a go. We haven't had to do this for any other clusters so far though, so I am wondering why it is necessary this time. |
Looks similar to #710 |
Yes, I think that's the same issue. Users who want to go back to cgroupsv1 can add a MachineConfig object like this to do that:
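The YAML referenced above did not survive in this thread. A minimal sketch of such a MachineConfig, assuming the standard `systemd.unified_cgroup_hierarchy=0` kernel argument is what switches the nodes back to cgroups v1 (the `name` is illustrative, and a matching object with `role: master` would be needed for control-plane nodes):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-cgroupsv1
spec:
  kernelArguments:
    # Boot with the legacy (v1) cgroup hierarchy instead of the unified (v2) one.
    - systemd.unified_cgroup_hierarchy=0
```

Applying this causes the Machine Config Operator to roll the change out to the matching machine config pool, rebooting nodes one at a time.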
Thank you both for the follow up information. I will try creating the MachineConfig object you have suggested and let you know the result. I will also check what the kernel argument was set to before making changes. |
@LorbusChris we applied the MachineConfig object that you provided and it resolved our issue. I checked for the kernel parameter before applying the change, but I checked using sysctl -a, so I didn't notice that the parameter was being set via a kernel command line argument (/proc/cmdline) until after applying the object. I am not sure if this was caused by the images we used, or if all the 4.7/4.8 images are like this. Fixing the images would be ideal if it is relevant to all of them, but if it is not possible to fix the images, I think this should be added to the documentation. Thanks for your help.
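For anyone hitting the same symptom, a quick way to check which cgroup hierarchy a node actually booted with — run on the node itself, e.g. via `oc debug node/<node>` followed by `chroot /host`. The filesystem-type check is a general Linux convention, not something taken from this thread:

```shell
#!/bin/sh
# cgroups v2 mounts a cgroup2 filesystem directly at /sys/fs/cgroup;
# cgroups v1 (or hybrid mode) mounts tmpfs there with per-controller subdirectories.
if [ "$(stat -fc %T /sys/fs/cgroup)" = "cgroup2fs" ]; then
  echo "cgroups v2 (unified hierarchy)"
else
  echo "cgroups v1 (legacy or hybrid hierarchy)"
fi

# Show whether the choice was made explicitly on the kernel command line.
grep -o 'systemd.unified_cgroup_hierarchy=[01]' /proc/cmdline \
  || echo "systemd.unified_cgroup_hierarchy not set on the command line"
```

Checking `/proc/cmdline` directly (rather than `sysctl -a`) is what would have revealed the setting here, since it is a boot-time kernel argument, not a sysctl.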
Perhaps you started with an FCOS image that defaults to cgroupsv2, while OKD 4.7 was tested with cgroupsv1-by-default FCOS, so no explicit setting was required.
I am thinking that the FCOS image we used, even though it was an older image, defaulted to cgroups v2 as well. I know it is becoming less common to install 4.7 or 4.8, but maybe this MachineConfig object should be created by default for older versions of OKD, or there should be some documentation about how to resolve the issue under older versions of OKD if it occurs. It sounds like a fix was applied for this in the issue you referenced, but only for specific images. I am not really sure which is better. From an upgrade perspective it is probably better to set the kernel parameter in the image, and then provide some reference documentation in case someone like myself still encounters the issue. I am happy this was relatively easy to identify and fix. We can probably mark this issue as closed, but I wanted to follow up with some feedback before doing so.
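As an alternative to a day-two MachineConfig, the kernel argument could in principle be pinned at provisioning time. A hedged sketch using a Butane config (the `kernel_arguments` section exists in Butane's `fcos` variant from spec version 1.4.0; whether this fits a given OKD UPI install flow is an assumption):

```yaml
variant: fcos
version: 1.4.0
kernel_arguments:
  should_exist:
    # Ensure nodes boot with the legacy (v1) cgroup hierarchy.
    - systemd.unified_cgroup_hierarchy=0
```

The resulting Ignition config would bake the argument into the node from first boot, avoiding the extra MCO-triggered reboot that a day-two MachineConfig causes.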
OKD releases on a rolling basis, so there won't usually be any more releases for 4.7 or 4.8 now. Feel free to open a docs PR with clarification against https://github.com/openshift/openshift-docs.
Hello,
We recently provisioned an OKD 4.7 cluster using the bare metal UPI method, and then upgraded the cluster to OKD 4.8. Currently, when trying to build a pod using Git as the source via the catalog, or from a custom Git repository, we receive the following error.
The nodes are virtual machines on top of RHV. Each of the worker nodes is provisioned with 12G of memory.
I notice that /sys/fs/cgroup looks considerably different between our existing OpenShift 4.7 clusters and the new OKD 4.8 cluster. This may be a difference between Fedora CoreOS and Red Hat CoreOS.
Fedora CoreOS version:
OKD 4.8 /sys/fs/cgroup:
OpenShift 4.7 /sys/fs/cgroup:
I am not sure if this is a bug or if there is day-two configuration we have missed. We have provisioned several instances of OpenShift before provisioning this OKD cluster, though, and this is the first time we have encountered the issue. If you would like the output of systemd-cgls, or any other information, please let me know.