Hanging or crashing under Docker within LXC (Linux containers) #707

Closed
fendle opened this issue Mar 9, 2024 · 28 comments · Fixed by #801
Labels
bug Something isn't working as expected

Comments

@fendle

fendle commented Mar 9, 2024

Hi,
my IP is no longer updated in Cloudflare, and my Docker logs show only this message:

cloudflareddns-cloudflare-ddns-1 | 🌟 Cloudflare DDNS (v1.11.0-0-g52d2019)
cloudflareddns-cloudflare-ddns-1 | 🥷 Dropping privileges . .

I used the standard docker compose configuration.

Thanks .

@favonia
Owner

favonia commented Mar 9, 2024

@fendle Hi, may I have your confirmation on these?

  1. The same version (1.11.0) with the same configuration used to be working fine.
  2. However, for some reason, it is no longer working now.
  3. Your other Docker images are working as expected.

@favonia
Owner

favonia commented Mar 9, 2024

@fendle Your configuration might help, although the phenomenon you described does not sound like a misconfiguration of the tool. (Please hide sensitive information such as your domain names and API tokens.)

@favonia favonia added the bug Something isn't working as expected label Mar 9, 2024
@fendle
Author

fendle commented Mar 9, 2024

Hi @favonia,
I really do have the standard config; I only changed the API key and entered all my subdomains.
Is there a more detailed log that can be viewed directly in Docker?

@favonia
Owner

favonia commented Mar 9, 2024

@fendle The default configuration will already show the maximum amount of information. Could you possibly also check the things I wish to confirm in the other comment? Thank you!

@favonia
Owner

favonia commented Mar 14, 2024

@fendle Hi, I wonder if you could help me understand your problem better?

  1. Did the same configuration work before?
  2. Are your other Docker images working as expected?
  3. Are you able to run the Docker container alone without using docker-compose?

@fendle
Author

fendle commented Mar 17, 2024

Hi @favonia ,
here is my feedback:

  1. The configuration was working before
  2. My other docker images are working as expected
  3. I haven't tried it yet, but can do it.

It would just be helpful to have a hint about why it cannot be executed.
The message "🥷 Dropping privileges . . ." is unfortunately not really helpful :(

@favonia
Owner

favonia commented Mar 17, 2024

@fendle Thank you. I feel I still need more information, and your testing without Docker (compose) will help a lot. I can think of the following possible causes, but all of them sound super weird to me:

  1. For whatever reason, the tool was stuck at some system call for dropping privileges; or,
  2. For some reason, the tool was actually working as expected, but the logging was somehow stopped; or,
  3. For some reason, the Docker container was paused/stopped.

Theoretically, "dropping privileges" should not involve any system calls that could ever block (even if there's an error, the error should be returned immediately), so Cause 1 should be impossible. However, the other causes are equally strange, so I'm a bit lost. 🤔 Your experiment of running the tool without Docker Compose, or even without Docker, can help check whether Cause 2 or 3 is the culprit.

I see that you are suggesting more detailed logging. That is technically difficult. The dropping involves lots of ugly low-level Linux system calls, and it is annoying (maybe even impossible) to print out each of them. Some of them are buried inside the Go standard library to maintain consistency between threads. A more reliable way is to use a tool such as strace to monitor cloudflare-ddns (run without Docker) and see which system call might be blocking the entire program. Again, none of these system calls should be blocking as far as I can see!

Another thing that could be incredibly helpful is to recall any change you might have made that caused the tool to stop working. Do you remember anything that might have affected your Docker setup?

@este1561997

Hello,
Same problem for me: the DNS records are not updated, and Uptime Kuma is not working either.

compose.yml:

services:
  cloudflare-ddns:
    image: favonia/cloudflare-ddns:latest
    container_name: cloudflare-ddns
    network_mode: host
    restart: always
    cap_add:
      - SETUID
      - SETGID
    cap_drop:
      - all
    read_only: true
    security_opt:
      - no-new-privileges:true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Rome
      - CF_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      - DOMAINS=xxxx.org,yyyy.xxxx.org
        # Your domains (separated by commas)
      - PROXIED=false
      - UPTIMEKUMA="https://zzzz.xxxx.org/api/push/LvzcnABuqW?status=up&msg=OK&ping="

logs:

🌟 Cloudflare DDNS (v1.11.0-0-g52d2019)
🥷 Dropping privileges . . .

@favonia
Owner

favonia commented Apr 2, 2024

@este1561997 Could you try any of the following? (@fendle I apologize---I should have provided more detailed instructions for you to test things.)

  1. Directly run the DDNS updater using this in the command line:
CF_API_TOKEN=YOUR-CLOUDFLARE-API-TOKEN \
  DOMAINS=xxxx.org,yyyy.xxxx.org \
  TZ=Europe/Rome \
  UPTIMEKUMA="https://zzzz.xxxx.org/api/push/LvzcnABuqW?status=up&msg=OK&ping=" \
  go run github.com/favonia/cloudflare-ddns/cmd/ddns@latest
  2. OR, downgrade the version to v1.10.1 (without support of Uptime Kuma):

Change latest to 1.10.1 as follows:

services:
  cloudflare-ddns:
    image: favonia/cloudflare-ddns:1.10.1
    container_name: cloudflare-ddns
    network_mode: host
    restart: always
    cap_add:
      - SETUID
      - SETGID
    cap_drop:
      - all
    read_only: true
    security_opt:
      - no-new-privileges:true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Rome
      - CF_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      - DOMAINS=xxxx.org,yyyy.xxxx.org
        # Your domains (separated by commas)
      - PROXIED=false
      - UPTIMEKUMA="https://zzzz.xxxx.org/api/push/LvzcnABuqW?status=up&msg=OK&ping="
  3. OR, downgrade and run it without Docker:
CF_API_TOKEN=YOUR-CLOUDFLARE-API-TOKEN \
  DOMAINS=xxxx.org,yyyy.xxxx.org \
  TZ=Europe/Rome \
  go run github.com/favonia/cloudflare-ddns/cmd/ddns@v1.10.1

Testing any of these will help. Thank you!

@massijay

massijay commented Apr 2, 2024

Hi @favonia, I tried your suggestions, and downgrading didn't fix the problem. However, if I run it directly without Docker, I get this error message:

😦 Failed to reset group IDs to only 1000; current ones: 1000, 27, 100, 995
SIGSYS: bad system call
PC=0x46b8d7 m=2 sigcode=0

This is followed by the full stack trace.

@favonia
Owner

favonia commented Apr 2, 2024

@massijay Thank you for your testing!!! Now I have much better ideas about what might be going on. May I ask if the edge version fixes it? I couldn't devote enough time before the semester ends to implement any feature, but if the new Go runtime (1.22) or the new libcap fixes the issue, a new release seems to be worth it. It could also be that Docker is the culprit... 🤔

Anyway, please try the latest development version, using "edge" instead of "latest":

services:
  cloudflare-ddns:
    image: favonia/cloudflare-ddns:edge
    container_name: cloudflare-ddns
    network_mode: host
    restart: always
    cap_add:
      - SETUID
      - SETGID
    cap_drop:
      - all
    read_only: true
    security_opt:
      - no-new-privileges:true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Rome
      - CF_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      - DOMAINS=xxxx.org,yyyy.xxxx.org
        # Your domains (separated by commas)
      - PROXIED=false
      - UPTIMEKUMA="https://zzzz.xxxx.org/api/push/LvzcnABuqW?status=up&msg=OK&ping="

I plan to release a version that removes all privilege dropping so that at least it would work (this is tracked by #728).

@massijay

massijay commented Apr 3, 2024

@favonia Unfortunately, it's still not working with the edge version either.
I'm actually trying to do this in Docker on a freshly configured Proxmox LXC container, which is not such a simple setup.

@massijay

massijay commented Apr 3, 2024

I tried running the fork without privilege dropping, without Docker, and it works.

@favonia
Owner

favonia commented Apr 3, 2024

@massijay Thinking about it, I feel Docker plays no role here. It appears that some Linux system call that was valid became invalid. (And to be clear, none of these system calls are directly initiated by the DDNS updater.)

Do you still have the backtrace? It will help me see exactly which call is causing trouble. I would like to check whether there's any recent change or bugfix (in the Linux kernel, the Go runtime, the cap library, etc.) related to it. Your Linux distro and the exact Linux kernel version would also be helpful, but I understand those could be sensitive. :-)

@massijay

massijay commented Apr 3, 2024

I am running Ubuntu 23.10 (GNU/Linux 6.5.13-3-pve x86_64)
I attach the full stack trace
stacktrace.txt

@favonia
Owner

favonia commented Apr 3, 2024

@fendle @massijay Could you confirm that the following 3-line program will crash (!) the Go runtime? If so, maybe the bug should be reported to either the Go team or the Linux team. (I'm not sure who's at fault yet.) Unfortunately, I could not reproduce the crash on my machines, so I might not be the best person to report it.

package main

import "kernel.org/pub/linux/libs/security/libcap/cap"

func main() { cap.NewSet().SetProc() }

To use go run, you might need the following go.mod and go.sum (which can also be created automatically using go mod init foo and go mod tidy): go.mod as

module minitest

go 1.22.1

require kernel.org/pub/linux/libs/security/libcap/cap v1.2.69

require kernel.org/pub/linux/libs/security/libcap/psx v1.2.69 // indirect

and go.sum as

kernel.org/pub/linux/libs/security/libcap/cap v1.2.69 h1:N0m3tKYbkRMmDobh/47ngz+AWeV7PcfXMDi8xu3Vrag=
kernel.org/pub/linux/libs/security/libcap/cap v1.2.69/go.mod h1:Tk5Ip2TuxaWGpccL7//rAsLRH6RQ/jfqTGxuN/+i/FQ=
kernel.org/pub/linux/libs/security/libcap/psx v1.2.69 h1:IdrOs1ZgwGw5CI+BH6GgVVlOt+LAXoPyh7enr8lfaXs=
kernel.org/pub/linux/libs/security/libcap/psx v1.2.69/go.mod h1:+l6Ee2F59XiJ2I6WR5ObpC1utCQJZ/VLsEbQCD8RG24=

@massijay

massijay commented Apr 3, 2024

I can confirm that it crashes in an unprivileged LXC container, but it freezes (and keeps using 100% of the CPU) in a privileged container, which seems to match the behavior of the Docker container.
Unfortunately I've never used Go (until now) and I don't really know what to report to the Go or Linux team :(

@favonia
Owner

favonia commented Apr 3, 2024

@massijay Thank you. One more question: are you running Docker in an LXC when you say "Docker container"? To be honest, I find neither looping nor crashing acceptable for such a simple program, no matter what the setup is, but such information might help other people if you (or I) report the bug. (And I assume you are not reporting the bug.)

@favonia
Owner

favonia commented Apr 4, 2024

@massijay It seems setting up LXC would take some effort... does the 3-line program still crash even without LXC?

@fendle Are you also using LXC?

@massijay

massijay commented Apr 8, 2024

> are you running Docker in an LXC when you said "Docker container"?

Yes

> does the 3-line program still crash even without LXC?

Unfortunately, I cannot try it on the host outside an LXC container; however, I think the problem is caused by LXC.
On my other machine it works fine without any issues.

@favonia
Owner

favonia commented Apr 8, 2024

@massijay Thank you for the data point.

I don't think it's possible to gracefully handle this kind of error inside the DDNS updater. It's a system-level bug that needs to be fixed somewhere in the combination (LXC + Docker + libcap + Go).

@massijay

Thank you anyway for the timely support! I'm looking forward to trying the app without cap drop on Docker when it's available :)

@favonia favonia changed the title My IP is not updated Broken under LXC + Docker Apr 11, 2024
@favonia
Owner

favonia commented Apr 11, 2024

@fendle @este1561997 I am changing the issue title, assuming that Linux containers are part of the combination to trigger the bug. If that's the cause, for the time being, please use this fork maintained by @suraw00t before I deliver a more permanent solution. Sorry about the trouble!

@favonia favonia changed the title Broken under LXC + Docker Broken under LXC (Linux containers) Apr 11, 2024
@este1561997

Hi @favonia , @massijay was doing all the tests for me, so I'll wait for the new version.
Thanks for your time. :)

@favonia favonia changed the title Broken under LXC (Linux containers) Hanging or crashing the runtime under Docker within LXC (Linux containers) Jun 21, 2024
@favonia favonia changed the title Hanging or crashing the runtime under Docker within LXC (Linux containers) Hanging or crashing under Docker within LXC (Linux containers) Jun 21, 2024
@favonia favonia pinned this issue Jun 21, 2024
@favonia
Owner

favonia commented Jun 21, 2024

@massijay @este1561997 I could not trigger the bug with only LXC; capability operations seem to work just fine there. Can you confirm that you are using Docker inside LXC? In any case, could you confirm that the experimental Docker tag edge-nocapdrop works for you?
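
For anyone wanting to try the experimental tag, a minimal sketch is to swap only the image line in the compose file posted earlier in this thread. Whether the remaining options (such as cap_add) are still needed with this tag is not stated here, so the sketch assumes they stay untouched:

```yaml
services:
  cloudflare-ddns:
    # Experimental tag named in the comment above; replaces ":latest" or ":edge".
    image: favonia/cloudflare-ddns:edge-nocapdrop
    container_name: cloudflare-ddns
    network_mode: host
    restart: always
    # Assumption: all other keys (cap_add, cap_drop, environment, ...) stay
    # exactly as in the compose file posted earlier in this thread.
```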

@favonia
Owner

favonia commented Jul 17, 2024

Solution after 1.13.0: the updater no longer uses the cap library. Use Docker's built-in mechanism to limit superuser privileges.

@massijay

Hi, sorry for the late response. We just tried the new version and it's now working flawlessly. Thank you very much for your support!

@favonia
Owner

favonia commented Jul 25, 2024

@massijay @este1561997 To clarify, the solution to this issue (removing cap completely) can benefit from an update to your configuration, and I am on a mission to eliminate the old template from this universe. In case you have not updated your configuration for the new design, here's the tl;dr:

  1. Remove cap_add: ... completely.
  2. Add user: "1000:1000" (or other IDs you want to use).
  3. Remove PUID=... and PGID=... (optional, but why not).

PS: the new template works for older versions of the updater as well!

PPS: @fendle I am not sure if you are still using this updater, but if you are, the new version should work out of the box, and it can benefit from a configuration update as described here. You were using the old template.
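
Applied to the compose file posted earlier in this thread, the three tl;dr steps would look roughly like the sketch below. The IDs 1000:1000 and the placeholder token and domains are carried over from that file; any other environment variables you use are assumed to stay unchanged.

```yaml
services:
  cloudflare-ddns:
    image: favonia/cloudflare-ddns:latest
    container_name: cloudflare-ddns
    network_mode: host
    restart: always
    # Step 2: let Docker start the container as this user:group directly.
    user: "1000:1000"
    # Step 1: the cap_add block (SETUID, SETGID) is removed entirely.
    cap_drop:
      - all
    read_only: true
    security_opt:
      - no-new-privileges:true
    environment:
      # Step 3: PUID and PGID are removed; "user" above replaces them.
      - TZ=Europe/Rome
      - CF_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      - DOMAINS=xxxx.org,yyyy.xxxx.org
      - PROXIED=false
```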

4 participants