Enable core dumps for postgres #8272

Merged
kelvich merged 1 commit into main from sk/core_dump
Jul 11, 2024

kelvich
Contributor

@kelvich kelvich commented Jul 4, 2024

Set the core rlimit to unlimited in compute_ctl, so that all child processes inherit it. We could also set the rlimit in the relevant startup script, but then we would depend on external setup and might inadvertently disable it again (core dumping worked in pods, but not in VMs with inittab-based startup).
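
The inheritance this description relies on can be sketched in a few lines. This is not the compute_ctl change itself (that lives in the Rust code of this PR), only a Unix-only Python illustration of the same mechanism. The function names `raise_core_soft_limit` and `child_core_limit` are hypothetical, and the sketch assumes the parent raises its soft limit to whatever hard limit it already has, which is the most an unprivileged process can do.

```python
import resource
import subprocess
import sys


def raise_core_soft_limit():
    """Raise the soft core-file limit to the current hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Setting soft == hard is always permitted; raising the hard limit
    # itself would need extra privileges, so it is left untouched here.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def child_core_limit():
    """Spawn a child process and return the (soft, hard) pair it sees."""
    probe = "import resource; print(*resource.getrlimit(resource.RLIMIT_CORE))"
    out = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    ).stdout
    soft, hard = (int(value) for value in out.split())
    return soft, hard


if __name__ == "__main__":
    print("parent:", raise_core_soft_limit())
    print("child: ", child_core_limit())
```

Because resource limits are copied to children at fork and kept across exec, the child reports the same pair as the parent, with no cooperation needed from whatever script launched the parent.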

@kelvich kelvich requested review from a team as code owners July 4, 2024 20:47

github-actions bot commented Jul 4, 2024

3024 tests run: 2909 passed, 0 failed, 115 skipped (full report)


Code coverage* (full report)

  • functions: 32.6% (6933 of 21268 functions)
  • lines: 50.0% (54456 of 108885 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
a7f56ae at 2024-07-04T21:38:17.465Z :recycle:

@kelvich kelvich requested a review from ololobus July 10, 2024 09:32
@a-masterov
Contributor

I have tried to reproduce the problem and found that the core limit inside the docker container does not depend on either the limit of the current user in the system (/etc/security/limits.conf) or the limit for the docker daemon.
The limit can be set in the docker command line:

$ ulimit -c
0
$ docker run debian bash -c "ulimit -c"
unlimited
$ docker run --ulimit core=0 debian bash -c "ulimit -c"
0

So I tried setting the limit in docker-compose.yml:

    ulimits:
      core:
        soft: "0"
        hard: "0"

It works:

$ docker compose exec compute bash -c "ulimit -c"
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
0

Then I tried to run the images with the tag 9799742805, which were built by the workflow for this branch:

$ TAG=9799742805 docker compose -f docker-compose.yml up -d
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
[+] Running 10/10
 ✔ Network docker-compose_default                   Created  0.1s
 ✔ Container docker-compose-minio-1                 Started  2.1s
 ✔ Container docker-compose-storage_broker-1        Started  2.1s
 ✔ Container docker-compose-minio_create_buckets-1  Started  2.2s
 ✔ Container docker-compose-safekeeper3-1           Started  2.0s
 ✔ Container docker-compose-pageserver-1            Started  2.0s
 ✔ Container docker-compose-safekeeper2-1           Started  2.1s
 ✔ Container docker-compose-safekeeper1-1           Started  2.1s
 ✔ Container docker-compose-compute-1               Started  1.2s
 ✔ Container docker-compose-compute_is_ready-1      Started

However, there was no desired effect:

$ docker compose exec -it compute bash
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
postgres@ff1ec4bd41e1:/$ ulimit -c
0
postgres@ff1ec4bd41e1:/$ ps -ax | grep postgres: | head -1
     44 ?        Ss     0:00 postgres: checkpointer 
postgres@ff1ec4bd41e1:/$ grep core /proc/44/limits 
Max core file size        0                    0                    bytes
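
The `grep core /proc/<pid>/limits` check above is easy to script when several processes need inspecting. The following is a minimal sketch, assuming the Linux `/proc/<pid>/limits` layout shown in the transcript; `parse_core_limit` and `core_limit_of` are hypothetical helper names, not part of any tool in this PR.

```python
def parse_core_limit(limits_text):
    """Return (soft, hard) for the 'Max core file size' row as strings.

    Values are kept exactly as the kernel prints them, e.g. "0" or "unlimited".
    """
    prefix = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The two columns after the row name are the soft and hard limits.
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max core file size' line found")


def core_limit_of(pid):
    """Read the core-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as limits_file:
        return parse_core_limit(limits_file.read())
```

For the checkpointer above, `core_limit_of(44)` would give `("0", "0")`, meaning neither the soft nor the hard limit allows a core file.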

@kelvich
Contributor Author

kelvich commented Jul 10, 2024

Right, the hard limit should be non-zero. In our VMs the soft limit is 0 and the hard limit is unlimited.
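
Why the hard limit matters can be shown with a small experiment: a soft limit can never exceed the hard limit, so with `hard: "0"` there is nothing for a process to raise the soft limit to. The sketch below is a Unix-only Python illustration, not code from this PR. It runs the experiment in a throwaway child process, because lowering a hard limit cannot be undone by an unprivileged process; `soft_above_hard_outcome` and the 1024-byte value are made up for the example.

```python
import subprocess
import sys

# Script executed in a throwaway child so the lowered hard limit
# does not leak into the calling process.
CHILD_SCRIPT = """
import resource
# Mimic the first docker-compose.yml attempt: soft 0, hard 0.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
try:
    # Ask for a non-zero soft limit while the hard limit is still 0.
    resource.setrlimit(resource.RLIMIT_CORE, (1024, 0))
    print("accepted")
except (ValueError, OSError):
    print("rejected")
"""


def soft_above_hard_outcome():
    """Return "accepted" or "rejected" for a soft limit above the hard limit."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(soft_above_hard_outcome())
```

With a non-zero hard limit, as in the VMs described above, the same request for a higher soft limit succeeds as long as it stays at or below the hard limit.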

@a-masterov
Contributor

Now the docker-compose.yml:

    ulimits:
      core:
        soft: "0"
        hard: -1

Restarted with:

$ TAG=9799742805 docker compose -f docker-compose.yml up -d
$ docker compose exec -it compute bash
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
postgres@e97957783a5e:/$ ps -ax | grep postgres: | head -1
     44 ?        Ss     0:00 postgres: checkpointer 
postgres@e97957783a5e:/$ grep core /proc/44/limits 
Max core file size        0                    unlimited            bytes  

So now the hard limit is unlimited, but the soft limit still equals 0

@kelvich
Contributor Author

kelvich commented Jul 10, 2024

Hm, I assume you are using https://github.com/neondatabase/neon/blob/1afab13ccb95ed083397c5bff1e31ae1631b1091/docker-compose/docker-compose.yml, so compute is started by that script:

/usr/local/bin/compute_ctl --pgdata /var/db/postgres/compute \

So the compute image, which is COMPUTE_IMAGE=compute-node-v${PG_VERSION:-16}, should be built from this branch. Is it?

Overall, for this patch you can just follow https://github.com/neondatabase/neon?tab=readme-ov-file#running-neon-database (docker compose is also fine, just a bit heavier).

@a-masterov
Contributor

Oh, my fault, I forgot the --build flag.
Now it works as expected:

$ docker compose exec -it compute bash
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
postgres@f05af8c145cb:/$ /usr/local/bin/compute_ctl -V
2024-07-10T16:18:53.103627Z  INFO logging and tracing started
2024-07-10T16:18:53.103903Z  INFO build_tag: 9799742805
compute_ctl 0.1.0
postgres@f05af8c145cb:/$ ps -ax | grep postgres: | head -1
     51 ?        Ss     0:00 postgres: checkpointer 
postgres@f05af8c145cb:/$ grep core /proc/51/limits 
Max core file size        unlimited            unlimited            bytes 
postgres@f05af8c145cb:/$ ulimit -c
0

@kelvich
Contributor Author

kelvich commented Jul 10, 2024

Nice, thank you! Can you approve the PR then?

It will be deployed to staging automatically, and we can get backtraces for the LR issue.

@a-masterov a-masterov self-requested a review July 10, 2024 20:04
Contributor

@a-masterov a-masterov left a comment

The code works correctly.

@kelvich kelvich merged commit 6bbd34a into main Jul 11, 2024
65 checks passed
@kelvich kelvich deleted the sk/core_dump branch July 11, 2024 07:20
skyzh pushed a commit that referenced this pull request Jul 15, 2024

3 participants