apiserver: reap exited child processes #1384

Merged: 1 commit, Mar 12, 2021

@tjkirch tjkirch commented Mar 11, 2021

Issue number:

Fixes #1380

Description of changes:

We spawn some background processes, like thar-be-settings, when they might take
some time and we don't want to hold up API clients.  Unless we `wait` for those
children when they exit, their process ID lives on as a zombie.  This adds a
SIGCHLD handler to apiserver to call wait when any child exits.
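
To illustrate the zombie behavior described above, here is a minimal Python sketch. It is not the apiserver code (which is Rust) and it assumes a Linux `/proc` filesystem. The helper names `proc_state` and `zombie_lifecycle` are made up for this example. A child exits, the parent observes it without reaping it, and the process state is checked before and after the `wait`.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may contain spaces or parentheses,
    # so split on the last ")" rather than on whitespace alone.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_lifecycle():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = proc_state(pid)  # exited but not yet waited for
    os.waitpid(pid, 0)        # reap: the kernel can now release the PID
    after = proc_state(pid)
    return before, after
```

On Linux, `before` should be the zombie state `Z`, and `after` should be `None` because the process entry disappears once the parent has waited.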

Testing done:

Before, each `apiclient set` would leave a zombie. After, no zombies!

I ran them in a tight loop with `i=0; while :; do apiclient set settings.motd=hi$i; let i++; done`, which accomplished about 120 changes per second. I separately watched the motd file and saw that the operations were successful and that no zombies were stacking up. You'd see 2-3 (live) thar-be-settings processes, presumably because they're waiting on the coarse API write lock. I did once catch a `[thar-be-settings]` zombie before the signal/loop/wait caught it, but it was immediately cleaned up, so we know things are working.

I also tested:

  • manual `PATCH`, `/tx/commit`, `/tx/apply`, because that follows a slightly different code path to call thar-be-settings, and it was fine.
  • `apiclient update` still updated a host successfully.
  • repeated `apiclient check` doesn't leave thar-be-updates zombies, either.
  • `systemctl status` showed running, and a pod ran OK.
  • 0% CPU usage from apiserver (after some sets) in its normal idle state, so no haywire loop.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@tjkirch tjkirch requested review from bcressey and webern March 11, 2021 19:21

@srgothi92 srgothi92 left a comment

Interesting!

tjkirch commented Mar 12, 2021

^ This push improves the comments based on @bcressey helping me understand the one-to-many nature of CHLD signals to exited children.

@tjkirch tjkirch merged commit bf2a78a into bottlerocket-os:develop Mar 12, 2021
@tjkirch tjkirch deleted the apiserver-wait-child branch March 12, 2021 21:28
Successfully merging this pull request may close these issues.

Zombie child process started from apiserver
4 participants