Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeout: subprocess is never terminated #4176

Closed
miles170 opened this issue Nov 22, 2022 · 7 comments · Fixed by #4315
Closed

timeout: subprocess is never terminated #4176

miles170 opened this issue Nov 22, 2022 · 7 comments · Fixed by #4315

Comments

@miles170
Copy link
Contributor

When I was working on #4073, I discovered that timeout never terminates the subprocess under the sh shell.
Is this expected behavior?

$ timeout .5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+ 55384  0.0  0.0   3244   712 pts/0    S+   05:05   0:00 grep split
$ /coreutils/target/release/timeout .5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+ 55388  6.0  0.0   2680  1672 pts/0    S    05:05   0:00 split --filter=head -c1 >/dev/null -b 1000
coreuti+ 61201  0.0  0.0   3244   644 pts/0    S+   05:05   0:00 grep split
@tertsdiepraam
Copy link
Member

Nope, looks like a bug. Does it work with other times? Like, 0.5, 1 and 1.5?

@miles170
Copy link
Contributor Author

miles170 commented Nov 22, 2022

Nope, looks like a bug. Does it work with other times? Like, 0.5, 1 and 1.5?

$ timeout 1.5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+  1916  0.0  0.0   3244   648 pts/0    S+   07:38   0:00 grep split
$ /coreutils/target/release/timeout 1.5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+  1920  8.0  0.0   2944  1880 pts/0    S    07:38   0:00 split --filter=head -c1 >/dev/null -b 1000
coreuti+ 14847  0.0  0.0   3244   648 pts/0    S+   07:38   0:00 grep split

I don't think so. GNU timeout sends a signal to the process group if it is in the background.
See https://github.com/coreutils/coreutils/blob/master/src/timeout.c#L236-L246

If I use libc::kill(0, signal) to send a signal to the process group, the timeout will receive it.
Cannot find a way to send a signal to all children without breaking the current code.

@tertsdiepraam
Copy link
Member

Oh I see, I misinterpreted the explanation. Thanks!

@miles170
Copy link
Contributor Author

Would you mind giving me some hints so I can solve this problem?

@tertsdiepraam
Copy link
Member

I was just looking through the code, but I'm not sure what's going wrong, so I don't have any hints, sorry.

@castilma
Copy link

castilma commented Jan 31, 2023

@miles170: Before GNU timeout kills the process group it blocks the signal for itself first here.

It should be enough to just copy that behaviour, i.e. block the signal, before sending it to the process group, and also send SIGCONT to the child and the whole process group after that.


Note, that GNU timeout doesn't check for errors of signal. We shouldn't, either. For example: The subprocess/shell might have exited just after the timeout. Sending a signal now would return ESRCH, but we should still try to kill the children.

On linux, man 2 kill documents only 3 possible errors of kill(). IMHO all these errors can be ignored, and timeout() should continue trying to kill the process group.

       EINVAL An invalid signal was specified.

       EPERM  The  calling process does not have permission to send the signal
              to any of the target processes.

       ESRCH  The target process or process group does not exist.   Note  that
              an existing process might be a zombie, a process that has termi‐
              nated execution, but has not yet been wait(2)ed for.

@castilma
Copy link

If the process exits by itself right after the timeout, GNU timeout shows a different behaviour: After sending the signals, it calls waitpid() again, which tells, why the process exited.

Our timeout just assumes the process exited because of the timeout. We should call wait() after sending the signals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants