Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot cleanly kill a subprocess using high-level asyncio APIs #88050

Closed
rabraham mannequin opened this issue Apr 19, 2021 · 17 comments
Closed

Cannot cleanly kill a subprocess using high-level asyncio APIs #88050

rabraham mannequin opened this issue Apr 19, 2021 · 17 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@rabraham
Copy link
Mannequin

rabraham mannequin commented Apr 19, 2021

BPO 43884
Nosy @asvetlov, @1st1, @sweeneyde, @kumaraditya303, @skeggse
PRs
  • bpo-43884: Fix asyncio subprocess kill process cleanly when process is blocked #31611
  • gh-88050: Fix asyncio subprocess kill process cleanly when process is blocked #32073
  • Files
  • kill_subprocess.py: Kill subprocess using the high-level asyncio subprocess APIs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-04-19.02:47:38.245>
    labels = ['3.11', 'type-bug', 'expert-asyncio']
    title = 'Cannot cleanly kill a subprocess using high-level asyncio APIs'
    updated_at = <Date 2022-03-23.09:08:45.631>
    user = 'https://bugs.python.org/rabraham'

    bugs.python.org fields:

    activity = <Date 2022-03-23.09:08:45.631>
    actor = 'kumaraditya'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2021-04-19.02:47:38.245>
    creator = 'rabraham'
    dependencies = []
    files = ['49965']
    hgrepos = []
    issue_num = 43884
    keywords = ['patch']
    message_count = 6.0
    messages = ['391349', '391350', '391369', '391379', '399937', '415859']
    nosy_count = 6.0
    nosy_names = ['asvetlov', 'yselivanov', 'rabraham', 'Dennis Sweeney', 'kumaraditya', 'skeggse']
    pr_nums = ['31611', '32073']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue43884'
    versions = ['Python 3.11']

    @rabraham
    Copy link
    Mannequin Author

    rabraham mannequin commented Apr 19, 2021

    There doesn't appear to be a way to prematurely kill a subprocess using the high-level asyncio subprocess APIs (https://docs.python.org/3.9/library/asyncio-subprocess.html) without getting a traceback on exit.

    On exit, the attached program writes the following to stderr:

    $ python3.9 kill_subprocess.py
    Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x1065f0dc0>
    Traceback (most recent call last):
        ...
        raise RuntimeError('Event loop is closed')
    RuntimeError: Event loop is closed

    If I uncomment # process._transport.close() or comment asyncio.sleep(1), the walkback disappears. (I get the same behavior in python 3.8. I haven't tried other python versions.)

    @rabraham rabraham mannequin added 3.8 (EOL) end of life 3.9 only security fixes topic-asyncio type-bug An unexpected behavior, bug, or error labels Apr 19, 2021
    @rabraham
    Copy link
    Mannequin Author

    rabraham mannequin commented Apr 19, 2021

    Reproducing the program here:

        import asyncio
        
        async def test():
            process = await asyncio.create_subprocess_shell(
                "sleep 2 && echo done",
                stdout=asyncio.subprocess.PIPE,
            )
            await asyncio.sleep(1)
            process.kill()
            await process.wait()
            # process._transport.close()
        
        asyncio.run(test())

    Can I use the high-level API to kill a subprocess cleanly without having to access the protected member process._transport? Seems like an oversight perhaps?

    @sweeneyde
    Copy link
    Member

    Running kill_subprocess.py on Windows 10, I get these results:

    Python 3.7.2 (tags/v3.7.2:9a3ffc0492)
    - raises NotImplementedError in base_events.py, _make_subprocess_transport
    Python 3.8.2 (tags/v3.8.2:7b3ab59)
    - Success
    Python 3.9.0 (tags/v3.9.0:9cf6752)
    - Success
    Python 3.10.0a6 (tags/v3.10.0a6:cc12888)
    - Success

    What is your OS?

    @rabraham
    Copy link
    Mannequin Author

    rabraham mannequin commented Apr 19, 2021

    I see this on MacOS and Linux, but I suspect any Unix-like system would have the same behavior.

    @skeggse
    Copy link
    Mannequin

    skeggse mannequin commented Aug 19, 2021

    I'm also experiencing this, with virtually identical code, on macOS 10.15.7. The given fix (process._transport.close()) also works for me, so I'm just using that workaround for the time being.

    @kumaraditya303 kumaraditya303 added 3.11 only security fixes and removed 3.8 (EOL) end of life 3.9 only security fixes labels Feb 28, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kumaraditya303 kumaraditya303 added 3.10 only security fixes 3.12 bugs and security fixes labels Jul 9, 2022
    @kumaraditya303 kumaraditya303 self-assigned this Jul 9, 2022
    @gvanrossum
    Copy link
    Member

    I'm still trying to get my head around the issue here.

    If I replace the shell command in the repro script with "timeout 2\necho done" I get a failure on Windows too (first RuntimeError: Event loop is closed from __del__; then a unique one, ValueError: I/O operation on closed pipe). The workaround/fix (process._transport.close()) works there too.

    @kumaraditya303
    Copy link
    Contributor

    Here is my investigation which led to the fix as in PR #32073.

    Without fix:
    When a process exits when it is killed with e.g. proc.kill, the following happens:

    • _process_exited of transport is called by the event loop by the child watcher.
    • Schedules _protocol.process_exited to be called and then it checks if all the pipes are closed in _try_finish if so call _call_connection_lost but since the process was closed forcefully by killing it, the pipes are still not closed so it does not calls the _call_connection_lost.
    • It wakes up the proc.wait() waiters that the process has exited but here is the problem that it wakes up the waiter early and _call_connection_lost is not called yet. At this point the pipes are in broken state that process has exited but they are not closed yet.
    • Child watcher schedules pipe connections to be closed but by this time the loop is closed as proc.wait() did not wait for it. This leads to the RuntimeError reported in the issue.

    With my fix things are consistent and there is no race with calling callbacks:

    • _process_exited of transport is called by the event loop by the child watcher.
    • Schedules _protocol.process_exited to be called on next iteration of loop.
    • Close all the pipes as the process is already exited and no broken pipes issue.
    • Call _try_finish and since the pipes are closed it succeeds and schedules _call_connection_lost to be called on next iteration. NOTE: At this point proc.wait() waiters are not woken up yet.
    • Wake up waiters in finally block of _call_connection_lost which is the last callback in the lifetime of a subprocess. At this point all the callbacks are called and everything is consistent as it is the last callback.

    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2022
    …rocess is blocked (pythonGH-32073)
    
    (cherry picked from commit 7015e13)
    
    Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
    miss-islington added a commit that referenced this issue Oct 5, 2022
    … is blocked (GH-32073)
    
    (cherry picked from commit 7015e13)
    
    Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
    @gvanrossum
    Copy link
    Member

    Looks like the 3.10 backport has broken tests. @kumaraditya303 can you look into this? If it's too much effort we can just not backport.

    @kumaraditya303
    Copy link
    Contributor

    Skipping 3.10 has it has diverged too much from main and not worth investing more time into this, 3.11 and main is fixed. Thanks @gvanrossum for reviews!

    @kumaraditya303 kumaraditya303 removed the 3.10 only security fixes label Oct 5, 2022
    gvanrossum pushed a commit that referenced this issue Oct 6, 2022
    Check for None when iterating over `self._pipes.values()`.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 6, 2022
    …nGH-97951)
    
    Check for None when iterating over `self._pipes.values()`.
    (cherry picked from commit e2e6b95)
    
    Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
    carljm added a commit to carljm/cpython that referenced this issue Oct 6, 2022
    * main: (66 commits)
      pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
      docs(typing): add "see PEP 675" to LiteralString (python#97926)
      pythongh-97850: Remove all known instances of module_repr() (python#97876)
      I changed my surname early this year (python#96671)
      pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
      pythongh-91539: improve performance of get_proxies_environment  (python#91566)
      build(deps): bump actions/stale from 5 to 6 (python#97701)
      pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
      pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
      pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
      pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
      pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
      pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
      pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
      pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
      pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
      pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
      pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
      pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
      Add re.VERBOSE flag documentation example (python#97678)
      ...
    carljm added a commit to carljm/cpython that referenced this issue Oct 6, 2022
    * main:
      pythonGH-88050: fix race in closing subprocess pipe in asyncio  (python#97951)
      pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962)
      pythongh-95986: Fix the example using match keyword (python#95989)
      pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944)
      pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929)
      pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610)
      pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703)
      pythonGH-88968: Add notes about socket ownership transfers (python#97936)
      pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
    lysnikolaou pushed a commit to lysnikolaou/cpython that referenced this issue Oct 6, 2022
    …n#97951)
    
    Check for None when iterating over `self._pipes.values()`.
    ambv pushed a commit that referenced this issue Oct 6, 2022
    …7951) (#97978)
    
    Check for None when iterating over `self._pipes.values()`.
    (cherry picked from commit e2e6b95)
    
    Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
    carljm added a commit to carljm/cpython that referenced this issue Oct 8, 2022
    * main: (53 commits)
      pythongh-94808: Coverage: Test that maximum indentation level is handled (python#95926)
      pythonGH-88050: fix race in closing subprocess pipe in asyncio  (python#97951)
      pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962)
      pythongh-95986: Fix the example using match keyword (python#95989)
      pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944)
      pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929)
      pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610)
      pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703)
      pythonGH-88968: Add notes about socket ownership transfers (python#97936)
      pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
      pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
      docs(typing): add "see PEP 675" to LiteralString (python#97926)
      pythongh-97850: Remove all known instances of module_repr() (python#97876)
      I changed my surname early this year (python#96671)
      pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
      pythongh-91539: improve performance of get_proxies_environment  (python#91566)
      build(deps): bump actions/stale from 5 to 6 (python#97701)
      pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
      pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
      pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
      ...
    mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
    mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
    …n#97951)
    
    Check for None when iterating over `self._pipes.values()`.
    @theo-brown
    Copy link

    Is this meant to be fixed in 3.11+?

    @gvanrossum
    Copy link
    Member

    Yes

    @theo-brown
    Copy link

    Which patch? Not in 3.11.0 as far as I can tell.

    @gvanrossum
    Copy link
    Member

    #32073 and a followup are present in 3.11.1. They may not be present in 3.11.0.

    @gertvdijk
    Copy link

    gertvdijk commented Apr 2, 2023

    Hmm. I came here using Google while I'm on 3.11.2 already. This happens sometimes for me on a subprocess wrapped inside a task and stdin/stderr connected. It also only triggers when I am terminating a process myself, instead killing it or letting the loop killing it.

    Found this when hitting random test failures on 3.11.2 and created the minimal reproducer below.

    import asyncio
    import logging
    
    
    async def myfunc_inner() -> None:
        subprocess_task = asyncio.create_task(
            asyncio.create_subprocess_exec(
                "sleep",
                "10",
                # seems to be relevant; unable to trigger without stdout/stderr connected
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
        )
        proc = await subprocess_task
    
        try:
            await asyncio.wait_for(proc.communicate(), timeout=0.02)
        except (asyncio.TimeoutError, asyncio.CancelledError):
            logging.info("Timeout on subprocess command, terminating.")
            # unable to trigger with proc.kill() or without terminating it.
            proc.terminate()
            return
        else:
            logging.info(f"process exited with {proc.returncode}]")
    
    
    # unable to trigger without nested task.
    async def myfunc_outer() -> None:
        await asyncio.wait_for(myfunc_inner(), timeout=1.0)
    
    
    if __name__ == "__main__":
        logging.basicConfig(level=logging.DEBUG)
        loop = asyncio.get_event_loop_policy().new_event_loop()
        loop.run_until_complete(myfunc_outer())
        # This fixes it, but hides the underlying problem I think.
        # loop.run_until_complete(asyncio.sleep(0.01))
        loop.close()

    Running this in a shell loop 200 times, this gives ~ 1-10 failures per loop, looks random.

    $ for i in `seq 1 200`; do python proctimeout.py; done
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7f5bff42a2a0>
    Traceback (most recent call last):
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_subprocess.py", line 126, in __del__
        self.close()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_subprocess.py", line 104, in close
        proto.pipe.close()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/unix_events.py", line 558, in close
        self._close(None)
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/unix_events.py", line 582, in _close
        self._loop.call_soon(self._call_connection_lost, exc)
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
        self._check_closed()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
        raise RuntimeError('Event loop is closed')
    RuntimeError: Event loop is closed
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 419211 is closed
    Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7f7dffc162a0>
    Traceback (most recent call last):
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_subprocess.py", line 126, in __del__
        self.close()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_subprocess.py", line 104, in close
        proto.pipe.close()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/unix_events.py", line 558, in close
        self._close(None)
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/unix_events.py", line 582, in _close
        self._loop.call_soon(self._call_connection_lost, exc)
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
        self._check_closed()
      File "/home/gert/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
        raise RuntimeError('Event loop is closed')
    RuntimeError: Event loop is closed
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    DEBUG:asyncio:Using selector: EpollSelector
    Timeout on subprocess command, terminating.
    [...]

    Either removing proc.terminate() or replacing it with proc.kill() makes the traceback disappear. 🤷🏼

    Also notice the (unrelated) error:

    WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 419211 is closed
    

    Further relevant system info: Python 3.11.2 (main, Mar 27 2023, 01:01:40) [GCC 12.2.1 20230201], built using Pyenv on Linux 6.2.9 (Arch, x86_64).

    Am I doing something wrong here or hitting a corner case that's not tackled by this bugfix? Thanks! 🙏🏼

    @kumaraditya303
    Copy link
    Contributor

    kumaraditya303 commented Apr 3, 2023

    You are missing await proc.wait() after terminating the process, adding that fixes it. Anyways this is not related to this fix so create a new issue if you still face the issue.

    @gertvdijk
    Copy link

    gertvdijk commented Apr 3, 2023

    You are missing await proc.wait() after terminating the process, adding that fixes it.

    Are you sure I should? It's also not included in Here’s an example of how asyncio can run a shell command and obtain its result, here: https://docs.python.org/3/library/asyncio-subprocess.html#

    Note that I'm already awaiting the process result with communicate(). For wait() this comment (here) seems to indicate that as well I shouldn't:

    Use the communicate() method when using pipes to avoid this condition.

    Anyways this is not related to this fix so create a new issue if you still face the issue.

    Sorry for the noise then, but to me it sounds exactly the same still.

    @kumaraditya303
    Copy link
    Contributor

    You are wrapping proc.communicate() in wait_for so it gets cancelled before it had to chance to call self.wait(). See the source code. The docs are for directly awaiting the proc.communicate() which doesn't work in case of wait_for.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes 3.12 bugs and security fixes topic-asyncio type-bug An unexpected behavior, bug, or error
    Projects
    Status: Done
    Development

    No branches or pull requests

    5 participants