
Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython #1770

Closed
belm0 opened this issue Oct 26, 2020 · 12 comments · Fixed by #1805

Comments

@belm0
Member

belm0 commented Oct 26, 2020

symptom

at a regular period (e.g. every 10 s or 20 s), there are significant stalls (e.g. 30 ms) where the main thread appears to be preempted

In terms of trio.abc.Instrument, the lost time appears between before_task_step and after_task_step, though tasks are doing nothing with threads or time.sleep(). In a single-threaded app with many heterogeneous tasks, the stall events appear to be distributed randomly among the tasks.
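
As a rough illustration (not the exact instrumentation used), the gap can be measured with a custom trio.abc.Instrument along these lines; the 5 ms threshold is an assumption matching the repro below:

import trio

class StepTimer(trio.abc.Instrument):
    # report task steps that take suspiciously long (illustrative sketch only)
    _t0 = None

    def before_task_step(self, task):
        self._t0 = trio.current_time()

    def after_task_step(self, task):
        if self._t0 is None:
            return
        dt = trio.current_time() - self._t0
        if dt > 0.005:  # 5 ms threshold
            print(f'{task.name}: step took {dt * 1000:.1f} ms')

# usage: trio.run(_run, instruments=[StepTimer()])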

I've come up with a minimal program to reproduce the effect. Note that it is CPU-intensive (a loop of very small trio.sleep()), as CPU use > 70% seems necessary to induce the problem.

import numba  # unused, but its mere presence changes the stall size (see below)
import trio

async def _run():
    T = .000015  # target sleep of 15 µs, to keep CPU use high
    t_start = trio.current_time()

    while True:
        t0 = trio.current_time()
        await trio.sleep(T)
        delta = trio.current_time() - t0
        if delta > .005:  # report any sleep that overran by more than 5 ms
            elapsed = trio.current_time() - t_start
            print(f'{elapsed // 60:02.0f}:{elapsed % 60:02.0f}  {delta * 1000:5.1f}')

trio.run(_run)

Note that the repro imports numba, and that is the spooky part. With the program as written, observed output (timestamp, stall in ms):

$  python repro.py
00:00   29.1
00:11   35.5
00:21   33.4
00:32   33.8

However, with the numba import removed, the stalls are significantly shorter (~6 ms) and occur at a shorter period (~3 s):

$  ./.venv/bin/python repro.py
00:00    5.2
00:03    6.5
00:05    6.7
00:08    6.4
00:10    6.1

And here is importing numpy instead of numba (~10 ms stalls, ~4 s period):

$  ./.venv/bin/python repro.py
00:00    8.9
00:04   11.8
00:08   10.5
00:11   10.1
00:15   11.2

conditions

seen in Trio 0.17.0

it's observed by multiple people running the repro, across multiple architectures (at least Linux/amd64 and OS X)

So far it's specific to Trio. I've tried equivalent loops using asyncio and time.sleep(), which do not exhibit the problem.

original bug filed to numba: numba/numba#6347

theories

  • T1. kernel scheduling artifact? - is this just an expected effect as a process approaches 100% CPU, since the OS/kernel needs cycles to do internal work? But why doesn't the problem show up with the asyncio or time.sleep() implementations? Why does the stall period depend on unused package imports?
  • T2. weird GIL interaction among main thread and one of Trio's background threads? - The GIL is essentially a scheduler, I've read.
  • ...?

next steps

  • try earlier versions of Trio - result: no change (tested back to 0.11.0)
  • ...
@belm0
Member Author

belm0 commented Oct 26, 2020

observation:

the "lost time" seems to be relatively constant in each of the import cases when looking a larger time scale:

# numba - about 33 ms lost per 10s
00:00   29.1
00:11   35.5
00:21   33.4
00:32   33.8
# no import - about 26 ms lost per 10s
00:00    5.2
00:03    6.5
00:05    6.7
00:08    6.4
00:10    6.1
# numpy - about 29 ms lost per 10s
00:00    8.9
00:04   11.8
00:08   10.5
00:11   10.1
00:15   11.2

@smurfix
Contributor

smurfix commented Oct 26, 2020

You might note that Trio uses epoll for scheduling, which has a time granularity of 1 ms, and the kernel doesn't guarantee how quickly it'll wake you up afterwards; among other factors, that depends on which tick resolution the kernel was compiled with.

Thus, depending on how much work you do (or the kernel thinks you're doing), the math of rounding those timers will sooner or later cause Trio to trigger a 1 ms scheduling step. Whatever the kernel does with that is up to the kernel; its scheduling decisions depend on how often your process has stalled in the past, which means that if it had more work to do at startup (like loading numba) it will be treated differently.

If you need high-resolution timers, use a timerfd. There probably is a Python binding to it somewhere.

Data point: on my system the max delta is 5ms no matter what I tweak.

@njsmith
Member

njsmith commented Oct 26, 2020

He says the time is disappearing while tasks are stepping, though, not while Trio is blocking for I/O.

One obvious thing to rule out would be GC pauses. You'll want to check if calling gc.disable() affects anything.
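
For example, a quick diagnostic (not a fix) would be to add something like this near the top of the repro:

import gc

gc.disable()  # if the stalls disappear with the cycle collector off, GC pauses are the culprit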

@smurfix
Contributor

smurfix commented Oct 26, 2020

Huh. I totally missed that.

In my test gc.disable reduces the lag to a consistent max of 1.7ms, so yes that should be the culprit.

@njsmith
Member

njsmith commented Oct 26, 2020

Ah, I even have a theory about why import numba could affect GC times, and why you'd see the same "lost time" on average with and without import numba:

while “non-full” collections (i.e., collections of the young and middle generations) will always examine roughly the same number of objects (determined by the aforementioned thresholds) the cost of a full collection is proportional to the total number of long-lived objects, which is virtually unbounded [...] each full garbage collection is more and more costly as the number of objects grows, but we do fewer and fewer of them

If the full-collection GC is in fact the problem, then I guess there are a number of ways you could try tweaking it.

gc.freeze() will take all the objects currently live in your program, and disable the GC just for them. You could call this after you finish your imports and general setup code, assuming those are what's creating the long pauses.

gc.collect() lets you trigger a manual full collection; maybe there are moments when you know it's OK to introduce a bit of extra latency, and you can use this to move the GC pauses there.

You can use gc.disable to disable the cycle collector entirely, or something like gc.set_threshold(700, 10, 9999999) to just disable full collections. This would require being careful not to create cyclic garbage to avoid memory leaks, but maybe you can do that.
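
A minimal sketch of how those knobs might be applied (the placement and thresholds are illustrative, not a recommendation):

import gc
import trio

async def main():
    ...  # application tasks go here

# Option 1: after imports and setup, move everything currently alive into a
# permanent generation that the collector never scans (Python 3.7+).
gc.collect()
gc.freeze()

# Option 2: keep young/middle-generation collections but effectively disable
# full collections; only safe if little or no cyclic garbage is created.
# gc.set_threshold(700, 10, 9999999)

# Option 3: disable the cycle collector entirely.
# gc.disable()

trio.run(main)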

@belm0
Member Author

belm0 commented Oct 26, 2020

thank you-- no idea how I made it this far being unaware of the Python GC

so it makes sense that this is easier to observe at high CPU use-- otherwise the full collection might run while the process is idle

still a mystery: why I couldn't reproduce it on asyncio or time.sleep()

@njsmith
Member

njsmith commented Oct 26, 2020

@belm0 The GC only considers doing a full collection after 700 * 10 * 10 = 70,000 objects have been allocated without being deallocated.

Presumably your time.sleep loop wasn't doing any allocations. Possibly Trio creates more reference cycles internally than asyncio does, so it triggers the GC more often?
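
For reference, a small sketch (not from this thread) for checking the thresholds and logging when a full collection actually runs:

import gc

# default thresholds are (700, 10, 10) on CPython, so a gen-2 ("full")
# collection is only considered after roughly 700 * 10 * 10 net allocations
print(gc.get_threshold())

# gc.callbacks reports each collection's phase and which generation it hit
def on_gc(phase, info):
    if phase == "start" and info["generation"] == 2:
        print("full collection starting")

gc.callbacks.append(on_gc)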

@njsmith
Member

njsmith commented Oct 26, 2020

Hmm, yeah, running your script with gc.set_debug(gc.DEBUG_COLLECTABLE) shows a ton of cyclic garbage. It probably wouldn't be too hard to get rid of that, if someone could track down where the cycles are being created. I guess gc.DEBUG_SAVEALL might help?

njsmith changed the title from "trio app stalls (spooky action)" to "Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython" on Oct 26, 2020
@belm0
Member Author

belm0 commented Oct 28, 2020

I've measured a fairly consistent 35 garbage objects per trio.sleep() with DEBUG_SAVEALL.

import gc
import trio

gc.set_debug(gc.DEBUG_SAVEALL)  # keep everything the collector finds in gc.garbage
T = 5       # total time spent sleeping, in seconds
N = 3000    # number of sleep() calls

async def _run():
    garbage_start = len(gc.garbage)
    for _ in range(N):
        await trio.sleep(T / N)
    garbage_delta = len(gc.garbage) - garbage_start
    print(f'garbage rate: {garbage_delta / T:.1f}')      # objects per second
    print(f'garbage per sleep: {garbage_delta / N:.1f}')

trio.run(_run)
# N = 3000
garbage rate: 21422.0
garbage per sleep: 35.7

# N = 30000
garbage rate: 214708.6
garbage per sleep: 35.8

@belm0
Member Author

belm0 commented Oct 28, 2020

# inside the measurement loop above (now `for i in range(N)`), capture the
# garbage produced by a single sleep() call:
if i == 10:
    gc.collect()
    gc.set_debug(gc.DEBUG_SAVEALL)
    assert not gc.garbage
await trio.sleep(T/N)
if i == 10:
    gc.collect()
    print('\n'.join(repr(x) for x in gc.garbage))
    gc.set_debug(0)

here are the 36 garbage items generated by one sleep() call:

Error(Cancelled())
Cancelled()
<traceback object at 0x10dda84b0>
<traceback object at 0x10dda8460>
<frame at 0x10dda5b90, file '/.../site-packages/trio/_timeouts.py', line 56, code sleep_until>
<function sleep_forever.<locals>.<lambda> at 0x10dda3680>
<traceback object at 0x10dda8410>
<frame at 0x10dda4c20, file '/.../site-packages/trio/_timeouts.py', line 40, code sleep_forever>
<traceback object at 0x10dda8370>
<frame at 0x10dda4dd0, file '/.../site-packages/trio/_core/_traps.py', line 166, code wait_task_rescheduled>
<traceback object at 0x10dda8320>
<frame at 0x10ddb0050, file '/.../site-packages/outcome/_sync.py', line 111, code unwrap>
<frame at 0x10dda26d0, file '/.../site-packages/trio/_core/_run.py', line 1167, code raise_cancel>
<frame at 0x10dda7250, file '/.../site-packages/outcome/_async.py', line 23, code capture>
<frame at 0x10dda6960, file '/.../site-packages/trio/_core/_run.py', line 1158, code _attempt_abort>
<function Task._attempt_delivery_of_any_pending_cancel.<locals>.raise_cancel at 0x10dda37a0>
<frame at 0x10dd9e910, file '/.../site-packages/trio/_core/_run.py', line 1169, code _attempt_delivery_of_any_pending_cancel>
<frame at 0x10dda6790, file '/.../site-packages/trio/_core/_run.py', line 358, code recalculate>
<frame at 0x10ddaa050, file '/.../site-packages/trio/_core/_run.py', line 681, code cancel>
CancelStatus(_scope=<trio.CancelScope at 0x10dda0590, exited, cancelled>, effectively_cancelled=True)
[]
<frame at 0x10dda65c0, file '/.../site-packages/trio/_core/_ki.py', line 159, code wrapper>
<trio.CancelScope at 0x10dda0590, exited, cancelled>
set()
set()
<frame at 0x10dda5d70, file '/.../site-packages/trio/_core/_run.py', line 197, code expire>
{'args': (<trio.CancelScope at 0x10dda0590, exited, cancelled>,), 'kwargs': {}, 'enabled': True, 'fn': <function CancelScope.cancel at 0x10db35710>, '@TRIO_KI_PROTECTION_ENABLED': True}
(<trio.CancelScope at 0x10dda0590, exited, cancelled>,)
<bound method CancelScope._exc_filter of <trio.CancelScope at 0x10dd3f750, exited, cancelled>>
<cell at 0x10cef3210: function object at 0x10dd7d950>
<cell at 0x10cef3150: method object at 0x10db29500>
<cell at 0x10cef3250: function object at 0x10dda3710>
(<cell at 0x10cef3210: function object at 0x10dd7d950>, <cell at 0x10cef3150: method object at 0x10db29500>)
<function _filter_impl.<locals>.filter_tree at 0x10dd7d950>
(<cell at 0x10cef3250: function object at 0x10dda3710>,)
<function _filter_impl.<locals>.push_tb_down at 0x10dda3710>

@njsmith
Member

njsmith commented Oct 28, 2020 via email

njsmith added a commit to njsmith/outcome that referenced this issue Nov 16, 2020
This avoids invoking the cycle collector as often; see

  python-trio/trio#1770

for more details.
njsmith added a commit to njsmith/trio that referenced this issue Nov 16, 2020
@belm0
Member Author

belm0 commented Nov 19, 2020

copying new garbage cases from the PR discussion:

these are with existing Trio and outcome fixes applied

cancelled background task: 40 items

async with trio.open_nursery() as nursery:
    nursery.start_soon(trio.sleep_forever)
    await trio.sleep(T/N)   # a little less garbage if this is removed
    nursery.cancel_scope.cancel()

cancelled empty nursery: 25 items

async with trio.open_nursery() as nursery:
    nursery.cancel_scope.cancel()
