
Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython #1770

Closed
belm0 opened this issue Oct 26, 2020 · 12 comments · Fixed by #1805

Comments

@belm0
Member

belm0 commented Oct 26, 2020

symptom

at a regular period (e.g. every 10 s or 20 s), there are significant stalls (e.g. 30 ms) where the main thread appears to be preempted

In terms of trio.abc.Instrument, the lost time appears between before_task_step and after_task_step, though tasks are doing nothing with threads or time.sleep(). In a single-threaded app with many heterogeneous tasks, the stall events appear to be distributed randomly among the tasks.
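
As a rough illustration (not the exact instrumentation used), the gap can be measured with a custom trio.abc.Instrument along these lines; the 5 ms threshold is an assumption matching the repro below:

import trio

class StepTimer(trio.abc.Instrument):
    # report task steps that take suspiciously long (illustrative sketch only)
    _t0 = None

    def before_task_step(self, task):
        self._t0 = trio.current_time()

    def after_task_step(self, task):
        if self._t0 is None:
            return
        dt = trio.current_time() - self._t0
        if dt > 0.005:  # 5 ms threshold
            print(f'{task.name}: step took {dt * 1000:.1f} ms')

# usage: trio.run(_run, instruments=[StepTimer()])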

I've come up with a minimal program to reproduce the effect. Note that it is CPU-intensive (a loop of very small trio.sleep()), as CPU use > 70% seems necessary to induce the problem.

import numba  # unused, but its mere presence changes the stall size (see below)
import trio

async def _run():
    T = .000015  # target sleep of 15 µs, to keep CPU use high
    t_start = trio.current_time()

    while True:
        t0 = trio.current_time()
        await trio.sleep(T)
        delta = trio.current_time() - t0
        if delta > .005:  # report any sleep that overran by more than 5 ms
            elapsed = trio.current_time() - t_start
            print(f'{elapsed // 60:02.0f}:{elapsed % 60:02.0f}  {delta * 1000:5.1f}')

trio.run(_run)

Note that the repro imports numba, and that is the spooky part. With the program as written, observed output (timestamp, stall in ms):

$  python repro.py
00:00   29.1
00:11   35.5
00:21   33.4
00:32   33.8

However, with the numba import removed, the stalls are significantly shorter (~6 ms) and occur at a shorter period (~3 s):

$  ./.venv/bin/python repro.py
00:00    5.2
00:03    6.5
00:05    6.7
00:08    6.4
00:10    6.1

And here is importing numpy instead of numba (~10 ms stalls, ~4 s period):

$  ./.venv/bin/python repro.py
00:00    8.9
00:04   11.8
00:08   10.5
00:11   10.1
00:15   11.2

conditions

seen in Trio 0.17.0

it's observed by multiple people running the repro, across multiple architectures (at least Linux/amd64 and OS X)

So far it's specific to Trio. I've tried equivalent loops using asyncio and time.sleep(), which do not exhibit the problem.

original bug filed to numba: numba/numba#6347

theories

  • T1. kernel scheduling artifact? - is this just an expected effect as a process approaches 100% CPU, since the OS/kernel needs cycles to do internal work? But why doesn't the problem show up with the asyncio or time.sleep() implementations? Why does the stall period depend on unused package imports?
  • T2. weird GIL interaction among main thread and one of Trio's background threads? - The GIL is essentially a scheduler, I've read.
  • ...?

next steps

  • try earlier versions of Trio - result: no change (tested back to 0.11.0)
  • ...
@belm0
Member Author

belm0 commented Oct 26, 2020

observation:

the "lost time" seems to be relatively constant in each of the import cases when looking a larger time scale:

# numba - about 33 ms lost per 10s
00:00   29.1
00:11   35.5
00:21   33.4
00:32   33.8
# no import - about 26 ms lost per 10s
00:00    5.2
00:03    6.5
00:05    6.7
00:08    6.4
00:10    6.1
# numpy - about 29 ms lost per 10s
00:00    8.9
00:04   11.8
00:08   10.5
00:11   10.1
00:15   11.2

@smurfix
Contributor

smurfix commented Oct 26, 2020

You might note that Trio uses epoll for scheduling, which has a time granularity of 1 ms, and the kernel doesn't guarantee how quickly it'll wake you up afterwards; among other factors, that depends on which tick resolution the kernel was compiled with.

Thus, depending on how much work you do (or the kernel thinks you're doing), the math of rounding those timers will sooner or later cause Trio to trigger a 1 ms scheduling step. Whatever the kernel does with that is up to the kernel; its scheduling decisions depend on how often your process has stalled in the past, which means that if it had more work to do at startup (like loading numba) it will be treated differently.

If you need high-resolution timers, use a timerfd. There probably is a Python binding to it somewhere.

Data point: on my system the max delta is 5ms no matter what I tweak.

@njsmith
Member

njsmith commented Oct 26, 2020

He says the time is disappearing while tasks are stepping, though, not while Trio is blocking for I/O.

One obvious thing to rule out would be GC pauses. You'll want to check if calling gc.disable() affects anything.
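
For example, a quick diagnostic (not a fix) would be to add something like this near the top of the repro:

import gc

gc.disable()  # if the stalls disappear with the cycle collector off, GC pauses are the culprit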

@smurfix
Contributor

smurfix commented Oct 26, 2020

Huh. I totally missed that.

In my test gc.disable reduces the lag to a consistent max of 1.7ms, so yes that should be the culprit.

@njsmith
Member

njsmith commented Oct 26, 2020

Ah, I even have a theory about why import numba could affect GC times, and why you'd see the same "lost time" on average with and without import numba:

while “non-full” collections (i.e., collections of the young and middle generations) will always examine roughly the same number of objects (determined by the aforementioned thresholds) the cost of a full collection is proportional to the total number of long-lived objects, which is virtually unbounded [...] each full garbage collection is more and more costly as the number of objects grows, but we do fewer and fewer of them

If the full-collection GC is in fact the problem, then I guess there are a number of ways you could try tweaking it.

gc.freeze() will take all the objects currently live in your program, and disable the GC just for them. You could call this after you finish your imports and general setup code, assuming those are what's creating the long pauses.

gc.collect() lets you trigger a manual full collection; maybe there are moments when you know it's OK to introduce a bit of extra latency, and you can use this to move the GC pauses there.

You can use gc.disable to disable the cycle collector entirely, or something like gc.set_threshold(700, 10, 9999999) to just disable full collections. This would require being careful not to create cyclic garbage to avoid memory leaks, but maybe you can do that.
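
A minimal sketch of how those knobs might be applied (the placement and thresholds are illustrative, not a recommendation):

import gc
import trio

async def main():
    ...  # application tasks go here

# Option 1: after imports and setup, move everything currently alive into a
# permanent generation that the collector never scans (Python 3.7+).
gc.collect()
gc.freeze()

# Option 2: keep young/middle-generation collections but effectively disable
# full collections; only safe if little or no cyclic garbage is created.
# gc.set_threshold(700, 10, 9999999)

# Option 3: disable the cycle collector entirely.
# gc.disable()

trio.run(main)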

@belm0
Member Author

belm0 commented Oct 26, 2020

thank you-- no idea how I made it this far being unaware of the Python GC

so it makes sense that this is easier to observe at high CPU use-- otherwise the full collection might run while the process is idle

still a mystery: why I couldn't reproduce it on asyncio or time.sleep()

@njsmith
Member

njsmith commented Oct 26, 2020

@belm0 The GC only considers doing a full collection after 700 * 10 * 10 = 70,000 objects have been allocated without being deallocated.

Presumably your time.sleep loop wasn't doing any allocations. Possibly Trio creates more reference cycles internally than asyncio does, so it triggers the GC more often?
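
For reference, a small sketch (not from this thread) for checking the thresholds and logging when a full collection actually runs:

import gc

# default thresholds are (700, 10, 10) on CPython, so a gen-2 ("full")
# collection is only considered after roughly 700 * 10 * 10 net allocations
print(gc.get_threshold())

# gc.callbacks reports each collection's phase and which generation it hit
def on_gc(phase, info):
    if phase == "start" and info["generation"] == 2:
        print("full collection starting")

gc.callbacks.append(on_gc)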

@njsmith
Member

njsmith commented Oct 26, 2020

Hmm, yeah, running your script with gc.set_debug(gc.DEBUG_COLLECTABLE) shows a ton of cyclic garbage. It probably wouldn't be too hard to get rid of that, if someone could track down where the cycles are being created. I guess gc.DEBUG_SAVEALL might help?

njsmith changed the title from "trio app stalls (spooky action)" to "Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython" on Oct 26, 2020
@belm0
Member Author

belm0 commented Oct 28, 2020

I've measured a fairly consistent 35 garbage objects per trio.sleep() with DEBUG_SAVEALL.

import gc
import trio

gc.set_debug(gc.DEBUG_SAVEALL)  # keep everything the collector finds in gc.garbage
T = 5       # total time spent sleeping, in seconds
N = 3000    # number of sleep() calls

async def _run():
    garbage_start = len(gc.garbage)
    for _ in range(N):
        await trio.sleep(T / N)
    garbage_delta = len(gc.garbage) - garbage_start
    print(f'garbage rate: {garbage_delta / T:.1f}')      # objects per second
    print(f'garbage per sleep: {garbage_delta / N:.1f}')

trio.run(_run)
# N = 3000
garbage rate: 21422.0
garbage per sleep: 35.7

# N = 30000
garbage rate: 214708.6
garbage per sleep: 35.8

@belm0
Member Author

belm0 commented Oct 28, 2020

# inside the measurement loop above (now `for i in range(N)`), capture the
# garbage produced by a single sleep() call:
if i == 10:
    gc.collect()
    gc.set_debug(gc.DEBUG_SAVEALL)
    assert not gc.garbage
await trio.sleep(T/N)
if i == 10:
    gc.collect()
    print('\n'.join(repr(x) for x in gc.garbage))
    gc.set_debug(0)

here are the 36 garbage items generated by one sleep() call:

Error(Cancelled())
Cancelled()
<traceback object at 0x10dda84b0>
<traceback object at 0x10dda8460>
<frame at 0x10dda5b90, file '/.../site-packages/trio/_timeouts.py', line 56, code sleep_until>
<function sleep_forever.<locals>.<lambda> at 0x10dda3680>
<traceback object at 0x10dda8410>
<frame at 0x10dda4c20, file '/.../site-packages/trio/_timeouts.py', line 40, code sleep_forever>
<traceback object at 0x10dda8370>
<frame at 0x10dda4dd0, file '/.../site-packages/trio/_core/_traps.py', line 166, code wait_task_rescheduled>
<traceback object at 0x10dda8320>
<frame at 0x10ddb0050, file '/.../site-packages/outcome/_sync.py', line 111, code unwrap>
<frame at 0x10dda26d0, file '/.../site-packages/trio/_core/_run.py', line 1167, code raise_cancel>
<frame at 0x10dda7250, file '/.../site-packages/outcome/_async.py', line 23, code capture>
<frame at 0x10dda6960, file '/.../site-packages/trio/_core/_run.py', line 1158, code _attempt_abort>
<function Task._attempt_delivery_of_any_pending_cancel.<locals>.raise_cancel at 0x10dda37a0>
<frame at 0x10dd9e910, file '/.../site-packages/trio/_core/_run.py', line 1169, code _attempt_delivery_of_any_pending_cancel>
<frame at 0x10dda6790, file '/.../site-packages/trio/_core/_run.py', line 358, code recalculate>
<frame at 0x10ddaa050, file '/.../site-packages/trio/_core/_run.py', line 681, code cancel>
CancelStatus(_scope=<trio.CancelScope at 0x10dda0590, exited, cancelled>, effectively_cancelled=True)
[]
<frame at 0x10dda65c0, file '/.../site-packages/trio/_core/_ki.py', line 159, code wrapper>
<trio.CancelScope at 0x10dda0590, exited, cancelled>
set()
set()
<frame at 0x10dda5d70, file '/.../site-packages/trio/_core/_run.py', line 197, code expire>
{'args': (<trio.CancelScope at 0x10dda0590, exited, cancelled>,), 'kwargs': {}, 'enabled': True, 'fn': <function CancelScope.cancel at 0x10db35710>, '@TRIO_KI_PROTECTION_ENABLED': True}
(<trio.CancelScope at 0x10dda0590, exited, cancelled>,)
<bound method CancelScope._exc_filter of <trio.CancelScope at 0x10dd3f750, exited, cancelled>>
<cell at 0x10cef3210: function object at 0x10dd7d950>
<cell at 0x10cef3150: method object at 0x10db29500>
<cell at 0x10cef3250: function object at 0x10dda3710>
(<cell at 0x10cef3210: function object at 0x10dd7d950>, <cell at 0x10cef3150: method object at 0x10db29500>)
<function _filter_impl.<locals>.filter_tree at 0x10dd7d950>
(<cell at 0x10cef3250: function object at 0x10dda3710>,)
<function _filter_impl.<locals>.push_tb_down at 0x10dda3710>

@njsmith
Member

njsmith commented Oct 28, 2020 via email

njsmith added a commit to njsmith/outcome that referenced this issue Nov 16, 2020
This avoids invoking the cycle collector as often; see

  python-trio/trio#1770

for more details.
njsmith added a commit to njsmith/trio that referenced this issue Nov 16, 2020
@belm0
Member Author

belm0 commented Nov 19, 2020

copying new garbage cases from the PR discussion:

these are with existing Trio and outcome fixes applied

cancelled background task: 40 items

async with trio.open_nursery() as nursery:
    nursery.start_soon(trio.sleep_forever)
    await trio.sleep(T/N)   # a little less garbage if this is removed
    nursery.cancel_scope.cancel()

cancelled empty nursery: 25 items

async with trio.open_nursery() as nursery:
    nursery.cancel_scope.cancel()
