Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange reference cycle (?) introduced with shim frames #100964

Open
A5rocks opened this issue Jan 12, 2023 · 16 comments
Open

Strange reference cycle (?) introduced with shim frames #100964

A5rocks opened this issue Jan 12, 2023 · 16 comments
Assignees
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@A5rocks
Copy link

A5rocks commented Jan 12, 2023

Bug report

python-trio/trio#2514 (comment)

import weakref
import trio
import gc

async def f():
    async with trio.open_nursery() as nursery:
        nursery.cancel_scope.cancel()

    h = weakref.ref(nursery)
    del nursery
    await trio.lowlevel.checkpoint()
    print(h(), gc.get_referrers(h()) if h() is not None else [])

trio.run(f)

(I haven't figured out how to reproduce without the import trio)

I bisected this to 1e197e6

Previously:

None []

Now:

<trio.Nursery object at 0x7f3f5585d4f0> [NurseryManager(strict_exception_groups=False), <cell at 0x7f3f5585d540: Nursery object at 0x7f3f5585d4f0>, <frame at 0x7f3f55e220c0, file '/workspaces/trio/trio/_core/_run.py', line 1005, code _nested_child_finished>]

I'm not certain this should be a bug, but raising because I'm confused as to the cause.

Linked PRs

@A5rocks A5rocks added the type-bug An unexpected behavior, bug, or error label Jan 12, 2023
@A5rocks A5rocks changed the title Strange reference cycle introduced with shim frames Strange reference cycle (?) introduced with shim frames Jan 12, 2023
@sobolevn
Copy link
Member

cc @markshannon

@kumaraditya303 kumaraditya303 added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.12 bugs and security fixes labels Jan 12, 2023
@brandtbucher
Copy link
Member

I believe that commit slightly changed how/when generators clear their frames upon raising or returning. I forget exactly how, though... I think it might be that the generator used to link, unlink, and clear its own frame in gen_send_ex2 but now all that work happens in the VM? Or something.

Seems like it could definitely be related.

@iritkatriel
Copy link
Member

If you add gc.collect() before the print you get "None []" in the output, so there's a cycle but no leak.

@A5rocks
Copy link
Author

A5rocks commented Apr 10, 2023

Yup, this issue is just about new cycles being created where there were none before.

The context for the failing testcase is a regression test for python-trio/trio#1770 (which kinda explains why we care about this in the first place).

@iritkatriel
Copy link
Member

You can add gc.collect() to the test.

@A5rocks
Copy link
Author

A5rocks commented Apr 10, 2023

Sorry, I think you might be misunderstanding me. In order to minimize stalls, in trio we've put in some extra work to remove reference cycles (to make GC run less often). This new shim frames mechanism adds some reference cycles which probably shouldn't be added -- the news message says "The extra frame should be invisible to all Python and most C extensions." Adding gc.collect() "fixes" this but that's not really the ideal solution for us; users probably shouldn't have to think about GC.

@iritkatriel
Copy link
Member

iritkatriel commented Apr 11, 2023

Ok, so I'll label it performance instead of bug.

@iritkatriel iritkatriel added performance Performance or resource usage and removed type-bug An unexpected behavior, bug, or error labels Apr 11, 2023
@markshannon
Copy link
Member

Note to self:
It is possible that we need to clear gi_exc_state.exc_value in clear_gen_frame()

@iritkatriel
Copy link
Member

Is this our cycle?

(Pdb) p ref
<built-in method send of coroutine object at 0x104bf6c30>
(Pdb) p gc.get_referrers(ref)[1]
<coroutine object f at 0x104bf6c30>
(Pdb) p gc.get_referrers(gc.get_referrers(ref)[1])[1]
<built-in method send of coroutine object at 0x104bf6c30>

@markshannon
Copy link
Member

@A5rocks #100964 may fix this issue. Can you test it for me?
I get an error trying to pip install trio with the latest CPython build.

@A5rocks
Copy link
Author

A5rocks commented Aug 3, 2023

Unfortunately that didn't seem to work:

a5@LAPTOP-MN15OEE9:/tmp/trio$ ../cpython/python test.py
<trio.Nursery object at 0x7fcac5d04a10> [NurseryManager(strict_exception_groups=False), <cell at 0x7fcac5d049d0: Nursery object at 0x7fcac5d04a10>, <frame at 0x7fcac5c65be0, file '/tmp/trio/trio/_core/_run.py', line 1050, code _nested_child_finished>]
a5@LAPTOP-MN15OEE9:/tmp/trio$ ../cpython/python --version
Python 3.13.0a0
a5@LAPTOP-MN15OEE9:/tmp/trio$ cd ../cpython ; git log | grep "GH-100964"
    GH-100964: Break cycles involving exception state when returning from generator (GH-107563)

(using WSL since atm import trio fails on windows on 3.12 at least on a cffi segfault :( )

(as for the previous comment about whether that's the cycle here, I don't know.)

@A5rocks
Copy link
Author

A5rocks commented Oct 24, 2023

This was fixed in trio but it turns out that fix was because python-trio/trio@a3d9d25 (specifically due to this change: python-trio/trio@a3d9d25#diff-568b66d9830d9516b524413eb041a1d4c8ccf0b76af4e10d96098888ce323235R937)

... I'll dedicate some more time to hopefully pinning this down because even though this isn't very important it still bothers me!

@A5rocks
Copy link
Author

A5rocks commented Oct 25, 2023

hi.

i've found a minimal reproduction that only relies on the standard library. this feels so cursed.

import weakref
import asyncio

async def preserve(a):
    try:
        raise Exception("aaa")
    except BaseException as exc:
        e = exc

    try:
        return e
    finally:
        del e


class Dummy:
    def __repr__(self):
        return "ahahahahahah its broken!!"

class PersonManager:
    async def __aenter__(self):
        self._person = Dummy()
        return self._person

    async def __aexit__(self, etype, exc, tb):
        new_exc = await preserve(self._person)
        return True

async def f():
    async with PersonManager() as me:
        pass

    h = weakref.ref(me)
    del me
    await asyncio.sleep(0.01)
    print(h())

asyncio.run(f())

sorry for taking so long.

@brandtbucher
Copy link
Member

Thanks, this is super helpful. Smaller reproducer that prints True on 3.11 and and False on 3.12:

import weakref
import asyncio

async def preserve():
    try:
        raise Exception()
    except Exception as exc:
        return exc

class PeopleManager:
    pass

def __aexit__(self):
    new_exc = asyncio.run(preserve())

def f():
    manager = PeopleManager()
    __aexit__(manager)
    h = weakref.ref(manager)
    del manager
    print(h() is None)

f()

@brandtbucher
Copy link
Member

brandtbucher commented Oct 27, 2023

New minimal reproducer, using a generator instead of asyncio:

import weakref

def g():
    try:
        raise Exception()
    except Exception as e:
        yield e

class C:
    pass

def f(o):
    r = next(g())

o = C()
f(o)
h = weakref.ref(o)
del o
print(h() is None)

The cycle is r is r.__traceback__.tb_frame.f_back.f_locals["r"]. This cycle keeps the locals, including o, alive.

On 3.11 and earlier, f_back is None. On 3.12, there is a full chain of frames. That's not great, since we're potentially keeping huge graphs of objects alive every time one of these exceptions is stashed away...

@brandtbucher
Copy link
Member

brandtbucher commented Oct 27, 2023

New minimal test:

def g():
    yield

generator = g()
frame = generator.gi_frame
# None on 3.11, None on 3.12:
print(f"Started:   {frame.f_back = }")
next(generator)
# None on 3.11, None on 3.12:
print(f"Suspended: {frame.f_back = }")
next(generator, None)
# None on 3.11, non-None on 3.12:
print(f"Exhausted: {frame.f_back = }")

So, something about transferring ownership of the _PyInterpreterFrame from the exhausted generator to the living frame object re-links it in 3.12.

@Eclips4 Eclips4 added the 3.13 bugs and security fixes label Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

7 participants