Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid data for collections from a frame indexed in the same line in python #642

Open
jmcarcell opened this issue Jul 15, 2024 · 3 comments

Comments

@jmcarcell
Copy link
Member

jmcarcell commented Jul 15, 2024

The issue is the following, when indexing a frame iterator and using it in the same line:

import podio
podio_reader = podio.root_io.Reader("example_frame.root")
frames = podio_reader.get("events")
print(len(frames[0].get('mcparticles')))  # prints a random number like 811654108391827

the number printed is consistent with reading some random memory (64 bits). Or if the number is too big, the error will be:

OverflowError: cannot fit 'int' into an index-sized integer

The workaround is easy, just save the intermediate frame (explaining why this has not been before since typically one will loop over the frames):

ev = frames[0]
print(len(ev.get('mcparticles')))  # prints 10, all is good here

From this it just seems that frames[0] is not constructed properly in python. Any other operation that I have tried will also fail.

It seems not to be an issue with podio (leaving only cppyy), at least not with the python code, because if we get down to the functions that are actually being called and do:

print(podio.Frame(cppyy.gbl.std.move(podio_reader._reader.readEntry("events", 0))).get('mcparticles').size())

this will also print a random number. I'll try to prepare an example to report this to cppyy if that is the case (looks like).

@tmadlener
Copy link
Collaborator

Does this work on the c++ side? In principle it should as everything should live until the end of the statement. But in this case we create a temporary frame, from which we get a const & to a collection on which we then call size, so I am not entirely sure.

@jmcarcell
Copy link
Member Author

Yes, on C++ there aren't any surprises and you can do everything in one line. But here frames[0] not being alive before the line ends seems unexpected.

@wlav
Copy link

wlav commented Jul 15, 2024

C++ and Python have different rules for temporaries: C++ guarantees that the lifetime of temporaries is at least until the end of the statement; in Python, temporaries go away whenever the reference count returns to zero, which may be in the middle of a long statement. If a C++ method returns something by reference to data that lives in a temporary object (that came to be from a different function call, say, in the long statement), then that data may go away before the end of life of the reference.

cppyy tries to set "lifelines" from such references onto the object that holds the data by checking whether the reference points to somewhere within the object on which the method was called. This automatic system would fail if the reference is to something that is not part of the temporary, but is still controlled by that temporary. E.g. if the reference points into data that is held by pointer by the object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants