dtypes being ignored with certain versions of pandas #16

Open · mcvicuna opened this issue Jun 1, 2018 · 3 comments
mcvicuna commented Jun 1, 2018

pandas has (or had) an open bug (pandas#9435) where dtypes passed into the read_csv call are ignored.

If you have an affected version of pandas, every field comes back as f4 (float32), which makes it impossible to use particle_id as an index correctly: the truncation causes collisions.
Ex: 369295375619592193 vs 369295375602810880

This silently fails and will lead to a lot of confusion.
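
A minimal sketch of the collision, independent of the loader (plain numpy, names are illustrative):

import numpy as np

# The two particle_id values from the example above are distinct 64-bit
# integers, but float32 keeps only 24 bits of mantissa, so both truncate
# to the same value.
a = 369295375619592193
b = 369295375602810880
print(np.int64(a) == np.int64(b))      # False: distinct as integers
print(np.float32(a) == np.float32(b))  # True: they collide as f4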

msmk0 (Collaborator) commented Jun 18, 2018

Is there any way we can detect this from the pandas version alone?

mcvicuna (Author) commented
AFAIK, the pandas bug is still open, and my brief investigation found no consensus on which versions are affected.

I'm using Python 3.5.0, pandas 0.17.0, numpy 1.9.3, on Win10.
I had to modify _load_event_data() to not pass in dtypes in order to get a full-precision particle_id.

If you don't see this behavior, I can try to make a pull request for you.
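
A minimal sketch of that workaround, assuming a standard TrackML particles CSV (the file name below is illustrative): read without a dtype mapping so pandas infers particle_id as int64, then downcast only the float columns to keep memory use close to what the original dtype mapping intended.

import pandas as pd

# Read without a dtype mapping (which affected pandas versions ignore anyway);
# pandas then infers particle_id as int64 from the data.
particles = pd.read_csv('event000001000-particles.csv')

# Downcast only the floating-point columns to float32 afterwards.
float_cols = particles.select_dtypes(include=['float64']).columns
particles[float_cols] = particles[float_cols].astype('float32')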

mcvicuna (Author) commented
After further digging, this is actually related to pandas#11617.

Slicing the entire table by rows using at/loc/iterrows coerces the dtypes of the sliced rows; pandas 0.23.1 still exhibits this behavior.

You can see this by:
hits, cells, particles, truths = load_event(...)
particles.iloc[0].particle_id.dtype  # reports a float dtype instead of int64

Why it downcasts to float32 I don't know, but changing the other dtypes to f8/i8 as needed causes everything to be cast to float64. That is a workaround, but float64 only provides 53 bits of precision, which is not enough for the full particle_id range.
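
A minimal sketch of both points, using a tiny stand-in DataFrame rather than the real event data: float64 cannot round-trip a particle_id of this size, and the coercion only happens when a whole row is sliced across mixed-dtype columns, so selecting the particle_id column before the row keeps int64 (not mentioned above, just a possible workaround).

import numpy as np
import pandas as pd

# float64 has a 53-bit mantissa, so an integer this large cannot be
# represented exactly; the round trip changes the low-order digits.
pid = 369295375619592193
print(int(np.float64(pid)) == pid)   # False: precision already lost

# Stand-in frame with the same dtype mix as the particles table (illustrative).
df = pd.DataFrame({'particle_id': np.array([pid], dtype='i8'),
                   'px': np.array([0.25], dtype='f4')})

print(df.iloc[0]['particle_id'])     # row-first access coerces to float
print(df['particle_id'].iloc[0])     # column-first access stays int64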
