Add pandas support #684
Conversation
(force-pushed 0fa2a51 to 29296db)
I don't understand why this is showing a drop in coverage, or how to fix the coverage if there actually is a drop.
Is it because more lines have been added but only tested in the pandas job, so the non-pandas job coverages have fallen? That's what I get from comparing the most recent build https://coveralls.io/builds/18732747 to an older build https://coveralls.io/builds/18028346
Nice, found a solution.
Good work guys! Yes, coverage is a ratio of (tested lines)/(total lines). That goes down if you write lines of code without adding tests.
@hgrecco any possibility you could let us know whether this is even a feasible merge, or whether we should start building a separate repo please?
Thanks, I think we found a solution in the end: basically making sure that coverage only considers the pandas stuff if the pandas integration is being tested.
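That kind of conditional coverage can be expressed in coverage.py's configuration. A minimal sketch, assuming a hypothetical module layout (`pint/pandas_interface/` is made up for illustration); the non-pandas CI job would use an omit rule like:

```ini
# .coveragerc sketch for the non-pandas CI job (paths are hypothetical):
# lines in the pandas interface are excluded from the coverage ratio
# when the pandas integration is not being tested.
[run]
omit =
    pint/pandas_interface/*
```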
This is a long-awaited feature, so I am eager to merge it. In general, I like the code and the API, but it is a long PR and I need an extra couple of days to play with it.
Great, thanks. Take as long as you need! For our research group, if we have a rough timeline then we can work around it (trying to work with no information is always the most difficult).
Will probably want to include some docs. The example ipynb file is good, but folks who read the readthedocs page might not know to look for it on the GitHub repo.
Yep, sounds good! Neither @andrewgsavage nor I could get our heads around how the docs work. Do you have a resource you can point to, or can you explain where the best place to add them is?
To add to the current documentation structure, I suggest creating a new file
Cheers, have had a crack. I don't really understand how the doctests work, so they may blow up... Some other things are also not that nice, but hopefully with some guidance we can turn this into a nice pull request. @andrewgsavage, jump in with any additions you want!
Nice, docs! There are a few parts I should probably explain better; will do so when I have a chance. Have you tried using df.to_html? That might display the dfs like Jupyter does. Noticed that the dfs are now showing with the unit in the column. I think this is because of a change made to array without changing _formatting_values. I'm not sure which I prefer: it's nice to see the units, but that reduces the space available for columns and is distracting.
We are working on upgrading pint to be compatible with pandas, see hgrecco/pint#684 I am guessing that the line in the docs, 'If you’re building a library that implements the interface, please publicize it on Extension Data Types.', meant something like this pull request. If that's completely wrong, apologies.
Good job with the docs! I think I'm partial to showing units in the column header. Perhaps on a line below in parentheses (maybe printing using a newline character). That keeps the data frame horizontally compact. I think there are pros and cons to wherever you put the units, though.
Yep, I thought about that. I didn't know what this 'doctest' stuff was doing though, so got nervous and didn't...
This sounds super nice. If we follow that convention, then we force users to only have a single unit for a given column. My only question was whether we want that to be the case (I think it's the most sensible, but it's not the only option). We could, for example, allow a column to have different units within the column (e.g. the first value could be in metres, the second in centimetres, the third in millimetres, etc.). That may quickly become a nightmare though...
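To make the trade-off concrete, here is a plain-Python sketch (made-up conversion factors, not pint's API) of the normalization a mixed-unit column would force on every operation, whereas a single-unit column pays this cost once, up front:

```python
# Sketch (plain Python, simplified factors - not pint's API): a column
# that allows mixed units must be normalized to one shared unit before
# it can behave like a single quantity.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}

def normalize(values_with_units, target="m"):
    """Convert (value, unit) pairs to a single shared unit."""
    return [v * TO_METRES[u] / TO_METRES[target] for v, u in values_with_units]

mixed = [(1.0, "m"), (150.0, "cm"), (2500.0, "mm")]
print(normalize(mixed))  # [1.0, 1.5, 2.5]
```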
I agree that the docs are very nice. A few suggestions regarding the docs, but also the API:
Same, but I don't believe this is possible. The .dequantify() adds an extra row to the columns, which was the closest I could get to it. Will need to look into the formatting again, as it might be possible to show the units in the first row of data only, e.g.
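For illustration, the "extra row" effect can be mimicked in plain pandas with a two-level column index, the unit being the second level. This is a sketch of the display idea only, not the PR's API:

```python
import pandas as pd

# Sketch: units "on a line below" the column names via a two-level
# column index, with the unit as the second level. Illustrative plain
# pandas, not the PR's dequantify implementation.
columns = pd.MultiIndex.from_tuples(
    [("distance", "metre"), ("time", "second")],
    names=[None, "unit"],
)
df = pd.DataFrame([[1.0, 0.5], [2.0, 1.0]], columns=columns)
print(df)  # unit row renders under the column names
```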
Data is stored as a pint quantity, which handles the unit converting and such. When you set a value in the PintArray in a different unit from the underlying quantity (in ._data), the value will be converted to the original unit.
Awesome
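The conversion-on-assignment behaviour described above can be sketched in plain Python (hypothetical `MiniUnitArray` class with made-up factors, not the PR's implementation):

```python
# Sketch (plain Python, hypothetical class): mimics the described
# behaviour - the array keeps one underlying unit, and a value assigned
# in a different unit is converted on the way in.
FACTORS_TO_M = {"m": 1.0, "cm": 0.01}

class MiniUnitArray:
    def __init__(self, values, unit):
        self.unit = unit
        self._data = list(values)  # stored in self.unit

    def set(self, index, value, unit):
        # Convert the incoming value to the array's underlying unit.
        in_metres = value * FACTORS_TO_M[unit]
        self._data[index] = in_metres / FACTORS_TO_M[self.unit]

arr = MiniUnitArray([1.0, 2.0], "m")
arr.set(0, 150.0, "cm")  # stored as 1.5 metres
print(arr._data)         # [1.5, 2.0]
```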
It might require redefining
What do you mean by the default formatter? Pandas uses a separate formatter that we pass text/values into, _formatting_values.
Hmm, if you're questioning that, I take it it isn't obvious; maybe this needs an example/reference to the pint tutorial?
Reference to the pint tutorial, I think; I've never done it (can I admit that now..?)
Latest commit should fix this, I believe
Are you OK with having the test file in
I'm assuming you're referring to pandas' repr/.to_html. These aren't things ExtensionArrays can change, so I can't see any way of going about it. It might be worth submitting an issue to pandas to allow ExtensionArrays to display metadata in column headers.
OK, I think I've done all the steps we wanted now. The last big problem is that in the process of rebasing, I seem to have removed any mention of @andrewgsavage's contribution. I'm not very comfortable with this, as all of the interface code is his (I've basically only done the test code). Do you have a suggestion for how to fix this whilst also doing such a mega rebase?
I don't think that's possible either...
@hgrecco this is good from our end (tests pending)
bors r+ |
684: Add pandas support r=hgrecco a=znicholls

This pull request adds pandas support to pint (hence is related to #645, #401 and pandas-dev/pandas#10349). An example can be seen in `example-notebooks/basic-example.ipynb`.

It's a little bit hackish; feedback would be greatly appreciated by me and @andrewgsavage. One obvious example is that we have to run all the interface tests with `pytest` to fit with the `pandas` test suite, which introduces a dependency for the CI and currently gives us this awkward testing setup (see the alterations we had to make to `testsuite`). This also means that our code coverage tests are fiddly too.

If you'd like us to squash the commits, that can be done. If pint has a linter, it would be good to run that over this pull request too, as we're a little bit all over the place re style.

Things to discuss:

- [x] general feedback and changes
- [x] test setup, especially the need for pytest for pandas tests and the hackish way to get around automatic discovery
- [x] squashing/rebasing
- [x] linting/other code style (related to #664 and #628: we're happy with whatever; I've found using an automatic linter, e.g. black and/or flake8, has made things much simpler in other projects)
- [x] including notebooks in the repo (if we want to, I'm happy to put them under CI so we can make sure they run)
- [x] setting up the docs correctly

Co-authored-by: Zebedee Nicholls <zebedee.nicholls@climate-energy-college.org>
Co-authored-by: andrewgsavage <andrewgsavage@gmail.com>
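Incidentally, the `Co-authored-by:` trailers at the end of the squashed commit message are the standard Git/GitHub mechanism for preserving attribution through a rebase: each trailer goes on its own line at the end of the message, and GitHub credits every listed author on the commit. The general shape is:

```text
<commit summary line>

<commit body>

Co-authored-by: Zebedee Nicholls <zebedee.nicholls@climate-energy-college.org>
Co-authored-by: andrewgsavage <andrewgsavage@gmail.com>
```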
I have asked bors to merge it. Big thanks and congratulations for such wonderful work, and thanks to everybody else who helped by reviewing and commenting on the code. I think that now we deserve a new release.
Build succeeded |
Done! I have opened #693 to track the new release.
Would you all be OK with adding
Sure. Please add something like beta, prerelease, or provisional. I think we need to test it better on our side via broader use.
I think we have this here, pandas-dev/pandas#22582, so maybe we just need to re-open and update?
Yes, that'd be perfect if you could.
@hgrecco @andrewgsavage to be clear, pint no longer has the pandas support and we're moving it into https://github.com/hgrecco/pint-pandas?
True. But it would be an extension of pint. As soon as we think it is stable, we can add it to pint/setup.py as optional.
OK, great. I'll update pandas-dev/pandas#22582 later today.