Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add uint64 support for some libgroupby funcs #28931

Merged
merged 3 commits into from
Oct 16, 2019

Conversation

jbrockmendel
Copy link
Member

cc @WillAyd I hope you agree the runtime_error thing here is easier to implement/review with fused type than it would be with tempita.

@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

Can you add tests for these as well? Would be nice to have something parametrized in groupby

@WillAyd WillAyd added Dtype Conversions Unexpected or buggy dtype conversions Groupby labels Oct 11, 2019
@jbrockmendel
Copy link
Member Author

Can you add tests for these as well? Would be nice to have something parametrized in groupby

I'll take a look and see if we have tests that get at these functions directly (as opposed to via df.groupby.whatever). I can attest that these are hit by the python code for which this adds a comment.

@jbrockmendel
Copy link
Member Author

Following this and #28934 one more obvious cleanup is to move all of groupby_helper into groupby.pyx, since we are no longer using the templating

@jbrockmendel
Copy link
Member Author

@WillAyd it looks like we don't have any dedicated testing that gets directly a the functions in this file.

else:
out[i, j] = NAN
else:
out[i, j] = resx[i, j]

if runtime_error:
# We cannot raise directly above because that is within a nogil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these caught?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would get caught by one of the except Exceptions in the groupby code that i'm trying to make more specific.

@jreback jreback added this to the 1.0 milestone Oct 12, 2019
@jbrockmendel
Copy link
Member Author

Any major concerns? This is a blocker for getting at the except Exceptions in/around _cython_agg_blocks

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can add to the parametrization of this test here:

def test_groupby_non_arithmetic_agg_types(dtype, method, data):

@@ -439,6 +466,9 @@ def group_max(groupby_t[:, :] out,
# Note: evaluated at compile-time
maxx[:] = -_int64_max
nan_val = NPY_NAT
elif groupby_t is uint64_t:
maxx[:] = 0
nan_val = 0 # We better not need this! TODO: Can we use e.g. NULL?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what this is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to define a value to treat as NA, and it has to be of the appropriate dtype. But for uint64 there is no such value, so defining it here is just as a placeholder.

I'll clarify the comment in the code.

@jbrockmendel
Copy link
Member Author

I think you can add to the parametrization of this test here:

@WillAyd added uint64 to this parametrization.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@WillAyd
Copy link
Member

WillAyd commented Oct 15, 2019

I think a whatsnew for function support would be nice as well

@jbrockmendel
Copy link
Member Author

I think a whatsnew for function support would be nice as well

Only user-facing effect should be perf, which I haven't measured yet. Would prefer to do measurements in one thorough pass after the last except Exception is removed around cython_agg_general (this plus two PRs)

@@ -378,7 +378,7 @@ def test_median_empty_bins(observed):


@pytest.mark.parametrize(
"dtype", ["int8", "int16", "int32", "int64", "float32", "float64"]
"dtype", ["int8", "int16", "int32", "int64", "float32", "float64", "uint64"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use any_real_type fixture here (can be followup)

@jreback jreback merged commit a0d01b8 into pandas-dev:master Oct 16, 2019
@jreback
Copy link
Contributor

jreback commented Oct 16, 2019

thanks

@jbrockmendel jbrockmendel deleted the faster41 branch October 16, 2019 15:25
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants