Improve GroupBy JIT error handling #13854

brandon-b-miller · 2023-08-11T17:03:28Z

This PR implements general error handling for features that are missing dtype overloads in JIT GroupBy. Previously, if a feature was not supported for a certain dtype, a somewhat confusing Numba error was raised. With this PR, a clear error is raised instead, for instance in the case of the missing float dtype overloads for the corr aggregation:

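import pandas as pd
import cudf

# Integer group keys plus two numeric value columns
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2],
    'b': range(6),
    'c': range(6)
})
# Cast the value columns to float64, a dtype corr has no JIT overload for
df['b'] = df['b'].astype('float64')
df['c'] = df['c'].astype('float64')

# nan_as_null=False keeps NaN values as floating-point NaN rather than nulls
gdf = cudf.from_pandas(df, nan_as_null=False)
# Per-group correlation of b and c, compiled by the JIT engine
got = gdf.groupby('a').apply(lambda x: x['b'].corr(x['c']), engine='jit')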

Now raises an error with the message:

Series.corr is not supported between float64 and float64 within JIT GroupBy apply.
Please file an issue requesting support for this feature at cuDF's GitHub page.
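A minimal, self-contained sketch of the lookup-then-raise idea, assuming a simple registry of supported dtype signatures: check the (function, dtypes) pair up front and raise a readable error when no overload exists, rather than letting a generic typing failure surface. The actual change lives in cuDF's Numba typing code (groupby_typing.py); `SUPPORTED_OVERLOADS`, `UnsupportedDtypeError`, and `check_overload` are hypothetical names, the registry contents are illustrative only, and only the first line of the message above is reproduced.

```python
# Hypothetical sketch: SUPPORTED_OVERLOADS, UnsupportedDtypeError and
# check_overload are illustrative names, not cuDF internals.


class UnsupportedDtypeError(TypeError):
    """Raised when a JIT GroupBy function has no overload for the given dtypes."""


# Illustrative registry: function name -> set of supported dtype-name tuples.
# Contents are assumed for the example (corr is int-only per the discussion).
SUPPORTED_OVERLOADS = {
    "Series.corr": {("int64", "int64")},
}


def check_overload(func_name, *dtypes):
    """Raise a readable error if func_name has no overload for dtypes."""
    supported = SUPPORTED_OVERLOADS.get(func_name, set())
    if tuple(dtypes) not in supported:
        raise UnsupportedDtypeError(
            f"{func_name} is not supported between "
            f"{' and '.join(dtypes)} within JIT GroupBy apply."
        )


if __name__ == "__main__":
    try:
        check_overload("Series.corr", "float64", "float64")
    except UnsupportedDtypeError as err:
        print(err)
```

Checking against an explicit table keeps the wording under cuDF's control; the trade-off is that such a table has to be kept in sync with the overloads that actually exist.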

bdice · 2023-08-11T17:19:48Z

python/cudf/cudf/core/udf/groupby_typing.py

+            "apply.\nPlease file an issue "
+            "requesting support for this feature at cuDF's GitHub page."


Do we intend to support all possible inputs that can raise this error? It seems like we're asking for a lot of small issues and some of them might even be "wontfix."

brandon-b-miller · 2023-08-11

I think this is a great question and worth discussion. From my point of view, the API tries as hard as possible to convince the user that their function is simply being run in a loop over groups, as in the iterative approach. Ideally, then, every function we support in JIT GroupBy apply would support the same set of dtypes that the real Series API supports. We have big holes in that right now: for example, corr is int-only, and the rest of the functions don't yet support int16, int8, or any unsigned types. Each of these missing overloads is a missing feature IMO, and it will take some time to close the gap.

In raising this error, I mainly intend to clarify what is going wrong; in suggesting an issue, I was aiming to create a feedback mechanism that might aid our prioritization. However, as you suggest, this may be clumsy, since we'd be waiting for users to file issues we already know exist.

What would you think of creating an umbrella issue that tracks the gap between supported dtypes and linking to it in the error instead, perhaps suggesting a GitHub reply? We could track everything that's wontfix there. Or, if pointing to GitHub seems unnecessary altogether, I can remove it.
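For the umbrella-issue idea, a sketch of how known gaps could be enumerated and rendered as a GitHub task list to populate such an issue. The coverage table is illustrative, loosely modeled on the gaps described in this comment; it is not cuDF's real overload matrix, and `dtype_gaps` and `as_checklist` are hypothetical helpers.

```python
# Hypothetical sketch: the coverage data below is illustrative only and is
# not cuDF's real overload matrix.

# Integer and float dtypes to match (assumed target set).
TARGET_DTYPES = [
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
]

# Illustrative coverage: function -> dtypes that have a JIT overload.
JIT_SUPPORTED = {
    "max": {"int32", "int64", "float32", "float64"},
    "corr": {"int32", "int64"},
}


def dtype_gaps(supported, targets):
    """Map each function to the target dtypes it lacks, in target order."""
    return {
        func: [dt for dt in targets if dt not in dtypes]
        for func, dtypes in supported.items()
    }


def as_checklist(gaps):
    """Render gaps as GitHub task-list lines, one per missing overload."""
    return "\n".join(
        f"- [ ] {func}: {dt}"
        for func, missing in sorted(gaps.items())
        for dt in missing
    )


if __name__ == "__main__":
    print(as_checklist(dtype_gaps(JIT_SUPPORTED, TARGET_DTYPES)))
```

Entries that end up wontfix could simply be left unchecked with a note in the issue.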

bdice · 2023-08-11

A single issue would be better! Maybe you can start it and populate it with known gaps. But generally, it's unusual for cuDF to point to GitHub issues in other circumstances, such as when unsupported pandas keyword arguments are passed.

@shwina What are your thoughts here?

shwina · 2023-08-11

Perhaps pointing to a single issue in the error message, so folks can provide input as comments, is a good compromise? It would also be a real clickable link in the error message, which is nice. Perhaps something along these lines:

The <func>() function is not supported for the input types <...> by cuDF's JIT engine. To see what's currently supported or request support for a new function, go to https://github.com/rapidsai/cudf/issues/42
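A sketch of filling in the proposed template with a function name and its input types. `unsupported_message` is a hypothetical helper; the template text and URL are copied from the proposal as written, and the URL is taken as a parameter since the final tracking issue may differ.

```python
# Hypothetical helper: the name and signature are illustrative; the template
# text and URL are taken verbatim from the proposal in this thread.
TRACKING_ISSUE_URL = "https://github.com/rapidsai/cudf/issues/42"


def unsupported_message(func_name, input_types, url=TRACKING_ISSUE_URL):
    """Fill the proposed template with a function name and its input dtypes."""
    types = ", ".join(input_types)
    return (
        f"The {func_name}() function is not supported for the input types "
        f"{types} by cuDF's JIT engine. To see what's currently supported or "
        f"request support for a new function, go to {url}"
    )


if __name__ == "__main__":
    print(unsupported_message("corr", ["float64", "float64"]))
```

Passing the URL in means the wording does not need to change if the tracking issue moves.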


copy-pr-bot · 2023-08-31T16:06:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


vyasr

Looks pretty close, couple of small things left that are worth doing though.

python/cudf/cudf/core/udf/groupby_typing.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

brandon-b-miller · 2023-10-02T14:28:59Z

/ok to test


vyasr · 2023-10-09T18:17:13Z

@brandon-b-miller could you fix style here? Then we can finalize.

brandon-b-miller · 2023-10-09T18:24:12Z

/ok to test

brandon-b-miller · 2023-10-09T19:51:44Z

/ok to test


brandon-b-miller · 2023-11-30T19:45:08Z

/ok to test


brandon-b-miller · 2023-12-01T16:01:47Z

/ok to test


brandon-b-miller · 2023-12-06T16:21:59Z

/ok to test

brandon-b-miller · 2023-12-06T19:16:25Z

/ok to test

brandon-b-miller · 2023-12-06T21:11:26Z

I think this is ready to merge. Any last thoughts, anyone?

bdice

I suggested simplifying the error message text. Otherwise, I don't have strong opinions on this PR, and we should merge it to unblock further work on the migration to Numba 0.58.

python/cudf/cudf/core/udf/groupby_typing.py


brandon-b-miller · 2023-12-12T14:30:26Z

/ok to test

brandon-b-miller · 2023-12-12T16:21:57Z

/merge

brandon-b-miller added 2 commits August 11, 2023 09:54

impl, add tests

25b3065

remove unused function

323b47b

brandon-b-miller added the feature request, numba, Python, and non-breaking labels Aug 11, 2023

brandon-b-miller self-assigned this Aug 11, 2023

brandon-b-miller requested a review from a team as a code owner August 11, 2023 17:03

brandon-b-miller requested review from mroeschke and galipremsagar August 11, 2023 17:03

bdice reviewed Aug 11, 2023


brandon-b-miller added 4 commits August 15, 2023 20:42

begin filling out missing ops

5f1dcac

updates

959f36c

begin updating notebook

7fb40a4

notebook update, typingng progress

63c928e

brandon-b-miller mentioned this pull request Aug 18, 2023

Return nan when one variable to be correlated has zero variance in JIT GroupBy Apply #13884

Merged

brandon-b-miller added 4 commits August 22, 2023 07:02

progress with errors, tests

9e1e0a9

cleanup

42c09c8

fix error kind to be checked in corr test

094fbc2

senibly error for df level ops

bebf40a

brandon-b-miller added 6 commits August 22, 2023 12:09

minor cleanup

24caf72

more cleanup

d2656b4

merge latest and resolve conflicts

606ba87

switch to new style errors in numba config

89d376a

updates

5270627

continue combining classes

0173947

brandon-b-miller added 2 commits September 6, 2023 07:03

passing tests?

4cb2d89

renaming

bf85508

vyasr requested changes Sep 29, 2023


brandon-b-miller and others added 2 commits October 2, 2023 08:37

Update python/cudf/cudf/core/udf/groupby_typing.py

e71944b

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

address reviews

7604c2a

brandon-b-miller changed the base branch from branch-23.10 to branch-23.12 October 2, 2023 14:28

Merge branch 'branch-23.12' into improve-groupby-apply-jit-error-handling

5dea9c9

vyasr approved these changes Oct 9, 2023


pass style checks

22f26d5

nb update

41ed51b

brandon-b-miller changed the base branch from branch-23.12 to branch-24.02 November 30, 2023 19:43

Merge branch 'branch-24.02' into improve-groupby-apply-jit-error-handling

762635a

Merge branch 'branch-24.02' into improve-groupby-apply-jit-error-handling

c599f08

brandon-b-miller added 2 commits December 6, 2023 07:43

Merge branch 'branch-24.02' into improve-groupby-apply-jit-error-handling

4689f65

fix notebook

e870992

fix header levels

1b31157

brandon-b-miller requested a review from bdice December 6, 2023 21:10

bdice approved these changes Dec 9, 2023


python/cudf/cudf/core/udf/groupby_typing.py (outdated)

brandon-b-miller added 2 commits December 12, 2023 06:28

remove newlines

0aca0fb

Merge branch 'branch-24.02' into improve-groupby-apply-jit-error-handling

2b5f8ba

rapids-bot bot merged commit 0fa80ec into rapidsai:branch-24.02 Dec 12, 2023
68 checks passed
