Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf #15011

vyasr · 2024-02-08T22:57:32Z

Description

Contributes to #13921

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence-

Looks great, thanks! Some minor suggestions.

python/cudf/cudf/_lib/pylibcudf/concatenate.pyx

python/cudf/cudf/_lib/pylibcudf/lists.pyx

python/cudf/cudf/_lib/pylibcudf/sorting.pyx

python/cudf/cudf/_lib/pylibcudf/concatenate.pyx

vyasr · 2024-02-15T18:14:19Z

/merge

cwharris · 2024-02-20T19:46:40Z

Is there a reason get_column_names was removed? Is it replaced by anything? We were using it in Morpheus, but we can probably work around it being missing now. Just curious if there's another API we should use instead.

vyasr · 2024-02-23T01:35:02Z

It was removed because it was a completely unused function. It is not replaced by anything. If you need it, though, feel free to copy out the source. It was previously a cdef function, but there's really not much reason for it to be one because it was interacting with purely Python objects (DataFrame/Series/Index) except for the return value. Here's the old code:

```cython
# cimports/imports needed for this snippet to compile on its own
from libcpp.string cimport string
from libcpp.vector cimport vector

import cudf


cdef vector[string] get_column_names(object tbl, object index):
    cdef vector[string] column_names
    if index is not False:
        if isinstance(tbl._index, cudf.core.multiindex.MultiIndex):
            for idx_name in tbl._index.names:
                column_names.push_back(str.encode(idx_name))
        else:
            if tbl._index.name is not None:
                column_names.push_back(str.encode(tbl._index.name))

    for col_name in tbl._column_names:
        column_names.push_back(str.encode(col_name))

    return column_names
```

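That should be replaceable with something like the following, up to how you want to handle the truthiness of the `index` parameter (`if index:` skips the index names when `index` is `None`, whereas the old `index is not False` check included them):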

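```python
from cudf import MultiIndex


def get_column_names(tbl, index):
    column_names = []
    if index:
        idx = tbl._index
        if isinstance(idx, MultiIndex):
            # Copy the names so that extend() below neither mutates the
            # index nor fails on an immutable names sequence.
            column_names = list(idx.names)
        elif idx.name is not None:
            column_names = [idx.name]
    column_names.extend(tbl._column_names)
    return [c.encode() for c in column_names]
```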

Then you can use that in a Cython context, relying on Cython's automatic conversion of a Python list of `bytes` to `vector[string]`:

```cython
cdef vector[string] names = get_column_names(tbl, index)
```
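The truthiness caveat matters mainly for `index=None`: the old Cython check (`index is not False`) includes the index names for `None`, while `if index:` drops them. The sketch below makes that choice explicit through a hypothetical `none_includes_index` keyword. So that it runs without cudf or a GPU, it uses hypothetical stand-ins (`SimpleNamespace` for the table and a single index, a `FakeMultiIndex` class for a MultiIndex) and assumes only the `_index`, `_column_names`, `name`, and `names` attributes used in the code above.

```python
from types import SimpleNamespace


class FakeMultiIndex(SimpleNamespace):
    """Hypothetical stand-in for cudf.MultiIndex (only .names is used)."""


def get_column_names(tbl, index, *, none_includes_index=True):
    """Collect index and column names as bytes.

    none_includes_index=True reproduces the old `index is not False` check;
    none_includes_index=False reproduces the `if index:` check.
    """
    if none_includes_index:
        include_index = index is not False  # old Cython semantics
    else:
        include_index = bool(index)  # `if index:` semantics
    column_names = []
    if include_index:
        idx = tbl._index
        if isinstance(idx, FakeMultiIndex):
            # Copy so extend() below never touches the index's own names.
            column_names = list(idx.names)
        elif idx.name is not None:
            column_names = [idx.name]
    column_names.extend(tbl._column_names)
    # Encode to bytes so Cython can convert the list to vector[string].
    return [c.encode() for c in column_names]


# A table with a named single index and two data columns.
tbl = SimpleNamespace(_index=SimpleNamespace(name="id"), _column_names=("a", "b"))
print(get_column_names(tbl, None))                             # [b'id', b'a', b'b']
print(get_column_names(tbl, None, none_includes_index=False))  # [b'a', b'b']
print(get_column_names(tbl, False))                            # [b'a', b'b']

# A table with a two-level (stand-in) MultiIndex.
mi_tbl = SimpleNamespace(_index=FakeMultiIndex(names=("x", "y")), _column_names=("a",))
print(get_column_names(mi_tbl, True))                          # [b'x', b'y', b'a']
```

Only `index=None` (and other non-boolean values) distinguishes the two policies; `True` and `False` behave the same either way.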

vyasr added 3 commits February 8, 2024 22:56

Convert enum to enum class (330cdc0)
Implement stream compaction in pylibcudf (2834227)
Use pylibcudf for stream compaction (c0996c8)

vyasr added the feature request and non-breaking labels Feb 8, 2024

vyasr self-assigned this Feb 8, 2024

vyasr requested a review from a team as a code owner February 8, 2024 22:57

vyasr requested review from isVoid and brandon-b-miller February 8, 2024 22:57

github-actions bot added the Python and CMake labels Feb 8, 2024

vyasr added 12 commits February 12, 2024 06:32

Implement sorting in pylibcudf (adab700)
Switch to using pylibcudf for sorting (f6e274a)
Add merge to pylibcudf (6e512f8)
Use pylibcudf for merge (39bc47f)
Remove unused concat_masks function (c6f85c7)
Implement concatenate in pylibcudf (6239f7b)
Use pylibcudf for concat (579d344)
Remove some now unused functions (c827381)
Fix function names (a92bb05)
Implement lists explode in pylibcudf (ff1303b)
Use pylibcudf for lists explode (02a8b09)
Add missing docs (a76a427)

vyasr changed the title from "Implement stream compaction in pylibcudf" to "Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf" Feb 13, 2024

Merge branch 'branch-24.04' into feat/pylibcudf_stream_compaction (f2b314d)

wence- requested changes Feb 14, 2024

Expose some attributes for convenience (8a9efa9)

vyasr commented Feb 14, 2024

python/cudf/cudf/_lib/pylibcudf/concatenate.pyx (outdated, resolved)

vyasr commented Feb 14, 2024

python/cudf/cudf/_lib/pylibcudf/concatenate.pyx (outdated, resolved)

Apply suggestions from code review (a57be21)

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

vyasr added 4 commits February 14, 2024 22:08

A few more small fixes (98a9610)
Make all sort functions encode stability in the signature (b00da09)
Revert "Make all sort functions encode stability in the signature" (794d8d0)
    This reverts commit b00da09.
Don't shadow table (601c6ec)

vyasr requested a review from wence- February 15, 2024 08:38

shwina mentioned this pull request Feb 15, 2024

[FEA] Remove inconsistencies in cython wrappers when handling order/null-precedence #14492

Open

wence- approved these changes Feb 15, 2024

rapids-bot bot merged commit 65d9c5e into rapidsai:branch-24.04 Feb 15, 2024
69 checks passed

vyasr deleted the feat/pylibcudf_stream_compaction branch February 15, 2024 18:14

vyasr mentioned this pull request Feb 27, 2024

[FEA] Implement all libcudf modules required by cuDF Python in pylibcudf #15162

Open

vyasr added the pylibcudf label May 28, 2024
