Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf #15011

Merged


vyasr
Contributor

@vyasr vyasr commented Feb 8, 2024

Description

Contributes to #13921

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr added the "feature request" (New feature or request) and "non-breaking" (Non-breaking change) labels Feb 8, 2024
@vyasr self-assigned this Feb 8, 2024
@vyasr requested a review from a team as a code owner February 8, 2024 22:57
@github-actions bot added the "Python" (Affects Python cuDF API.) and "CMake" (CMake build issue) labels Feb 8, 2024
@vyasr changed the title from "Implement stream compaction in pylibcudf" to "Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf" Feb 13, 2024
Contributor

@wence- wence- left a comment

Looks great, thanks! Some minor suggestions

Review threads (all resolved):

  • python/cudf/cudf/_lib/pylibcudf/concatenate.pyx (3 threads, all outdated)
  • python/cudf/cudf/_lib/pylibcudf/lists.pyx (1 thread, outdated)
  • python/cudf/cudf/_lib/pylibcudf/sorting.pyx (5 threads, 2 outdated)

Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@vyasr
Contributor Author

vyasr commented Feb 15, 2024

/merge

@rapids-bot rapids-bot bot merged commit 65d9c5e into rapidsai:branch-24.04 Feb 15, 2024
69 checks passed
@vyasr vyasr deleted the feat/pylibcudf_stream_compaction branch February 15, 2024 18:14
@cwharris
Contributor

cwharris commented Feb 20, 2024

Is there a reason get_column_names was removed? Is it replaced by anything? We were using it in Morpheus, but we can probably work around it being missing now. Just curious if there's another API we should use instead.

@vyasr
Contributor Author

vyasr commented Feb 23, 2024

It was removed because it was a completely unused function. It is not replaced by anything. If you need it, though, feel free to copy out the source. It was previously a cdef function, but there's really not much reason for it to be one because it was interacting with purely Python objects (DataFrame/Series/Index) except for the return value. Here's the old code:

cdef vector[string] get_column_names(object tbl, object index):
    cdef vector[string] column_names
    if index is not False:
        if isinstance(tbl._index, cudf.core.multiindex.MultiIndex):
            for idx_name in tbl._index.names:
                column_names.push_back(str.encode(idx_name))
        else:
            if tbl._index.name is not None:
                column_names.push_back(str.encode(tbl._index.name))

    for col_name in tbl._column_names:
        column_names.push_back(str.encode(col_name))

    return column_names

That should be replaceable with something like this (up to how you want to handle the truthiness of the index parameter):

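from cudf.core.multiindex import MultiIndex

def get_column_names(tbl, index):
    # Collect index names first (if requested), then the data column names.
    column_names = []
    if index:  # unlike the old `index is not False`, None also skips the index
        idx = tbl._index
        if isinstance(idx, MultiIndex):
            # Copy so the extend() below never touches the index's own names.
            column_names = list(idx.names)
        elif idx.name is not None:
            column_names = [idx.name]
    column_names.extend(tbl._column_names)
    # Encode to bytes so Cython can convert the list to vector[string].
    return [c.encode() for c in column_names]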

Then you can use that in a Cython context, relying on Cython's automatic list->vector conversion:

cdef vector[string] names = get_column_names(tbl, index)
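
On the truthiness point above, a minimal sketch of where the two checks diverge. The helper names include_index_old and include_index_new are hypothetical and exist only for this illustration; they isolate the condition from the old cdef function (`index is not False`) and from the Python replacement (`if index:`). No cudf import is needed to see the difference.

```python
def include_index_old(index):
    # Mirrors the removed cdef function: only the literal False skips the index.
    return index is not False


def include_index_new(index):
    # Mirrors the Python replacement: any falsy value skips the index.
    return bool(index)


# Compare the two checks on values a caller might plausibly pass.
for value in (True, False, None, 0, "", "idx"):
    print(repr(value), include_index_old(value), include_index_new(value))
```

The two agree for True and False but differ for None, 0, and the empty string, which the old check treated as "include the index". If callers pass None to mean "use the default", keeping `index is not False` in the replacement preserves the old behavior.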

@vyasr vyasr added the pylibcudf Issues specific to the pylibcudf package label May 28, 2024