Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large strings support for cudf::interleave_columns #15544

Merged
merged 16 commits into from
May 3, 2024

Conversation

davidwendt
Copy link
Contributor

Description

Updates the cudf::interleave_columns logic to use gather-based make_strings_column instead of the make_strings_children since the gather-based function already efficiently supports longs.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 16, 2024
@davidwendt davidwendt self-assigned this Apr 16, 2024
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Apr 23, 2024
@davidwendt davidwendt marked this pull request as ready for review April 23, 2024 19:35
@davidwendt davidwendt requested a review from a team as a code owner April 23, 2024 19:35
{
CUDF_FAIL("Called `interleave_list_entries_fn()` on non-supported types.");
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just moved from below.

Comment on lines +151 to +152
auto const source_col_idx = idx % num_columns;
auto const source_row_idx = idx / num_columns;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This access pattern makes me wonder if a kernel would be significantlty faster.
But I assume this is light-weigth either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean faster than thrust::transform?
The lambda here should be very fast since it only operates on the bitmask and the offsets in a very coalesced access pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason I thought about this is because threads with adjacent indices access different columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. That is a good point.

Copy link
Contributor

@bdice bdice May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost made the same comment as @vuule but then I wondered if the point was that interleaving would have coalesced writes (not reads)? I didn’t look too closely at whether that was true but my intuition was that swapping these might be worthwhile. At least worth benchmarking.

Copy link
Contributor Author

@davidwendt davidwendt May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched the order from coalesced write to coalesced read and wrote a benchmark with different number of columns. The performance did suffer 10% (for 2 columns) to 35% (for 100 columns).
This probably could be mitigated with some extra work to use shared-memory to minimize the non-coalesced writes.
But I think this kind of effort should also encompass the non-strings code paths as well (which also do coalesced writes). So I feel this may be a bit out of scope for this PR.
I will include the benchmark code in this PR since it has already been created.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing. Thanks for measuring.

cpp/src/lists/interleave_columns.cu Show resolved Hide resolved
@github-actions github-actions bot added the CMake CMake build issue label May 2, 2024
@davidwendt davidwendt added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 3, 2024
@davidwendt davidwendt removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 3, 2024
@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 09f8ff3 into rapidsai:branch-24.06 May 3, 2024
70 checks passed
@davidwendt davidwendt deleted the ls-interleave branch May 3, 2024 18:53
rapids-bot bot pushed a commit that referenced this pull request May 9, 2024
Adds a gtest for `cudf::interleave_columns` that tests it can produce large-strings appropriately.
Follow on to #15544

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - MithunR (https://github.com/mythrocks)

URL: #15669
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants