Large strings support for cudf::interleave_columns #15544

davidwendt · 2024-04-16T21:05:34Z

Description

Updates the cudf::interleave_columns logic to use the gather-based make_strings_column instead of make_strings_children, since the gather-based function already supports large strings efficiently.
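
As a rough host-side illustration of the gather-based idea (not the libcudf implementation), the sketch below first gathers one `std::string_view` per output row in interleaved order, then builds offsets from the gathered sizes and copies the characters. The names `strings_result` and `interleave_strings` are hypothetical, and the 64-bit offsets are an assumption standing in for what a large-strings column needs once the total character count outgrows 32-bit offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: 64-bit offsets plus the concatenated characters.
struct strings_result {
  std::vector<std::int64_t> offsets;
  std::string chars;
};

// Hypothetical host-side sketch of a gather-based interleave of string columns.
strings_result interleave_strings(std::vector<std::vector<std::string_view>> const& columns)
{
  std::size_t const num_columns = columns.size();
  std::size_t const num_rows    = num_columns == 0 ? 0 : columns.front().size();
  // Precondition: every input column has the same number of rows.
  for (auto const& col : columns) { assert(col.size() == num_rows); }

  // Step 1: gather one view per output row; no characters are copied yet.
  std::vector<std::string_view> gathered;
  gathered.reserve(num_columns * num_rows);
  for (std::size_t idx = 0; idx < num_columns * num_rows; ++idx) {
    gathered.push_back(columns[idx % num_columns][idx / num_columns]);
  }

  // Step 2: build offsets from the gathered sizes, then copy the characters.
  strings_result result;
  result.offsets.reserve(gathered.size() + 1);
  result.offsets.push_back(0);
  for (auto const sv : gathered) {
    result.offsets.push_back(result.offsets.back() + static_cast<std::int64_t>(sv.size()));
  }
  result.chars.reserve(static_cast<std::size_t>(result.offsets.back()));
  for (auto const sv : gathered) { result.chars.append(sv); }
  return result;
}
```

Because the offsets are derived from the gathered sizes in a single pass, the offset width only has to be chosen in one place, which is the property this sketch is meant to highlight.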

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

davidwendt · 2024-04-23T19:36:21Z

cpp/src/lists/interleave_columns.cu

+  {
+    CUDF_FAIL("Called `interleave_list_entries_fn()` on non-supported types.");
+  }
+};


This was just moved from below.

vuule · 2024-04-29T19:07:07Z

cpp/src/reshape/interleave_columns.cu

+    auto const source_col_idx = idx % num_columns;
+    auto const source_row_idx = idx / num_columns;


This access pattern makes me wonder if a kernel would be significantly faster.
But I assume this is lightweight either way.

Do you mean faster than thrust::transform?
The lambda here should be very fast since it only operates on the bitmask and the offsets in a very coalesced access pattern.

The reason I thought about this is that threads with adjacent indices access different columns.

Ah yes. That is a good point.
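
To make the point about adjacent indices concrete, here is a small host-side sketch that applies the two index expressions from the diff to every output index. The helper name `source_indices` is hypothetical and the loop only stands in for the per-element lambda; it is not the code from the PR.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists (source_col_idx, source_row_idx) for each output
// index, using the same two expressions as the diff above.
std::vector<std::pair<std::size_t, std::size_t>> source_indices(std::size_t num_columns,
                                                                std::size_t num_rows)
{
  std::vector<std::pair<std::size_t, std::size_t>> result;
  result.reserve(num_columns * num_rows);
  for (std::size_t idx = 0; idx < num_columns * num_rows; ++idx) {
    auto const source_col_idx = idx % num_columns;  // changes on every step
    auto const source_row_idx = idx / num_columns;  // changes every num_columns steps
    result.emplace_back(source_col_idx, source_row_idx);
  }
  return result;
}
```

With 3 columns and 2 rows, output indices 0, 1, 2 read row 0 of columns 0, 1, 2, so neighbouring indices touch different source columns while the writes to the output stay consecutive.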

I almost made the same comment as @vuule but then I wondered if the point was that interleaving would have coalesced writes (not reads)? I didn’t look too closely at whether that was true but my intuition was that swapping these might be worthwhile. At least worth benchmarking.

I switched the order from coalesced write to coalesced read and wrote a benchmark that varies the number of columns. Performance dropped by 10% (for 2 columns) to 35% (for 100 columns).
This probably could be mitigated with some extra work to use shared memory to minimize the non-coalesced writes.
But I think this kind of effort should also cover the non-strings code paths (which also do coalesced writes). So I feel this may be a bit out of scope for this PR.
I will include the benchmark code in this PR since it has already been created.
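
The swapped version is not shown in this thread, so the following is only one plausible way a coalesced-read ordering could be indexed. The struct `access` and the function `read_ordered_access` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

// Hypothetical description of what a single thread touches.
struct access {
  std::size_t source_col_idx;
  std::size_t source_row_idx;
  std::size_t output_idx;
};

// Assumed "coalesced read" ordering: consecutive idx values walk down the rows
// of one source column, and each result is scattered to its interleaved slot.
access read_ordered_access(std::size_t idx, std::size_t num_columns, std::size_t num_rows)
{
  auto const source_col_idx = idx / num_rows;
  auto const source_row_idx = idx % num_rows;
  return {source_col_idx, source_row_idx, source_row_idx * num_columns + source_col_idx};
}
```

Under this assumption, two adjacent `idx` values in the same column write `num_columns` elements apart in the output, which would fit the measured slowdown growing with the column count.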

Amazing. Thanks for measuring.

cpp/src/lists/interleave_columns.cu

davidwendt · 2024-05-03T18:53:16Z

/merge

Adds a gtest for `cudf::interleave_columns` that tests it can produce large-strings appropriately.
Follow on to #15544

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - MithunR (https://github.com/mythrocks)

URL: #15669

davidwendt added 2 commits April 16, 2024 17:02

Large strings support for cudf::interleave_columns

3ba24d6

Merge branch 'branch-24.06' into ls-interleave

f913bb9

davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels Apr 16, 2024

davidwendt self-assigned this Apr 16, 2024

davidwendt added 6 commits April 16, 2024 17:12

Merge branch 'branch-24.06' into ls-interleave

83b854b

Merge branch 'branch-24.06' into ls-interleave

cafe925

Merge branch 'branch-24.06' into ls-interleave

d225883

Merge branch 'branch-24.06' into ls-interleave

f0dbe2c

Merge branch 'branch-24.06' into ls-interleave

064d317

add empty string comment

eab5142

davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Apr 23, 2024

davidwendt marked this pull request as ready for review April 23, 2024 19:35

davidwendt requested a review from a team as a code owner April 23, 2024 19:35

davidwendt requested review from robertmaynard and nvdbaranec April 23, 2024 19:35

davidwendt commented Apr 23, 2024


davidwendt added 2 commits April 23, 2024 16:00

Merge branch 'branch-24.06' into ls-interleave

a75b86f

Merge branch 'branch-24.06' into ls-interleave

2620f67

vuule reviewed Apr 29, 2024


Merge branch 'branch-24.06' into ls-interleave

87485db

bdice approved these changes Apr 30, 2024


vuule approved these changes Apr 30, 2024


davidwendt added 3 commits April 30, 2024 15:41

Merge branch 'branch-24.06' into ls-interleave

d375fc0

Merge branch 'branch-24.06' into ls-interleave

ddf80e0

add benchmark

e250ba5

github-actions bot added the CMake label May 2, 2024

Merge branch 'branch-24.06' into ls-interleave

a5ed384

davidwendt added the 5 - DO NOT MERGE label May 3, 2024

fix benchmark to include num_cols in limit check

5a43775

davidwendt removed the 5 - DO NOT MERGE label May 3, 2024

rapids-bot bot merged commit 09f8ff3 into rapidsai:branch-24.06 May 3, 2024
70 checks passed

davidwendt deleted the ls-interleave branch May 3, 2024 18:53

davidwendt mentioned this pull request May 6, 2024

Add large-strings gtest for cudf::interleave_columns #15669

Merged

3 tasks
