-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large strings support for cudf::interleave_columns #15544
Conversation
{ | ||
CUDF_FAIL("Called `interleave_list_entries_fn()` on non-supported types."); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just moved from below.
auto const source_col_idx = idx % num_columns; | ||
auto const source_row_idx = idx / num_columns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This access pattern makes me wonder if a kernel would be significantlty faster.
But I assume this is light-weigth either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean faster than thrust::transform
?
The lambda here should be very fast since it only operates on the bitmask and the offsets in a very coalesced access pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason I thought about this is because threads with adjacent indices access different columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. That is a good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost made the same comment as @vuule but then I wondered if the point was that interleaving would have coalesced writes (not reads)? I didn’t look too closely at whether that was true but my intuition was that swapping these might be worthwhile. At least worth benchmarking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched the order from coalesced write to coalesced read and wrote a benchmark with different number of columns. The performance did suffer 10% (for 2 columns) to 35% (for 100 columns).
This probably could be mitigated with some extra work to use shared-memory to minimize the non-coalesced writes.
But I think this kind of effort should also encompass the non-strings code paths as well (which also do coalesced writes). So I feel this may be a bit out of scope for this PR.
I will include the benchmark code in this PR since it has already been created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing. Thanks for measuring.
/merge |
Adds a gtest for `cudf::interleave_columns` that tests it can produce large-strings appropriately. Follow on to #15544 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vyas Ramasubramani (https://github.com/vyasr) - MithunR (https://github.com/mythrocks) URL: #15669
Description
Updates the
cudf::interleave_columns
logic to use gather-basedmake_strings_column
instead of themake_strings_children
since the gather-based function already efficiently supports longs.Checklist