GH-39565: [C++] Do not concatenate ChunkedArray when running take function #39566
cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc
@felipecrv @amol- Should this PR be kept open now that #40206 was merged?
I think so. This PR is focused on optimizing TakeCA, while the one that was merged was focused on TakeCC.
Before my PR: Next step (and goal of amol's PR/issue pair): 0 concatenations.
I opened #41700 which can handle all the fixed-width types without concatenation.
Rationale for this change
We can avoid the unnecessary work and memory consumption of concatenating chunks when running take. Instead, take can run directly on the chunks, at the sole cost of remapping the indices, which are usually far fewer than the elements of the array take is applied to.
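To make the index remapping concrete, here is a minimal sketch that does not use Arrow at all. It assumes a chunked array can be modelled as a vector of vectors; `Chunks`, `ChunkLocation`, `ComputeOffsets`, `Resolve`, and `TakeFromChunks` are hypothetical names chosen for illustration and are not taken from this PR or from the Arrow API. Each logical index is mapped to a (chunk, position) pair with a binary search over the cumulative chunk offsets, so the chunks themselves are never concatenated. Unlike the change in this PR, the sketch gathers into a single flat output rather than a multi-chunk result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked array: a list of contiguous chunks.
using Chunks = std::vector<std::vector<int32_t>>;

// Where a logical index lands once it is remapped onto the chunks.
struct ChunkLocation {
  std::size_t chunk_index;
  std::size_t index_in_chunk;
};

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total number of elements.
std::vector<int64_t> ComputeOffsets(const Chunks& chunks) {
  std::vector<int64_t> offsets;
  offsets.reserve(chunks.size() + 1);
  int64_t total = 0;
  offsets.push_back(total);
  for (const auto& chunk : chunks) {
    total += static_cast<int64_t>(chunk.size());
    offsets.push_back(total);
  }
  return offsets;
}

// Remap one logical index to (chunk, position) with a binary search.
ChunkLocation Resolve(const std::vector<int64_t>& offsets, int64_t logical_index) {
  assert(logical_index >= 0 && logical_index < offsets.back());
  // upper_bound returns the first offset strictly greater than the index;
  // the owning chunk is the one just before it. Empty chunks are skipped
  // naturally because their start offset equals the next chunk's.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), logical_index);
  const std::size_t chunk_index = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {chunk_index, static_cast<std::size_t>(logical_index - offsets[chunk_index])};
}

// Gather the requested elements directly from the chunks.
std::vector<int32_t> TakeFromChunks(const Chunks& chunks,
                                    const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = ComputeOffsets(chunks);
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    const ChunkLocation loc = Resolve(offsets, idx);
    out.push_back(chunks[loc.chunk_index][loc.index_in_chunk]);
  }
  return out;
}
```

With this approach the extra work scales with the number of indices (one binary search over the chunk offsets per index) rather than with the total size of the data, which is the trade-off the rationale describes.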
Are these changes tested?
Two existing tests already verify take on ChunkedArray and cover the corner cases well. The only tweak they needed is that take now returns a ChunkedArray made of multiple chunks instead of a single one.