Properly support sliced unions #91

Merged: 2 commits merged into DataEngineeringLabs:main on Jan 15, 2023


jleibs (Contributor) commented Jan 15, 2023

I believe this resolves #53.

Rather than creating and maintaining parallel iterators, this uses the UnionArray::index() method to find the correct child array and offset to deserialize from, and then deserializes from a newly created slice at that location in that child array. One nice aspect of this approach is that it generalizes to sparse unions, since index() does the correct bookkeeping for us.
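The lookup described above can be sketched with a small, self-contained model. UnionModel, MyEnum and all helper methods below are hypothetical stand-ins, not arrow2's UnionArray or this PR's code. The sketch assumes that an index()-style call returns the child to read plus the slot within that child: dense unions take the slot from the offsets buffer, while sparse unions use the (slice-adjusted) union position itself. Type ids are assumed to map directly to child positions. For contrast, naive_dense_to_vec is a hypothetical cursor-per-child walk; it is not necessarily the code this PR replaced, but it shows how ignoring the offsets misreads a sliced union.

```rust
/// Hypothetical Rust enum to recover from the union.
#[derive(Debug, PartialEq)]
enum MyEnum {
    A(i32),
    B(bool),
}

/// Hypothetical stand-in for a two-child Arrow union array (not arrow2's
/// UnionArray). Type id `n` is assumed to select child `n` directly.
#[derive(Clone)]
struct UnionModel {
    types: Vec<i8>,            // one type id per union slot
    offsets: Option<Vec<i32>>, // Some(..) = dense union, None = sparse union
    child_a: Vec<i32>,         // child array backing MyEnum::A
    child_b: Vec<bool>,        // child array backing MyEnum::B
    offset: usize,             // start of this (possibly sliced) view
    len: usize,                // number of slots in this view
}

impl UnionModel {
    /// Slicing only moves the view window; the child arrays are not sliced.
    fn slice(&self, offset: usize, len: usize) -> UnionModel {
        assert!(offset + len <= self.len);
        UnionModel { offset: self.offset + offset, len, ..self.clone() }
    }

    /// index()-style lookup: (child index, slot within that child).
    fn index(&self, i: usize) -> (usize, usize) {
        assert!(i < self.len);
        let slot = self.offset + i;
        let field = self.types[slot] as usize;
        let child_slot = match &self.offsets {
            Some(offsets) => offsets[slot] as usize, // dense: read the offsets buffer
            None => slot,                            // sparse: same position as the union slot
        };
        (field, child_slot)
    }

    /// Deserialize one element from a length-1 slice of the selected child.
    fn get(&self, i: usize) -> MyEnum {
        let (field, slot) = self.index(i);
        match field {
            0 => MyEnum::A(self.child_a[slot..slot + 1][0]),
            1 => MyEnum::B(self.child_b[slot..slot + 1][0]),
            _ => panic!("unknown type id {field}"),
        }
    }

    fn to_vec(&self) -> Vec<MyEnum> {
        (0..self.len).map(|i| self.get(i)).collect()
    }

    /// Naive "parallel iterators" walk for dense unions: one cursor per child,
    /// each starting at 0. It ignores the offsets buffer, so slices go wrong.
    fn naive_dense_to_vec(&self) -> Vec<MyEnum> {
        let (mut a, mut b) = (self.child_a.iter(), self.child_b.iter());
        self.types[self.offset..self.offset + self.len]
            .iter()
            .map(|&t| match t {
                0 => MyEnum::A(*a.next().unwrap()),
                1 => MyEnum::B(*b.next().unwrap()),
                _ => panic!("unknown type id {t}"),
            })
            .collect()
    }
}

/// Logical values [A(1), B(true), A(2), B(false)] as a dense union.
fn dense_example() -> UnionModel {
    UnionModel {
        types: vec![0, 1, 0, 1],
        offsets: Some(vec![0, 0, 1, 1]),
        child_a: vec![1, 2],
        child_b: vec![true, false],
        offset: 0, len: 4,
    }
}

/// The same values as a sparse union; unused child slots hold filler values.
fn sparse_example() -> UnionModel {
    UnionModel {
        types: vec![0, 1, 0, 1],
        offsets: None,
        child_a: vec![1, 0, 2, 0],
        child_b: vec![false, true, false, false],
        offset: 0, len: 4,
    }
}

fn main() {
    let sliced = dense_example().slice(1, 2);
    assert_eq!(sliced.to_vec(), vec![MyEnum::B(true), MyEnum::A(2)]);
    assert_eq!(sparse_example().slice(1, 2).to_vec(), sliced.to_vec());
    // The naive walk reads child_a[0] instead of child_a[1]: [B(true), A(1)].
    println!("{:?}", sliced.naive_dense_to_vec());
}
```

Because the same lookup handles both layouts, the sparse case needs no separate code path, which matches the motivation given in the PR description.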

@codecov-commenter

Codecov Report

Merging #91 (c447607) into main (dfa59fa) will decrease coverage by 3.88%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #91      +/-   ##
==========================================
- Coverage   98.25%   94.36%   -3.89%     
==========================================
  Files           8        8              
  Lines        1490     1544      +54     
==========================================
- Hits         1464     1457       -7     
- Misses         26       87      +61     
Impacted Files                                  Coverage Δ
arrow2_convert_derive/src/derive_enum.rs        98.96% <100.00%> (-0.07%) ⬇️
arrow2_convert_derive/src/derive_struct.rs      89.10% <0.00%> (-10.65%) ⬇️
arrow2_convert_derive/src/input.rs              80.00% <0.00%> (-6.51%) ⬇️
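The headline figure (3.88%) and the table's delta (-3.89%) differ by 0.01 point. The report's numbers are consistent with percentages being truncated to two decimals, which we assume reflects Codecov's default rounding (round: down, two-decimal precision): truncating the exact difference gives 3.88, while subtracting the already-truncated totals gives 3.89. The helper names below are illustrative only; the inputs are the hit, miss and line counts from the report.

```rust
/// Coverage as a percentage of tracked lines that were hit.
fn coverage(hits: u32, lines: u32) -> f64 {
    100.0 * hits as f64 / lines as f64
}

/// Truncate to two decimals (assumed to match Codecov's default round: down).
fn truncate2(pct: f64) -> f64 {
    (pct * 100.0).floor() / 100.0
}

fn main() {
    // Figures from the report: main (dfa59fa) vs. #91 (c447607).
    let (hits_before, misses_before, lines_before) = (1464, 26, 1490);
    let (hits_after, misses_after, lines_after) = (1457, 87, 1544);

    // Every tracked line is either a hit or a miss.
    assert_eq!(hits_before + misses_before, lines_before);
    assert_eq!(hits_after + misses_after, lines_after);

    let before = coverage(hits_before, lines_before);
    let after = coverage(hits_after, lines_after);

    println!("before: {:.2}%", truncate2(before)); // 98.25%
    println!("after:  {:.2}%", truncate2(after)); // 94.36%
    // Exact difference, then truncated: 3.88 (the headline figure).
    println!("exact delta, truncated: {:.2}", truncate2(before - after));
    // Difference of the truncated totals: 3.89 (the table's delta).
    println!("delta of truncated values: {:.2}", truncate2(before) - truncate2(after));
}
```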


ncpenke (Collaborator) commented Jan 15, 2023

Thank you for the contribution and simplification @jleibs!

I was considering a similar approach, but wasn't sure what the performance cost of running the full array deserialization path for each union element would be. However, we don't have benchmarks for that yet, and this seems like a reasonable trade-off for enabling more functionality, so I'll go ahead and merge this PR.

There's some minor code cleanup left, but I'll take care of that in another pass.

ncpenke merged commit 8d08bb3 into DataEngineeringLabs:main on Jan 15, 2023
Development
Successfully merging this pull request may close this issue: Fix deserialization for union slices
3 participants