
Efficient deserialization for Buffer<u8> #103

Merged — 7 commits, merged on Mar 18, 2023

Conversation

@jleibs (Contributor) commented Feb 22, 2023

This implements ArrowDeserialize for Buffer as part of #101

There are a few hoops to jump through: BinaryArray<i32> normally exposes an iterator over &[u8] slices, whereas we want to expose the Buffer primitive directly so we can take advantage of the Arc-backed sharing of arrow Buffers.
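
For illustration, a rough sketch of the kind of usage this enables, assuming Buffer<u8> maps to arrow's Binary type by default (the real trait impls live in the diff; this is only a sketch):

```rust
use arrow2::array::{Array, BinaryArray};
use arrow2::buffer::Buffer;
use arrow2_convert::deserialize::TryIntoCollection;

fn example() -> arrow2::error::Result<()> {
    // A BinaryArray whose values all live in one contiguous buffer.
    let array: Box<dyn Array> =
        Box::new(BinaryArray::<i32>::from_slice(["hello", "arrow"]));

    // Copying path: each element is materialized as an owned Vec<u8>.
    let owned: Vec<Vec<u8>> = array.as_ref().try_into_collection()?;

    // Zero-copy path (this PR): each element is a Buffer<u8> that just bumps
    // the Arc on the shared values buffer and records an offset/length.
    let shared: Vec<Buffer<u8>> = array.as_ref().try_into_collection()?;

    assert_eq!(owned[0].as_slice(), b"hello".as_slice());
    assert_eq!(shared[1].as_slice(), b"arrow".as_slice());
    Ok(())
}
```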

@jleibs marked this pull request as draft February 22, 2023 00:59
@jleibs marked this pull request as ready for review February 22, 2023 01:27
@codecov-commenter commented Feb 22, 2023

Codecov Report

Merging #103 (14e772c) into main (b2fd8f8) will decrease coverage by 0.05%.
The diff coverage is 91.30%.


@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   94.63%   94.58%   -0.05%     
==========================================
  Files           8        8              
  Lines        1565     1588      +23     
==========================================
+ Hits         1481     1502      +21     
- Misses         84       86       +2     
Impacted Files                        Coverage Δ
arrow2_convert/src/field.rs           100.00% <ø> (ø)
arrow2_convert/src/deserialize.rs      98.78% <91.30%> (-1.22%) ⬇️


@emilk (Contributor) commented Feb 22, 2023

It could be nice to cover Buffer<u8> in arrow2_convert/benches/bench.rs too

@jleibs (Contributor, Author) commented Feb 22, 2023

> It could be nice to cover Buffer<u8> in arrow2_convert/benches/bench.rs too

Interestingly, these benches suggest this work may not have been strictly necessary: Vec<u8> shows the same constant-time behavior as Buffer<u8>. Curious whether that is unique to the bench or a more general optimization.
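
For context, a minimal criterion sketch of the kind of comparison involved is below. The actual arrow2_convert/benches/bench.rs layout and the data shape behind these numbers aren't shown in this thread, so the array construction and element sizes here are assumptions; the real harness also sets a Throughput so criterion reports elem/s:

```rust
use arrow2::array::{Array, BinaryArray};
use arrow2::buffer::Buffer;
use arrow2_convert::deserialize::TryIntoCollection;
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_deserialize(c: &mut Criterion) {
    let mut group = c.benchmark_group("deserialize");
    for n in [10usize, 100, 1_000, 10_000] {
        // A BinaryArray with `n` small elements (hypothetical test data).
        let array: Box<dyn Array> =
            Box::new(BinaryArray::<i32>::from_slice(vec![b"0123456789".as_slice(); n]));

        group.throughput(Throughput::Elements(n as u64));

        group.bench_function(BenchmarkId::new("BufferU8", n), |b| {
            // Zero-copy: each element is an Arc-backed view into the values buffer.
            b.iter(|| {
                let v: Vec<Buffer<u8>> = array.as_ref().try_into_collection().unwrap();
                v
            })
        });

        group.bench_function(BenchmarkId::new("VecU8", n), |b| {
            // Copying: each element is materialized as an owned Vec<u8>.
            b.iter(|| {
                let v: Vec<Vec<u8>> = array.as_ref().try_into_collection().unwrap();
                v
            })
        });
    }
    group.finish();
}

criterion_group!(benches, bench_deserialize);
criterion_main!(benches);
```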

deserialize/BufferU8/10 time:   [61.707 ns 61.950 ns 62.262 ns]
                        thrpt:  [160.61 Melem/s 161.42 Melem/s 162.06 Melem/s]
                        
deserialize/VecU8/10    time:   [70.875 ns 71.151 ns 71.433 ns]
                        thrpt:  [139.99 Melem/s 140.55 Melem/s 141.09 Melem/s]

deserialize/BufferU32/10
                        time:   [162.66 ns 163.88 ns 165.04 ns]
                        thrpt:  [60.593 Melem/s 61.019 Melem/s 61.478 Melem/s]

deserialize/VecU32/10   time:   [169.86 ns 171.55 ns 173.81 ns]
                        thrpt:  [57.535 Melem/s 58.291 Melem/s 58.871 Melem/s]

deserialize/BufferU8/100
                        time:   [61.751 ns 61.865 ns 61.995 ns]
                        thrpt:  [1.6130 Gelem/s 1.6164 Gelem/s 1.6194 Gelem/s]

deserialize/VecU8/100   time:   [71.856 ns 72.057 ns 72.305 ns]
                        thrpt:  [1.3830 Gelem/s 1.3878 Gelem/s 1.3917 Gelem/s]

deserialize/BufferU32/100
                        time:   [160.29 ns 161.14 ns 161.90 ns]
                        thrpt:  [617.67 Melem/s 620.59 Melem/s 623.88 Melem/s]

deserialize/VecU32/100  time:   [214.01 ns 215.51 ns 217.32 ns]
                        thrpt:  [460.16 Melem/s 464.02 Melem/s 467.26 Melem/s]

deserialize/BufferU8/1000
                        time:   [61.111 ns 61.215 ns 61.338 ns]
                        thrpt:  [16.303 Gelem/s 16.336 Gelem/s 16.364 Gelem/s]

deserialize/VecU8/1000  time:   [73.965 ns 74.617 ns 75.273 ns]
                        thrpt:  [13.285 Gelem/s 13.402 Gelem/s 13.520 Gelem/s]

deserialize/BufferU32/1000
                        time:   [159.60 ns 160.39 ns 161.10 ns]
                        thrpt:  [6.2075 Gelem/s 6.2348 Gelem/s 6.2655 Gelem/s]

deserialize/VecU32/1000 time:   [659.44 ns 660.73 ns 662.17 ns]
                        thrpt:  [1.5102 Gelem/s 1.5135 Gelem/s 1.5164 Gelem/s]

deserialize/BufferU8/10000
                        time:   [61.588 ns 61.714 ns 61.856 ns]
                        thrpt:  [161.67 Gelem/s 162.04 Gelem/s 162.37 Gelem/s]

deserialize/VecU8/10000 time:   [70.506 ns 70.661 ns 70.800 ns]
                        thrpt:  [141.24 Gelem/s 141.52 Gelem/s 141.83 Gelem/s]

deserialize/BufferU32/10000
                        time:   [160.37 ns 161.42 ns 162.49 ns]
                        thrpt:  [61.544 Gelem/s 61.950 Gelem/s 62.354 Gelem/s]

deserialize/VecU32/10000
                        time:   [5.5109 µs 5.5266 µs 5.5450 µs]
                        thrpt:  [1.8034 Gelem/s 1.8094 Gelem/s 1.8146 Gelem/s]

@ncpenke (Collaborator) commented Mar 18, 2023

Thanks for the change!

@ncpenke merged commit 6450acb into DataEngineeringLabs:main on Mar 18, 2023