Use N-bit lengths for VZV IndexN format types (#5594)
An IndexN array can never need a length integer wider than N bits,
since the max index is always `>=` the length.
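
To make the size effect concrete, here is a minimal sketch of the byte layout, inferred only from the test vectors changed in this commit. The `encode` function is hypothetical and is not the `zerovec` implementation; it assumes a layout of a little-endian length, then one little-endian index per element, then the element bytes.

```rust
/// Hypothetical, simplified model of the layout (not the zerovec API):
/// [length][one index per element][concatenated element bytes].
/// `len_width` and `index_width` are the integer widths in bytes.
fn encode(elements: &[&[u8]], len_width: usize, index_width: usize) -> Vec<u8> {
    let mut out = Vec::new();
    // Length field: element count, little-endian, truncated to `len_width` bytes.
    out.extend_from_slice(&(elements.len() as u64).to_le_bytes()[..len_width]);
    // Index fields: byte offset of each element within the data section.
    let mut offset = 0u64;
    for e in elements {
        out.extend_from_slice(&offset.to_le_bytes()[..index_width]);
        offset += e.len() as u64;
    }
    // Data section: the element bytes back to back.
    for e in elements {
        out.extend_from_slice(e);
    }
    out
}

fn main() {
    // The single 13-byte element from the plurals `ast.rs` test in this commit.
    let element: &[u8] = &[192, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0];

    // Before: 32-bit length with 16-bit indices.
    let old = encode(&[element], 4, 2);
    // After: 16-bit length with 16-bit indices.
    let new = encode(&[element], 2, 2);

    assert_eq!(old, [1, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]);
    assert_eq!(new, [1, 0, 0, 0, 192, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]);
    println!("saved {} bytes", old.len() - new.len());
}
```

Under this model every Index16 array shrinks by two bytes, which matches the updated byte arrays in the tests.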


Part of #5523

Builds on #5593


We could in theory have a `VZVFormatCombo<Index, Len>` type that
allows free selection of both widths, but I'm trying to keep this minimal.
The main use case for that would be picking things like "a small array of
large elements", and we could just expose Index16Len8 for that. I
can see that being useful for things like #5580, though it also feels
like a data microoptimization.
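
As a rough illustration of what such a combination would buy, the sketch below compares header sizes for a 16-bit-index format with a 16-bit versus an 8-bit length. Both helper functions are hypothetical and assume the header is one length field followed by one index per element; nothing here exists in `zerovec`.

```rust
/// Hypothetical helper: header bytes (length field plus one index per element)
/// for a format with the given index and length widths in bits.
fn header_bytes(index_bits: u32, len_bits: u32, n_elements: usize) -> usize {
    (len_bits as usize / 8) + n_elements * (index_bits as usize / 8)
}

/// Largest element count a length field of `len_bits` bits can hold (len_bits < 64).
fn max_elements(len_bits: u32) -> u64 {
    (1u64 << len_bits) - 1
}

fn main() {
    // Index16 with a 16-bit length, three elements: 2 + 3 * 2 bytes.
    assert_eq!(header_bytes(16, 16, 3), 8);
    // A hypothetical "Index16Len8", three elements: 1 + 3 * 2 bytes.
    assert_eq!(header_bytes(16, 8, 3), 7);
    // The price of the narrower length: at most 255 elements.
    assert_eq!(max_elements(8), 255);
}
```

The saving is a single byte per array, which is why it reads as a microoptimization.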


The "total" lines in fingerprints.csv are interspersed in giant diffs,
and this only wins about 2-6 bytes per data entry at most, but the
overall data size went down by 200KB. Not amazing, not terrible.

```console
[18:26:22] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23501369
[18:26:08] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23391499
```
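
For readers who would rather not parse the awk one-liner, here is a rough Rust equivalent. The function name `sum_third_field` is made up for this sketch, and it matches `needle` as a literal substring where `rg` would use a regex. Because the "total" lines themselves are not shown on this page, the usage below runs it on the two casemap fingerprint lines from this commit instead.

```rust
/// Sums the third whitespace-separated field (e.g. "22515B,") of every line
/// containing `needle`, mirroring
/// `rg needle | awk '{ gsub(/B,/, "", $3); s += $3 } END { print s }'`.
fn sum_third_field(text: &str, needle: &str) -> u64 {
    text.lines()
        .filter(|line| line.contains(needle))
        .filter_map(|line| line.split_whitespace().nth(2))
        // Strip the trailing "B," unit suffix; skip fields that are not numbers.
        .filter_map(|field| field.trim_end_matches("B,").parse::<u64>().ok())
        .sum()
}

fn main() {
    // casemap fingerprints before and after this commit.
    let before = "props/casemap@1, <singleton>, 22515B, 22428B, ff76a17649a34363\n\
                  props/casemap_unfold@1, <singleton>, 976B, 932B, 85d63de2fdea5a3d\n";
    let after = "props/casemap@1, <singleton>, 22513B, 22426B, a4a125633510e06d\n\
                 props/casemap_unfold@1, <singleton>, 972B, 928B, b861f4456b3907f\n";

    let old_sum = sum_third_field(before, "props/");
    let new_sum = sum_third_field(after, "props/");
    assert_eq!(old_sum, 23491);
    assert_eq!(new_sum, 23485);
    println!("saved {} bytes", old_sum - new_sum);
}
```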
Manishearth authored Sep 25, 2024
1 parent aa43529 commit 7634c7e
Showing 184 changed files with 77,345 additions and 77,306 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -38,6 +38,7 @@
- `zerotrie`
- `zerovec`
- This release has multiple changes that affect the bit representation of various types. Do not update to this release if you wish to retain stable data formats.
+ - Change the `VarZeroVecFormat` values shipped by default to use the same index and length width. This breaks data layout for all `VarZeroVec`s. (https://github.com/unicode-org/icu4x/pull/5594)
- Optimize `MultiFieldsULE` to not store a length anymore. This breaks data layout for any `#[make_varule]`-using struct with multiple variable-sized fields. (https://github.com/unicode-org/icu4x/pull/5593)
- `writeable`


4 changes: 2 additions & 2 deletions components/plurals/src/provider.rs
@@ -1073,10 +1073,10 @@ fn test_serde_nonsingleton_roundtrip() {
     assert_eq!(
         postcard_bytes,
         &[
-            16, // Postcard header
+            14, // Postcard header
             0x80, // Discriminant
             3, b'a', b'b', b'c', // String of length 3
-            1, 0, 0, 0, 0, 0, // VarZeroVec of length 1
+            1, 0, 0, 0, // VarZeroVec of length 1
             0x10, b'd', b'e', b'f', b'g' // Plural category 1 and string "defg"
         ]
     );
2 changes: 1 addition & 1 deletion components/plurals/src/rules/runtime/ast.rs
@@ -573,7 +573,7 @@ mod test {
         let vzv = VarZeroVec::<_>::from(relations.as_slice());
         assert_eq!(
             vzv.as_bytes(),
-            &[1, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
+            &[1, 0, 0, 0, 192, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
         );
     }
 }
2 changes: 1 addition & 1 deletion provider/blob/benches/auxkey_bench.rs
@@ -293,7 +293,7 @@ fn make_blob_v2() -> Vec<u8> {
     put_payloads::<MarkerD>(&mut exporter);
     exporter.close().unwrap();
     drop(exporter);
-    assert_eq!(blob.len(), 32982);
+    assert_eq!(blob.len(), 32980);
     assert!(blob.len() > 100);
     blob
 }
Binary file modified provider/blob/tests/data/v2.postcard
Binary file not shown.
4 changes: 2 additions & 2 deletions provider/data/casemap/data/case_map_unfold_v1_marker.rs.data

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions provider/data/casemap/data/case_map_v1_marker.rs.data

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions provider/data/casemap/fingerprints.csv
@@ -1,2 +1,2 @@
-props/casemap@1, <singleton>, 22515B, 22428B, ff76a17649a34363
-props/casemap_unfold@1, <singleton>, 976B, 932B, 85d63de2fdea5a3d
+props/casemap@1, <singleton>, 22513B, 22426B, a4a125633510e06d
+props/casemap_unfold@1, <singleton>, 972B, 928B, b861f4456b3907f


4 changes: 2 additions & 2 deletions provider/data/casemap/stubdata/case_map_v1_marker.rs.data

Large diffs are not rendered by default.
