-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement PackedPatternsV1 with packing and unpacking #5580
Merged
Merged
Changes from 1 commit
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
8fe27b0
Draft up docs and structure for PackedSkeletonDataV2
sffc 1a8b794
typo
sffc 7780ad4
Add note about small numbers
sffc c4ecf50
fmt
sffc b2ea3e6
Add Hash impls (this commit can probably be reverted)
sffc 9c3b12c
Add impls to plurals crate
sffc 2f666d4
impl Hash for ZeroVec
sffc 035f69f
Checkpoint
sffc d507474
Revert "Add Hash impls (this commit can probably be reverted)"
sffc cc95d08
Revert "impl Hash for ZeroVec"
sffc 5f575e7
No hash impls
sffc c01e882
Compiling builder code
sffc 7a0d922
Another impl in plurals crate
sffc 28277ec
fmt
sffc da2bf9c
Initial tests
sffc df65f3e
Docs warnings
sffc 053351e
fmt
sffc bff7844
diplomat-coverage
sffc 20ac388
Make it infallible. Resolves Clippy
sffc 49ce950
Cleanup, tests
sffc cf6d717
Rename to PatternsPackedV1
sffc 8526094
PackedPatterns
sffc 5cbe18f
Add comment about VZV length
sffc 6b30bd8
Pull some constants into constants module; document LMS better
sffc 4e94347
Postcard note
sffc c01686e
chunks comments
sffc 564669f
clippy
sffc File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
// This file is part of ICU4X. For terms of use, please see the file | ||
// called LICENSE at the top level of the ICU4X source tree | ||
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ). | ||
|
||
//! Data structures for packing of datetime patterns. | ||
|
||
use crate::helpers::size_test; | ||
use icu_plurals::PluralElements; | ||
use zerovec::VarZeroVec; | ||
|
||
use crate::pattern::runtime::{Pattern, PatternULE}; | ||
|
||
#[derive(Debug)] | ||
pub struct LengthPluralElements<'a> { | ||
pub long: PluralElements<&'a Pattern<'a>>, | ||
pub medium: PluralElements<&'a Pattern<'a>>, | ||
pub short: PluralElements<&'a Pattern<'a>>, | ||
} | ||
|
||
/// A builder for a [`PackedSkeletonDataV2`]. | ||
#[derive(Debug)] | ||
pub struct PackedSkeletonDataBuilder<'a> { | ||
pub standard: LengthPluralElements<'a>, | ||
/// Patterns for variant 0. If `None`, falls back to standard. | ||
pub variant0: Option<LengthPluralElements<'a>>, | ||
/// Patterns for variant 1. If `None`, falls back to variant 0. | ||
pub variant1: Option<LengthPluralElements<'a>>, | ||
} | ||
|
||
size_test!(PackedSkeletonDataV2, packed_skeleton_data_size, 32); | ||
|
||
/// Main data struct for packed datetime patterns. | ||
/// | ||
/// ## Variants | ||
/// | ||
/// This supports a set of "standard" patterns plus up to two "variants". | ||
/// The variants are currently used by year formatting: | ||
/// | ||
/// - Standard: Year, which could be partial precision (2-digit Gregorain) | ||
/// - Variant 0: Full Year, which is always full precision | ||
/// - Variant 1: Year With Era | ||
/// | ||
/// Variants should be used when the pattern could depend on the value being | ||
/// formatted. For example, with [`EraDisplay::Auto`], any of these three | ||
/// patterns could be selected based on the year value. | ||
/// | ||
/// ## Representation | ||
/// | ||
/// Currently, there are at most 9 patterns that need to be stored together, | ||
/// named according to this table: | ||
/// | ||
/// | | Standard | Variant 0 | Variant 1 | | ||
/// |--------|----------|-----------|-----------| | ||
/// | Long | La | Lb | Lc | | ||
/// | Medium | Ma | Mb | Mc | | ||
/// | Short | Sa | Sb | Sc | | ||
/// | ||
/// The header byte encodes which pattern in the patterns array corresponds to | ||
/// a particular cell in the table. It contains the following information: | ||
/// | ||
/// - Bits 0-1: "LMS" value of the standard column | ||
/// - Bit 2: "Q" value: 1 for directly-indexed variants; 0 for per-cell offsets | ||
/// - Bits 3-20: Packed offset into patterns table for each variant cell | ||
/// - Bits 21-31: unused/reserved | ||
/// | ||
/// The LMS value determines which pattern index is used for the first column: | ||
/// | ||
/// | LMS Value | Long Index | Medium Index | Short Index | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: let's add a column that contains the entries "LMS", "L, MS", "LM, S", and "L, M, S" since it's easier to read. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done in 6b30bd8 |
||
/// |-----------|------------|--------------|-------------| | ||
/// | 0 | 0 | 0 | 0 | | ||
/// | 1 | 0 | 1 | 1 | | ||
/// | 2 | 0 | 0 | 1 | | ||
/// | 3 | 0 | 1 | 2 | | ||
/// | ||
/// If bit 2 is 1 (Q=1), it means there is one pattern per variant index, | ||
/// with the index offset by the short index `S` from the table above. | ||
/// However, this requires storing multiple, possibly duplicate, patterns in | ||
/// the packed structure. The more common case is Q=0 and then to store | ||
/// per-cell offsets in chunks of 3 bits per cell: | ||
/// | ||
/// - Chunk = 0: Inherit according to the table below | ||
/// - Chunk = 1-7: Use pattern index Chunk - 1 | ||
/// | ||
/// This is summarized below: | ||
/// | ||
/// | Cell in Table | Q=1 Pattern Index | Q=0 Header Bits | Inheritance | | ||
/// |---------------|-------------------|-----------------|-------------| | ||
/// | Lb | S + 1 | 3-5 | La | | ||
/// | Mb | S + 2 | 6-8 | Ma | | ||
/// | Sb | S + 3 | 9-11 | Sa | | ||
/// | Lc | S + 4 | 12-14 | Lb | | ||
/// | Mc | S + 5 | 15-17 | Mb | | ||
/// | Sc | S + 6 | 18-20 | Mc | | ||
/// | ||
/// [`EraDisplay::Auto`]: crate::neo_skeleton::EraDisplay::Auto | ||
#[doc = packed_skeleton_data_size!()] | ||
#[derive(Debug, PartialEq, Clone)] | ||
#[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))] | ||
#[cfg_attr(feature = "datagen", databake(path = icu_datetime::provider::neo))] | ||
#[cfg_attr(feature = "serde", derive(serde::Deserialize))] | ||
pub struct PackedSkeletonDataV2<'data> { | ||
/// An encoding of which standard/variant cell corresponds to which entry | ||
/// in the patterns table. See class docs. | ||
pub header: u32, | ||
/// The list of patterns. | ||
#[cfg_attr(feature = "serde", serde(borrow))] | ||
pub patterns: VarZeroVec<'data, PatternULE>, | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Not a fan of calling them variant0 variant1 in code here; can we primarily use descriptive names and in the docs mention that this is also "variant0" and "variant1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your suggestion to call them
full_year
andwith_era
? Because I wanted these to be generalizable enough to other cases that might want to use this mechanism in the future. For instance, maybe some locale will want to use it for day periods or something ("6 in the evening", "7 in the evening", "8pm", "9pm").