Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use N-bit lengths for VZV IndexN format types #5594

Merged
merged 7 commits into from
Sep 25, 2024

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Sep 25, 2024

It is impossible for an IndexN array to need more than a length integer of size N, anyway, the max index is always >= the length.

Part of #5523

Builds on #5593

We could in theory just have a VZVFormatCombo<Index, Len> type that allows free selection, however I'm trying to keep this minimal. Overall the main use case for that is picking things like "a small array of ;argely-sized elements" and we could just expose Index16Len8 for that. I can see that being useful for things like #5580, though it also feels like a data microoptimization.

The "total" lines in fingerprints.csv are interspersed in giant diffs, and this basically only gets a max of 2-6 byte wins per data, but the overall data size went down by 200KB. Not amazing, not terrible.

[18:26:22] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23501369
[18:26:08] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23391499

@sffc
Copy link
Member

sffc commented Sep 25, 2024

It's not super common that Index8 is big enough, but it is quite common that Len8 is big enough. I feel like Index8Len16 would be one of the most commonly used settings.

@Manishearth
Copy link
Member Author

@sffc we can expose it easily in a followup, yeah. I think Index16Len8 is definitely more useful than Index8 and we should consider looking at where we use Index8 now.

/// The error to show when unable to construct a vec
#[doc(hidden)]
const TOO_LARGE_ERROR: &'static str;

/// Safety: must be sizeof(self)
Copy link
Member

@sffc sffc Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Why not just use size_of::<Self>() which is const?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it makes the code easier to read but I'm open to that being a followup

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically mem::size_of::<F::Index> feels clunkier than F::Index::SIZE in the situations it gets used in.

however in my next PR I'm removing METADATA_WIDTH and that may make the expressions simple enough that a sizeof won't be a big deal

@Manishearth Manishearth merged commit 7634c7e into unicode-org:main Sep 25, 2024
28 checks passed
@Manishearth Manishearth deleted the format-len-vzv branch September 25, 2024 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants