Beginning of "tinystr" optimization #8

raphlinus · 2019-03-18T22:56:10Z

This patch implements a "tiny string" datatype, very efficient but limited to bounded lengths, and starts with the integration into the Locale datatype. As of this PR, it's just the language subtag, but I wanted to upload it to get feedback before proceeding further.
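For readers unfamiliar with the idea, here is a minimal sketch of what a bounded "tiny string" can look like. It is not the code from this patch: the name `Tiny8`, the `NonZeroU64` backing, and the little-endian packing are all assumptions made for illustration.

```rust
use std::num::NonZeroU64;

/// Hypothetical 8-byte ASCII string packed into a single integer.
/// Clone is a register copy and equality is one integer comparison.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Tiny8(NonZeroU64);

impl Tiny8 {
    /// Packs 1 to 8 non-NUL ASCII bytes; returns None for anything else.
    pub fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.is_empty() || bytes.len() > 8 {
            return None;
        }
        if bytes.iter().any(|&b| b == 0 || !b.is_ascii()) {
            return None;
        }
        let mut buf = [0u8; 8];
        buf[..bytes.len()].copy_from_slice(bytes);
        // Little-endian packing: the first character is always the low byte,
        // and the unused high bytes stay zero.
        NonZeroU64::new(u64::from_le_bytes(buf)).map(Tiny8)
    }

    /// Length in bytes, derived from the zero padding in the high bytes.
    pub fn len(&self) -> usize {
        8 - (self.0.get().leading_zeros() as usize) / 8
    }

    /// Unpacks the stored bytes into an owned String.
    pub fn as_string(&self) -> String {
        let buf = self.0.get().to_le_bytes();
        buf[..self.len()].iter().map(|&b| b as char).collect()
    }
}
```

With this layout, `Tiny8::new("en")` yields a value whose `len()` is 2, and comparing two subtags never touches the heap.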

Progress towards #7

raphlinus · 2019-03-18T23:02:09Z

I'm not asking for this to be merged just yet. There are some unused code warnings that will go away.

A few questions. I see that the full generality of BCP 47 is not supported: for example, language can only be 2-3 alpha characters, so "i-enochian" would fail to parse, as would registered language subtags. Supporting these uncommon values is the reason it's TinyStr8, but I can change this.

This implementation uses "fake SIMD". I was able to get that to work well, but @SimonSapin has had success getting LLVM to auto-vectorize in rust-lang/rust#59283. That code would probably be clearer, and a hair more efficient, but I wasn't able to get auto-vectorization to work for bytewise operations in a u64.
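To illustrate what "fake SIMD" on a u64 means, here is a sketch of lowercasing eight packed ASCII bytes with plain integer arithmetic. The function name is hypothetical and this is not necessarily the code in the patch; it only shows the general technique, and it assumes every byte lane is below 0x80.

```rust
/// Lowercases eight packed ASCII bytes at once.
/// Precondition (assumed, not checked): every byte in `word` is < 0x80.
pub fn to_ascii_lowercase_swar(word: u64) -> u64 {
    const ONES: u64 = 0x0101_0101_0101_0101;
    const HIGH_BITS: u64 = ONES * 0x80;

    // Adding 0x3f sets a lane's high bit exactly when the byte is >= b'A' (0x41).
    // Lanes are < 0x80, so the sum stays below 0x100 and never carries over.
    let ge_a = word.wrapping_add(ONES * 0x3f);
    // Adding 0x25 sets a lane's high bit exactly when the byte is > b'Z' (>= 0x5b).
    let gt_z = word.wrapping_add(ONES * 0x25);

    // High bit per lane where the byte is in b'A'..=b'Z'.
    let is_upper = ge_a & !gt_z & HIGH_BITS;

    // Shifting 0x80 right by two gives 0x20, the ASCII case bit, in the same lane.
    word | (is_upper >> 2)
}
```

Non-letters and the zero padding bytes pass through unchanged, so the same routine works on a partially filled word.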

If this approach generally looks good, I'll keep going.

zbraniecki · 2019-03-18T23:14:26Z

> I see that the full generality of BCP 47 is not supported: for example, language can only be 2-3 alpha characters, so "i-enochian" would fail to parse, as would registered language subtags.

Yes, we're aiming for Unicode Locale Identifier, not Language Tag.

Please, consult http://unicode.org/reports/tr35/#BCP_47_Conformance for differences.

raphlinus · 2019-03-18T23:15:35Z

Ah, that's super helpful, thanks. I was using bcp-47 as my normative spec. I'll adjust.

raphlinus · 2019-03-19T01:25:27Z

As of this commit, this PR moves bench_locale from 3214ns to 1096ns. Of the remaining time, roughly 40% is in the split on ('-', '_'), and roughly 15% is .is_ascii(), both of which are in stdlib. It should definitely be possible to improve these as well, but my main focus is making the representation efficient, so it can be cloned and compared quickly; the representation stuff has gone from the lion's share to a few percent, so I'm pretty happy with how this is going.
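One possible direction for the remaining split and ASCII-check cost is to fold both into a single pass over the bytes. The sketch below is an untested idea rather than anything measured in this PR; the function name `split_subtags` is hypothetical, and a real parser would likely yield subtags from an iterator instead of allocating a `Vec`.

```rust
/// Splits a locale string on '-' and '_' while checking for ASCII in the same pass.
/// Returns None as soon as a non-ASCII byte is seen.
pub fn split_subtags(input: &str) -> Option<Vec<&str>> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in input.as_bytes().iter().enumerate() {
        if !b.is_ascii() {
            return None;
        }
        if b == b'-' || b == b'_' {
            // Every byte so far is ASCII, so `start` and `i` are valid char boundaries.
            out.push(&input[start..i]);
            start = i + 1;
        }
    }
    out.push(&input[start..]);
    Some(out)
}
```

Whether this beats the stdlib calls would have to be confirmed with the same bench_locale benchmark.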

zbraniecki · 2019-07-30T23:56:57Z

@raphlinus - can we move this to the unic-langid crate?

raphlinus · 2019-07-30T23:58:10Z

Sure, I'm not attached. Feel free to take my draft whichever direction you like, and if you need my attention on something, just let me know exactly what you'd like.

zbraniecki · 2019-08-07T18:36:48Z

closing in favor of zbraniecki/unic-locale#7

raphlinus added 2 commits March 18, 2019 14:30

Add TinyStr types

acfe366

This commit adds a compact string representation, but doesn't wire them up. Part of the plan for projectfluent#7

Start integrating TinyStr types

7e0528e

Changes the Locale and parser to use TinyStr for language.

Extend TinyStr usage to script and region

19afeb0

emilio mentioned this pull request Aug 7, 2019

Ideas for better performance zbraniecki/unic-locale#2

Closed

zbraniecki closed this Aug 7, 2019

zbraniecki mentioned this pull request Jun 3, 2020

Why is TinyStr ASCII-only? zbraniecki/tinystr#16

Closed

raphlinus mentioned this pull request Jul 4, 2020

Does CairoRenderContext::make_image() work on big endian systems? linebender/piet#224

Closed
