Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving icu_harfbuzz compiled data #4574

Merged
merged 10 commits into from
Feb 27, 2024
Merged

Conversation

robertbastian
Copy link
Member

The current HB compiled data code is not ideal, because it uses static_to_owned, which might be blocking optimisations. This also let me realise that some compiled constructors that could be const aren't.

@sffc
Copy link
Member

sffc commented Feb 1, 2024

Before I review this, were you aware of #3257, servo/rust-harfbuzz#197, and servo/rust-harfbuzz#245 when writing this PR?

@robertbastian
Copy link
Member Author

I found it after. I do think the changes here are independent of that though, we'll need a compiled data and a data provider path if we want maximum performance.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this PR is an improvement.

  1. It creates two versions of the icu4x_hb_unicode_* functions, which seem to be one for static data and one for provider data. We try really hard to avoid cfg(not(...)) code because it's very difficult to ensure test coverage.
  2. The PR seems to be based on the claim that the current code "uses static_to_owned, which might be blocking optimisations". I'm fairly concerned about this statement because static_to_owned is clearly documented as being "cheap", and it is a const function. If there is a problem with that function, should we go back to the drawing board? It is designed, I believe, specifically for cases like this where we want to coalesce static and dynamic data down to a single code path.

@robertbastian
Copy link
Member Author

The PR seems to be based on the claim that the current code "uses static_to_owned, which might be blocking optimisations". I'm fairly concerned about this statement because static_to_owned is clearly documented as being "cheap", and it is a const function. If there is a problem with that function, should we go back to the drawing board? It is designed, I believe, specifically for cases like this where we want to coalesce static and dynamic data down to a single code path.

Coalescing static data into the dynamic codepath inherently blocks optimisations. As soon as a DataPayload gets involved, all the non-static code gets pulled in, and the enum switch inhibits any possible optimisations against the static data. static_to_owned itself is cheap, because creating a static DataPayload is cheap, but we never claimed that it's as efficient as using the fully static codepath.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't really like the cfg(not(...)) code. Can we instead export two different C functions with different names, one which reads from user_data and one which reads from compiled data? And then export two versions of Rust pub fn new_hb_unicode_funcs, one which uses compiled data and the other which uses dynamic data. The compiled data versions are behind the compiled_data feature; it's just that the dynamic data versions are still available even if the compiled_data feature is enabled.

@robertbastian robertbastian marked this pull request as draft February 19, 2024 13:27
@robertbastian robertbastian marked this pull request as ready for review February 27, 2024 13:18
sffc
sffc previously approved these changes Feb 27, 2024
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code duplication is annoying but hopefully that will go away with #3257

However, note that servo/rust-harfbuzz#197 always uses user_data. It takes the void* and casts it to the generic type parameter and calls the trait function with that pointer as the self parameter. So, migrating this code over to the new harfbuzz traits will be slightly different. I guess we would just ignore the self parameter in the compiled_data code path.

I assume you did not touch any of the unsafe FFI logic.

components/normalizer/src/properties.rs Show resolved Hide resolved
hb_unicode_funcs_set_combining_class_func(
ufuncs.raw,
Some({
extern "C" fn cb(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, with the Rust fns all named cb? I guess the names get mangled? Nice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@robertbastian robertbastian merged commit f411b72 into unicode-org:main Feb 27, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants