-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UnicodeFuncs with safe-to-implement traits #197
Conversation
harfbuzz/src/unicode_funcs.rs
Outdated
pub trait GeneralCategoryFunc { | ||
/// Given a code point, return the general category as a | ||
/// [`hb_unicode_general_category_t`]. | ||
fn general_category(&self, ch: u32) -> hb_unicode_general_category_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: Perhaps these return values should not use harfbuzz-sys
types such that these traits can be implemented without a harfbuzz-sys
dependency (see unicode-org/icu4x#3175)
👋 Can I get a signal on whether this is the right direction and the shape of what you would approve? |
cc @jdm I'm happy to review this but there's a bit of a conflict of interest, would rather get a 👍 from other Servo people on whether this is a direction we want to go in. Perhaps we should mention this at the next TSC? (ICU4X may also be happy to take on some of the burden of maintaining this crate, the way we are doing with unicode-bidi) |
This seems fine to me, though I wonder if it is ever the case that someone would implement only a subset of the functions? It would be simpler to only expose a single trait for implementing functions if it is always the case that you implement all of them. |
One thing it helps is that it allows people to separate different data sources into different crates and implementors, even allowing for mixing. Unclear if that level of granularity is desired, but it's a potential source of flexibility. (@sffc, thoughts?) |
There could be use cases. For example, maybe you want to overwrite your Script data but leave the Normalizer alone. Harfbuzz seems to make no requirement that all six functions be implemented. This is ready for re-review. |
@sffc If you rebase forward, the underlying macOS build issue in CI is fixed. |
I have a couple of concerns: Concern 1: Does this add function dereferences for each call compared to the case without this abstraction? I think it does when comparing against a case where a callback without this abstraction wouldn't use the user data pointer, but I think it does not when comparing against a case where the callback without this abstraction would use the user data pointer. Is that correct? (I think it's OK if this abstraction doesn't add pointer chasing compared to the case where the user data pointer is used anyhow.) Concern 2: It's a bit odd for the traits to be simultaneously high-level enough to be implementable without Unicode Scalar Values are represented as Canonical Combining Class has numeric values defined by Unicode and the Unicode stability policy guarantees their possible range to fit in the The Unicode stability policy guarantees that the number of items for General Category won't change and HarfBuzz has assigned all the numeric values to fit in Since the HarfBuzz numeric assignments cannot change, I suggest creating an enum in this crate alongside the traits with such that the enum representation is |
Everything is super generic so I hope the compiler will inline these abstractions. For example, in this code: let compose_ptr: *mut F = Box::into_raw(f);
extern "C" fn impl_compose<F: ComposeFunc>(
// ...
user_data: *mut c_void,
) -> hb_bool_t {
let result = unsafe { &*(user_data as *mut F) }
.compose(hb_codepoint_t_to_char(a), hb_codepoint_t_to_char(b));
// ...
} the compiler should be able to figure out the address of
Changed from
Changed from
Changed from |
@waywardmonkeys can you kick CI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looking very good, though I have one nit about using Rust-idiomatic naming for GeneralCategory
items.
harfbuzz/src/traits.rs
Outdated
/// [`hb_unicode_general_category_t`]: crate::sys::hb_unicode_general_category_t | ||
#[repr(u8)] | ||
#[derive(Debug, Copy, Clone, PartialEq, Eq)] | ||
#[allow(non_camel_case_types)] // the names are defined by Unicode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest dropping the underscores like ICU4X does. That way, the names would be idiomatic for a Rust enum. Even if the names are defined by Unicode, it's pretty obvious to the reader what the applied name transformation would be if the names were idiomatic to Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, note the casing of the letter "u" in PrivateUse
/Private_Use
.
I can't ... I didn't get commit privs back after losing them. |
I don't understand why this Similarly, I'm not sure what the use case is for |
The need for But I still think that perhaps having a |
Okay, how about naming it |
That sounds good. And now that I understand better, that is a good place perhaps for the draw traits as well when we support that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-raising the previous nits. I trust the substance of the rest of the code hasn't changed--at least that's how it appeared from a cursory look.
Control = 0, | ||
Format = 1, | ||
Unassigned = 2, | ||
Private_use = 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
should be Use
with upper-case U
.
I still think the underscores in these values should be removed the way ICU4X removes the upstream Unicode underscores to get idiomatic Rust enum values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with both of those requests, fwiw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that harfbuzz
should re-export the traits crate, like it does for harfbuzz-sys
, so that people who do depend on harfbuzz
can get access to the same version that their harfbuzz
dependency is using.
I might be wrong, so say so if I am!
I think this has resolved all of my comments and concerns! If no one has other issues, I'll merge in about 12 hours. |
This PR makes it easier for clients to implement and use
hb_unicode_funcs_t
.We want to implement these in ICU4X: unicode-org/icu4x#2514
CC @hsivonen @echeran