-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to harfbuzz funcs #4794
Switch to harfbuzz funcs #4794
Conversation
Decision on path forward: Add 7 types here: 6 individually-loaded types, and one ZST that implements everything and uses compiled data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Please make separate types as discussed)
ICU4X-WG discussion:
|
Updated |
6741876
to
b6ba1a5
Compare
I don't understand how CI succeeded previously |
Ah it no longer builds native harfbuzz from source. Annoying. |
How should we proceed for CIing this? We can install libharfbuzz-dev for CI but on other platforms that won't work. We can skip this crate for testing, perhaos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify for anyone unsure about the impact of this change: we are replacing ICU4X code like
unsafe {
hb_unicode_funcs_set_mirroring_func(
ufuncs.raw,
Some({
extern "C" fn cb(
_: *mut hb_unicode_funcs_t,
unicode: hb_codepoint_t,
_: *mut c_void,
) -> hb_codepoint_t {
bidi_data::bidi_auxiliary_properties()
.get32_mirroring_props(unicode)
.mirroring_glyph
.map(u32::from)
.unwrap_or(unicode) as hb_codepoint_t
}
cb
}),
core::ptr::null_mut(),
None,
);
}
with
builder.set_mirroring_func(Box::new(UnicodeFuncs));
where the impl of that function is
pub fn set_mirroring_func<F: MirroringFunc>(&mut self, f: Box<F>) {
let mirroring_ptr: *mut F = Box::into_raw(f);
extern "C" fn impl_mirroring<F: MirroringFunc>(
_ufuncs: *mut hb_unicode_funcs_t,
unicode: hb_codepoint_t,
user_data: *mut c_void,
) -> hb_codepoint_t {
unsafe { &*(user_data as *mut F) }.mirroring(hb_codepoint_t_to_char(unicode))
as hb_codepoint_t
}
extern "C" fn destroy_mirroring<F>(user_data: *mut c_void) {
let _ = unsafe { Box::from_raw(user_data as *mut F) };
}
unsafe {
hb_unicode_funcs_set_mirroring_func(
self.raw,
Some(impl_mirroring::<F>),
mirroring_ptr as *mut c_void,
Some(destroy_mirroring::<F>),
);
}
}
hb_codepoint_t_to_char
is a private function defined in the Rust harfbuzz
crate:
#[inline]
fn hb_codepoint_t_to_char(input: hb_codepoint_t) -> char {
unsafe { char::from_u32_unchecked(input) }
}
and the impl of mirroring
is
fn mirroring(&self, ch: char) -> char {
bidi_data::bidi_auxiliary_properties()
.get32_mirroring_props(ch.into())
.mirroring_glyph
.unwrap_or(ch)
}
So, after inlining, the "hot path", which is the callback function, changes from
extern "C" fn cb(
_: *mut hb_unicode_funcs_t,
unicode: hb_codepoint_t,
_: *mut c_void,
) -> hb_codepoint_t {
bidi_data::bidi_auxiliary_properties()
.get32_mirroring_props(unicode)
.mirroring_glyph
.map(u32::from)
.unwrap_or(unicode) as hb_codepoint_t
}
to
extern "C" fn impl_mirroring<F: MirroringFunc>(
_ufuncs: *mut hb_unicode_funcs_t,
unicode: hb_codepoint_t,
user_data: *mut c_void,
) -> hb_codepoint_t {
let _self = unsafe { &*(user_data as *mut F) }; // should go away
let ch = unsafe { char::from_u32_unchecked(unicode) };
bidi_data::bidi_auxiliary_properties()
.get32_mirroring_props(ch.into())
.mirroring_glyph
.unwrap_or(ch)
as hb_codepoint_t
}
#[doc = icu_provider::gen_any_buffer_unstable_docs!(ANY, Self::try_new_unstable)] | ||
pub fn try_new_with_any_provider( | ||
provider: &(impl icu_provider::AnyProvider + ?Sized), | ||
) -> Result<Self, HarfBuzzError> { | ||
Self::try_new_unstable(&provider.as_downcasting()) | ||
} | ||
#[cfg(feature = "serde")] | ||
#[doc = icu_provider::gen_any_buffer_unstable_docs!(BUFFER,Self::try_new_unstable)] | ||
pub fn try_new_with_buffer_provider( | ||
provider: &(impl icu_provider::BufferProvider + ?Sized), | ||
) -> Result<Self, HarfBuzzError> { | ||
Self::try_new_unstable(&provider.as_deserializing()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Use the helper macro to generate these. Is there a reason you're not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no baked constructor so it won't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro can be made to not generate baked constructors, we do this when we manually write an infallible baked constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out a way to make it work. You're welcome to try.
None, | ||
); | ||
fn convert_gc(gc: GeneralCategory) -> harfbuzz_traits::GeneralCategory { | ||
match gc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation: ICU4C and ICU4X agree on the numbering, but Harfbuzz uses different numbering.
https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/uchar.h#L820
https://github.com/servo/rust-harfbuzz/blob/main/harfbuzz-traits/src/unicode_funcs_traits.rs#L13
https://github.com/unicode-org/icu4x/blob/main/components/properties/src/props.rs#L831
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this match
result in as efficient and in as compact code as the array? Even if the compiler generates a table, is it smart enough to only spend a byte per item in the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It usually generates a jump table, which does typically work that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strikes me as a premature optimization but you're welcome to make it as a followup.
Can just the docstest job install the required dependencies? Getting harfbuzz-sys out of the main icu4x tree was a goal of this change and I think it should stay that way. |
Co-authored-by: Shane F. Carr <shane@unicode.org>
@sffc if it's a dependency of the doctest it will be needed in tree. A lot of tasks run tests and pull in dev deps. |
Instead of a doctest we could have a tutorial that doesn't live in the workspace, and is only run in one of the ffi tasks |
Works for me. Something running out of the workspace sounds good. Just make a small harfbuzz tutorial crate. |
Which CI job should this be a part of? |
I think we should upgrade the |
f595c7a
to
da052a2
Compare
We kept making combined FFI tasks and then splitting them because they were getting too slow. That said, I'm fine if you do this. |
What do you think about doing it in the very same job as test-tutorials-local, including the target directory? Maybe even moving the tutorial into the |
Yeah! |
#[doc = icu_provider::gen_any_buffer_unstable_docs!(ANY, Self::try_new_unstable)] | ||
pub fn try_new_with_any_provider( | ||
provider: &(impl icu_provider::AnyProvider + ?Sized), | ||
) -> Result<Self, HarfBuzzError> { | ||
Self::try_new_unstable(&provider.as_downcasting()) | ||
} | ||
#[cfg(feature = "serde")] | ||
#[doc = icu_provider::gen_any_buffer_unstable_docs!(BUFFER,Self::try_new_unstable)] | ||
pub fn try_new_with_buffer_provider( | ||
provider: &(impl icu_provider::BufferProvider + ?Sized), | ||
) -> Result<Self, HarfBuzzError> { | ||
Self::try_new_unstable(&provider.as_deserializing()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro can be made to not generate baked constructors, we do this when we manually write an infallible baked constructor.
8ca5f2a
to
d07d553
Compare
tutorials/rust/harfbuzz/src/main.rs
Outdated
let mut b = Buffer::with("مساء الخير"); | ||
let mut builder = UnicodeFuncsBuilder::new_with_empty_parent().unwrap(); | ||
// Note: AllUnicodeFuncs is zero-sized, so these boxes don't allocate memory. | ||
builder.set_general_category_func(AllUnicodeFuncs::boxed()); | ||
builder.set_combining_class_func(AllUnicodeFuncs::boxed()); | ||
builder.set_mirroring_func(AllUnicodeFuncs::boxed()); | ||
builder.set_script_func(AllUnicodeFuncs::boxed()); | ||
builder.set_compose_func(AllUnicodeFuncs::boxed()); | ||
builder.set_decompose_func(AllUnicodeFuncs::boxed()); | ||
b.set_unicode_funcs(&builder.build()); | ||
b.guess_segment_properties(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sffc can we still make changes to the harfbuzz API? It would be nice if the builder setters were chainable:
let funcs = UnicodeFuncsBuilder::new_with_empty_parent()
.unwrap()
.set_general_category_func(AllUnicodeFuncs::boxed())
.set_combining_class_func(AllUnicodeFuncs::boxed())
.set_mirroring_func(AllUnicodeFuncs::boxed())
.set_script_func(AllUnicodeFuncs::boxed())
.set_compose_func(AllUnicodeFuncs::boxed())
.set_decompose_func(AllUnicodeFuncs::boxed())
.build();
let mut b = Buffer::with("مساء الخير");
b.set_unicode_funcs(&funcs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API was designed to closely mirror the C FFI. The "builder" is actually a real Harfbuzz C object. Every one of these setters calls FFI: it's not just a figment of Rust. The C FFI takes a mutable pointer and sets stuff on the mutable pointer, so the Rust API should by default do the same. The reason I called it a "builder", despite that term not appearing in the Harfbuzz docs, is because the C FFI has a function hb_unicode_funcs_make_immutable
(which seems to be similar to the "freeze" concept in ICU4C), and this was the easiest way to track the state of whether the object was mutable or immutable.
https://harfbuzz.github.io/harfbuzz-hb-unicode.html
Now of course there is room to debate the API design in Rust; I took some liberty by calling the thing a "builder". However, I haven't seen evidence that my design has fatal flaws, and in fact this PR validates my approach. Changing between a mutating builder and a moving builder is a matter of debate that doesn't change what happens under the hood. And, the harfbuzz
crate is external to ICU4X, living on its own lifecycle, and the maintainers of that crate already reviewed and had me change some aspects of the API before it landed. It is a crate designed for a narrow set of power users who know what they are doing. Most folks wanting to do text layout in Rust should hopefully be using a more end-to-end package instead of directly interfacing with harfbuzz.
For all these reasons, I would not personally be enthusiastic about a change of this nature.
If I were to make any change, it would be to rename UnicodeFuncsBuilder
to MutableUnicodeFuncs
and renaming .build()
to .make_immutable()
and having it return a ImmutableUnicodeFuncs
. This would be the thing that most closely mirrors FFI.
d07d553
to
db98dd6
Compare
db98dd6
to
2264879
Compare
Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a few open comments
This isn't quite correct: the design of servo/rust-harfbuzz#197 requires you to use UnicodeFuncsBuilder and pass in six different objects (and it would be inefficient to clone this object five times).
I guess I should have six loadable objects in this crate that people have to manually set? Potentially with a feature-gated helper that automatically builds a
UnicodeFuncs
.Alternatively we can update upstream to also allow for a single UnicodeFuncs object.
Thoughts? @sffc
Fixes #3257