Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datagen: Cache supported_locales impls #4470

Merged
merged 10 commits into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions provider/core/src/datagen/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ pub trait IterableDynamicDataProvider<M: DataMarker>: DynamicDataProvider<M> {
pub trait IterableDataProvider<M: KeyedDataMarker>: DataProvider<M> {
/// Returns a list of [`DataLocale`].
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError>;
/// Returns whether a [`DataLocale`] is in the supported locales list.
fn supports_locale(&self, locale: &DataLocale) -> Result<bool, DataError> {
self.supported_locales().map(|v| v.contains(locale))
}
}

impl<M, P> IterableDynamicDataProvider<M> for Box<P>
Expand Down
56 changes: 56 additions & 0 deletions provider/datagen/src/provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
use crate::source::*;
use crate::transform::cldr::source::CldrCache;
use crate::{CollationHanDatabase, CoverageLevel};
use elsa::sync::FrozenMap;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::fmt::Debug;
use std::path::PathBuf;
Expand Down Expand Up @@ -78,6 +80,7 @@ impl DatagenProvider {
icuexport_dictionary_fallback: None,
#[cfg(feature = "legacy_api")]
collations: Default::default(),
supported_locales_cache_vec: Default::default(),
},
}
}
Expand Down Expand Up @@ -306,6 +309,28 @@ pub struct SourceData {
pub(crate) icuexport_dictionary_fallback: Option<Arc<SerdeCache>>,
#[cfg(feature = "legacy_api")]
pub(crate) collations: Vec<String>,
pub(crate) supported_locales_cache_vec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an Arc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to Arc

FrozenMapThrowawayClone<DataKey, Box<Result<Vec<DataLocale>, DataError>>>,
}

pub(crate) struct FrozenMapThrowawayClone<K, V>(pub(crate) FrozenMap<K, V>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe elsa master has these impls now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does for sync at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsa 1.10 should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgraded to elsa 1.10


impl<K, V> Default for FrozenMapThrowawayClone<K, V> {
fn default() -> Self {
Self(FrozenMap::new())
}
}

impl<K, V> Clone for FrozenMapThrowawayClone<K, V> {
fn clone(&self) -> Self {
Default::default()
}
}

impl<K, V> Debug for FrozenMapThrowawayClone<K, V> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "FrozenMap")
}
}

#[cfg(feature = "legacy_api")]
Expand Down Expand Up @@ -485,3 +510,34 @@ impl SourceData {
.locales(levels.iter().copied())
}
}

pub(crate) trait IterableDataProviderInternal<M: KeyedDataMarker>: DataProvider<M> {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError>;
}

impl<M: KeyedDataMarker> IterableDataProvider<M> for DatagenProvider
where
DatagenProvider: IterableDataProviderInternal<M>,
{
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
#[allow(deprecated)] // SourceData
self.source
.supported_locales_cache_vec
.0
.insert_with(M::KEY, || Box::from(self.supported_locales_impl()))
.as_ref()
.cloned()
.map_err(|e| *e)
}

fn supports_locale(&self, locale: &DataLocale) -> Result<bool, DataError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supports_locale is called n times per key, whereas supported_locales is only called once. It probably makes more sense to cache a HashSet and convert it to a Vec for the one call, than to linearly scan a Vec n times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance on cargo make bakeddata datetime (run time only, not build time):

Commit Trial 1 Trial 2 Trial 3
11a1ed1 (Vec) 53.18 61.74 53.82
f2de286 (HashSet) 58.64 51.89 51.03

So it looks like HashSet wins.

#[allow(deprecated)] // SourceData
self.source
.supported_locales_cache_vec
.0
.insert_with(M::KEY, || Box::from(self.supported_locales_impl()))
.as_ref()
.map_err(|e| *e)
.map(|v| v.contains(locale))
}
}
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/characters/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,10 @@
use std::collections::HashSet;
use std::marker::PhantomData;

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use icu_collections::codepointinvliststringlist::CodePointInversionListAndStringList;
use icu_properties::provider::*;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use itertools::Itertools;

Expand Down Expand Up @@ -46,8 +46,8 @@ macro_rules! exemplar_chars_impls {
}
}

impl IterableDataProvider<$data_marker_name> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<$data_marker_name> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.misc()
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/currency/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,10 @@ use tinystr::UnvalidatedTinyAsciiStr;
use zerovec::VarZeroVec;
use zerovec::ZeroMap;

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use crate::DatagenProvider;
use icu_dimension::provider::*;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::BTreeMap;
use std::collections::HashMap;
Expand Down Expand Up @@ -95,8 +95,8 @@ impl DataProvider<CurrencyEssentialsV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<CurrencyEssentialsV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<CurrencyEssentialsV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.numbers()
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/datetime/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use icu_datetime::provider::calendar::*;
use icu_locid::{
extensions::unicode::{key, value},
Locale,
};
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use once_cell::sync::OnceCell;
use std::collections::HashMap;
Expand Down Expand Up @@ -207,8 +207,8 @@ macro_rules! impl_data_provider {
}
}

impl IterableDataProvider<$marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<$marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
let mut r = Vec::new();
if DateSkeletonPatternsV1Marker::KEY == $marker::KEY {
for (cal_value, cldr_cal) in supported_cals() {
Expand Down
13 changes: 7 additions & 6 deletions provider/datagen/src/transform/cldr/datetime/neo.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use super::supported_cals;
use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde::ca;
use crate::DatagenProvider;
use icu_datetime::pattern::{self, CoarseHourCycle};
Expand Down Expand Up @@ -681,8 +682,8 @@ impl DataProvider<TimePatternV1Marker> for DatagenProvider {
// subtag actually should be produced (by returning a special error), then this code is no longer necessary
// and we can use a union of the H12/H24 key lengths arrays, instead checking for preferred hc
// in timepattern_convert
impl IterableDataProvider<TimePatternV1Marker> for DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<TimePatternV1Marker> for DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
let calendar = value!("gregory");
let mut r = Vec::new();

Expand Down Expand Up @@ -727,8 +728,8 @@ macro_rules! impl_symbols_datagen {
}
}

impl IterableDataProvider<$marker> for DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<$marker> for DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
self.supported_locales_neo(value!($calendar), $lengths)
}
}
Expand All @@ -743,8 +744,8 @@ macro_rules! impl_pattern_datagen {
}
}

impl IterableDataProvider<$marker> for DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<$marker> for DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
self.supported_locales_neo(value!($calendar), $lengths)
}
}
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/datetime/week_data.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,18 +2,18 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde::{
self,
week_data::{Territory, DEFAULT_TERRITORY},
};
use icu_calendar::provider::{WeekDataV1, WeekDataV1Marker};
use icu_locid::LanguageIdentifier;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::HashSet;

impl IterableDataProvider<WeekDataV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<WeekDataV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
let week_data: &cldr_serde::week_data::Resource = self
.cldr()?
.core()
Expand Down
12 changes: 7 additions & 5 deletions provider/datagen/src/transform/cldr/decimal/compact.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use icu_compactdecimal::provider::*;
use icu_locid::extensions::unicode::key;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::convert::TryFrom;

Expand Down Expand Up @@ -105,14 +105,16 @@ impl DataProvider<LongCompactDecimalFormatDataV1Marker> for crate::DatagenProvid
}
}

impl IterableDataProvider<ShortCompactDecimalFormatDataV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<ShortCompactDecimalFormatDataV1Marker>
for crate::DatagenProvider
{
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
self.supported_locales()
}
}

impl IterableDataProvider<LongCompactDecimalFormatDataV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<LongCompactDecimalFormatDataV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
self.supported_locales()
}
}
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/decimal/symbols.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use icu_decimal::provider::*;
use icu_locid::extensions::unicode::key;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::borrow::Cow;
use std::convert::TryFrom;
Expand Down Expand Up @@ -47,8 +47,8 @@ impl DataProvider<DecimalSymbolsV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<DecimalSymbolsV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<DecimalSymbolsV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
self.supported_locales()
}
}
Expand Down
10 changes: 5 additions & 5 deletions provider/datagen/src/transform/cldr/displaynames/language.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,11 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use core::convert::TryFrom;
use icu_displaynames::provider::*;
use icu_locid::subtags::Language;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::BTreeMap;
use zerovec::ule::UnvalidatedStr;
Expand Down Expand Up @@ -58,8 +58,8 @@ impl DataProvider<LocaleDisplayNamesV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<LanguageDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<LanguageDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.displaynames()
Expand All @@ -77,8 +77,8 @@ impl IterableDataProvider<LanguageDisplayNamesV1Marker> for crate::DatagenProvid
}
}

impl IterableDataProvider<LocaleDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<LocaleDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.displaynames()
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/displaynames/region.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,11 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use core::convert::TryFrom;
use icu_displaynames::provider::*;
use icu_locid::subtags::Region;
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::BTreeMap;
use std::str::FromStr;
Expand Down Expand Up @@ -35,8 +35,8 @@ impl DataProvider<RegionDisplayNamesV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<RegionDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<RegionDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.displaynames()
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/displaynames/script.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,11 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use core::convert::TryFrom;
use icu_displaynames::provider::*;
use icu_locid::{subtags::Script, ParserError};
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::BTreeMap;
use std::str::FromStr;
Expand Down Expand Up @@ -35,8 +35,8 @@ impl DataProvider<ScriptDisplayNamesV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<ScriptDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<ScriptDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.displaynames()
Expand Down
6 changes: 3 additions & 3 deletions provider/datagen/src/transform/cldr/displaynames/variant.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,11 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::provider::IterableDataProviderInternal;
use crate::transform::cldr::cldr_serde;
use core::convert::TryFrom;
use icu_displaynames::provider::*;
use icu_locid::{subtags::Variant, ParserError};
use icu_provider::datagen::IterableDataProvider;
use icu_provider::prelude::*;
use std::collections::BTreeMap;
use std::str::FromStr;
Expand Down Expand Up @@ -35,8 +35,8 @@ impl DataProvider<VariantDisplayNamesV1Marker> for crate::DatagenProvider {
}
}

impl IterableDataProvider<VariantDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
impl IterableDataProviderInternal<VariantDisplayNamesV1Marker> for crate::DatagenProvider {
fn supported_locales_impl(&self) -> Result<Vec<DataLocale>, DataError> {
Ok(self
.cldr()?
.displaynames()
Expand Down
Loading