Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: deps: checkin a full ICU #21676

Closed
wants to merge 2 commits into from
Closed

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jul 5, 2018

🚧

To be based on #21452

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • documentation is changed or added
  • commit message follows commit guidelines

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Jul 5, 2018
@srl295 srl295 self-assigned this Jul 5, 2018
@nodejs-github-bot nodejs-github-bot added the build Issues and PRs related to build files or the CI. label Jul 5, 2018
@srl295
Copy link
Member Author

srl295 commented Jul 5, 2018

Main discussion: #19214

@srl295
Copy link
Member Author

srl295 commented Jul 5, 2018

linter works locally for me

Edit: missed the "markdown linter not installed" thing..

@srl295
Copy link
Member Author

srl295 commented Jul 5, 2018

I renamed the icu directory from deps/icu-small to deps/icu-full to keep from there being any confusion about what is checked in. I guess it would reduce churn a bit to keep the same name and just use deps/icu-small for the full ICU data.

Copy link
Member

@devsnek devsnek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubber stamp lgtm

Copy link
Member

@richardlau richardlau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the icu directory from deps/icu-small to deps/icu-full to keep from there being any confusion about what is checked in. I guess it would reduce churn a bit to keep the same name and just use deps/icu-small for the full ICU data.

+1 for renaming, although you will also need to update tools/license-builder.sh and doc/api/intl.md#options-for-building-nodejs

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just use the name deps/icu? The full part won't age well, and the checkout is not technically full since unnecessary files are removed.


You are ready to check in the updated `deps/small-icu`. This is a big commit,
You are ready to check in the updated `deps/full-icu`. This is a big commit,
Copy link
Member

@TimothyGu TimothyGu Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

icu-full or just icu, as I recommended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you do configure --with-icu-source=somewhere/ssomewhere/icu4c-62_1-src.tgz then deps/icu gets created with the contents of the specified tarball. so deps/icu is already taken, and expected to be transient.

We could also just call it small-icu - it is smaller (trimmed out unneeded stuff), just not small in data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, in that case I'd rather rename the transient directory if that's not too much work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ok, i misread this , and i implemented renaming the modified source to dep/icu-small (even though it has all locales, just smaller source)

Copy link
Member Author

@srl295 srl295 Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • will fix this up

@srl295 srl295 force-pushed the full-icu-62 branch 2 times, most recently from 56f5608 to 28ca326 Compare July 6, 2018 01:27
@@ -38,26 +38,25 @@ in [BUILDING.md][].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier in this file there's a section:

Node.js (and its underlying V8 engine) uses ICU to implement these features in native C/C++ code. However, some of them require a very large ICU data file in order to support all locales of the world. Because it is expected that most Node.js users will make use of only a small portion of ICU functionality, only a subset of the full ICU data set is provided by Node.js by default. Several options are provided for customizing and expanding the ICU data set either when building or running Node.js.

The underlined portion should probably be edited.

Copy link
Member Author

@srl295 srl295 Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, you could still build ICU with English only, but you'd have to do your own ICU build to get it. Such tools should live in ICU-land, not in Node.

@silverwind
Copy link
Contributor

How about we just use the name deps/icu

That'd be my preference too, if it's not too much of a hassle.

Which semver tag should we give this PR? I'm leaning towards semver-minor as full-icu is supposed to be a superset of small-icu, but it could also be seen as breaking as we remove a build option.

@@ -78,7 +78,7 @@ UnhandledEngine::findBreaks( UText *text,
int32_t /* startPos */,
int32_t endPos,
UVector32 &/*foundBreaks*/ ) const {
UChar32 c = utext_current32(text);
UChar32 c = utext_current32(text);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of odd that this ICU bump contains quite a lot of trailing whitespace changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happens every.single.time. I usually locally re-land with --fix-whitespace=all and then push the landed version back to the branch, but i messed something up in the process this time.

upstream ICU still has trailing whitespace.

doc/api/intl.md Outdated
If the `small-icu` option is used, one can still provide additional locale data
at runtime so that the JS methods would work for all ICU locales. Assuming the
If the `full-icu` option is used, one can still provide additional data
at runtime. (This data does not override the built in data.) Assuming the
Copy link
Contributor

@silverwind silverwind Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"built-in"

BUILDING.md Outdated
```console
> .\vcbuild full-icu download-all
```
enabled by default. The `small-icu` option has been removed.
Copy link
Contributor

@silverwind silverwind Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The previously available small-icu option has been removed"

@srl295
Copy link
Member Author

srl295 commented Jul 6, 2018

@silverwind

it could also be seen as breaking as we remove a build option.

You know, perhaps I should accept --with-intl=small-icu as an alias for full-icu.

Also, in intl.md I will add a note that you can customize ICU itself. I don't have great docs
to point to here, but the node docs already link to ICU's docs under references.

@vsemozhetbyt
Copy link
Contributor

It seems this section also needs updating (particularly, the "Encodings Supported by Default (With ICU)" part):

https://nodejs.org/download/nightly/v11.0.0-nightly201807060ef04b8133/docs/api/util.html#util_whatwg_supported_encodings

@silverwind
Copy link
Contributor

perhaps I should accept --with-intl=small-icu as an alias for full-icu.

Sounds good. I'd make sure to print a warning that the option was removed and full-icu is assumed. That way, we can avoid the major bump.

@silverwind silverwind added the semver-minor PRs that contain new features and should be released in the next minor version. label Jul 6, 2018
doc/api/intl.md Outdated
### `small-icu`

Support for `small-icu` (English only) has been removed. See ICU’s
documentation for information on how to customize your ICU build.
Copy link
Member Author

@srl295 srl295 Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srl295 srl295 mentioned this pull request Jul 9, 2018
2 tasks
@srl295 srl295 changed the title deps: bump full icu 62.1 🚧 deps: bump full icu 62.1 Jul 9, 2018
@srl295 srl295 changed the title 🚧 deps: bump full icu 62.1 WIP: deps: bump full icu 62.1 Jul 9, 2018
@srl295 srl295 added the wip Issues and PRs that are still a work in progress. label Jul 9, 2018
@srl295 srl295 force-pushed the full-icu-62 branch 2 times, most recently from 1310723 to 8732b12 Compare July 9, 2018 21:08
@srl295 srl295 changed the title WIP: deps: bump full icu 62.1 WIP: deps: checkin a full ICU Jul 9, 2018
@srl295
Copy link
Member Author

srl295 commented Jul 9, 2018

ok. a lot of churn due to renaming icu-small to icu, but should be a one time…

- remove mechanism for 'small' icu
- fix license-builder
- remove a copy step in icu-generic
- update docs

Fixes: nodejs#19214
Copy link
Contributor

@silverwind silverwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I wouldn't go with LFS, it would only introduce additional complexity, and the files aren't that big after all.

@srl295
Copy link
Member Author

srl295 commented Jul 11, 2018

ok- i talked with @mhdawson a bit, and we are going to explore #3460 a little bit more. Putting this on hold- but at least, i think it shows a viable option.

@vsemozhetbyt
Copy link
Contributor

Is it worth to consider this as Node.js 11 feature?

@srl295
Copy link
Member Author

srl295 commented Oct 3, 2018

@vsemozhetbyt could be, but needs some consensus on direction. (NB this PR could be updated to 63.1 after #23244 lands )

@refack
Copy link
Contributor

refack commented Oct 3, 2018

What are the benefits of this over the current setup? AFAICT we could set --with-intl=full-icu for the release builds, and keep both the repo and local dev builds as is?

@vsemozhetbyt
Copy link
Contributor

Refs: #24039

@Trott
Copy link
Member

Trott commented Nov 25, 2018

@srl295 Is this still a thing that's being worked on?

@srl295
Copy link
Member Author

srl295 commented Nov 26, 2018

@refack The benefit is simplifying (though increasing the size of) what's in the repo.

@Trott Not actively. It sounds like there's some preference for
#3460

@BridgeAR
Copy link
Member

Closing due to long inactivity and it seems like there is preference for a different approach. @srl295 please reopen / open a new PR in case you would like to work on this again.

@BridgeAR BridgeAR closed this Mar 10, 2019
@refack refack added the stalled Issues and PRs that are stalled. label Mar 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. i18n-api Issues and PRs related to the i18n implementation. semver-minor PRs that contain new features and should be released in the next minor version. stalled Issues and PRs that are stalled. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.