Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Intl: Automatically download ICU if requested. Reduce size. Keep --with-intl=none the default. #8719

Closed
wants to merge 13 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,9 @@ ipch/
email.md
deps/v8-*
deps/icu
deps/icu*.zip
deps/icu*.tgz
deps/icu-tmp
./node_modules
.svn/

Expand Down
91 changes: 82 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,31 +83,104 @@ make doc
man doc/node.1
```

### To build `Intl` (ECMA-402) support:
### `Intl` (ECMA-402) support:

*Note:* more docs, including how to reduce disk footprint, are on
[Intl](https://github.com/joyent/node/wiki/Intl) support is not
enabled by default.

#### "small" (English only) support

This option will build with "small" (English only) support, but
the full `Intl` (ECMA-402) APIs. With `--download=all` it will
download the ICU library as needed.

Unix/Macintosh:

```sh
./configure --with-intl=small-icu --download=all
```

Windows:

```sh
vcbuild small-icu download-all
```

The `small-icu` mode builds
with English-only data. You can add full data at runtime.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually supported? I thought it was not currently supported, but that we planned to fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supported, and already landed in v0.12. That's why small icu links
insuch an odd way. If the "icu dir" option/env var is given, node_i18n.cc
doesn't call setcommondata and thus overriding happens (as icu is actually
linked against the stub data ).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good and thanks for the clarification!


*Note:* more docs are on
[the wiki](https://github.com/joyent/node/wiki/Intl).

#### Use existing installed ICU (Unix/Macintosh only):
#### Build with full ICU support (all locales supported by ICU):

With the `--download=all`, this may download ICU if you don't
have an ICU in `deps/icu`.

Unix/Macintosh:

```sh
pkg-config --modversion icu-i18n && ./configure --with-intl=system-icu
./configure --with-intl=full-icu --download=all
```

#### Build ICU from source:
Windows:

First: Unpack latest ICU
[icu4c-**##.#**-src.tgz](http://icu-project.org/download) (or `.zip`)
as `deps/icu` (You'll have: `deps/icu/source/...`)
```sh
vcbuild full-icu download-all
```

#### Build with no Intl support `:-(`

The `Intl` object will not be available.
This is the default at present, so this option is not normally needed.

Unix/Macintosh:

```sh
./configure --with-intl=full-icu
./configure --with-intl=none

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srl295 Since this is the default, do we really need a separate option to build node without Intl support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness. If someone wants to verify that this is what they get, even if the default changes someday.
Could move out of the README and into the wiki.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense, but I'm fine with leaving it in the README. Maybe mention that it's the current default though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned.

```

Windows:

```sh
vcbuild intl-none
```

#### Use existing installed ICU (Unix/Macintosh only):

```sh
pkg-config --modversion icu-i18n && ./configure --with-intl=system-icu
```

#### Build with a specific ICU:

You can find other ICU releases at
[the ICU homepage](http://icu-project.org/download).
Download the file named something like `icu4c-**##.#**-src.tgz` (or
`.zip`).

Unix/Macintosh: from an already-unpacked ICU

```sh
./configure --with-intl=[small-icu,full-icu] --with-icu-source=/path/to/icu
```

Unix/Macintosh: from a local ICU tarball

```sh
./configure --with-intl=[small-icu,full-icu] --with-icu-source=/path/to/icu.tgz
```

Unix/Macintosh: from a tarball URL

```sh
./configure --with-intl=full-icu --with-icu-source=http://url/to/icu.tgz
```

Windows: first unpack latest ICU to `deps/icu`
[icu4c-**##.#**-src.tgz](http://icu-project.org/download) (or `.zip`)
as `deps/icu` (You'll have: `deps/icu/source/...`)

```sh
vcbuild full-icu
```
Expand Down
137 changes: 130 additions & 7 deletions configure
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,19 @@ import re
import shlex
import subprocess
import sys
import shutil
import string

CC = os.environ.get('CC', 'cc')

root_dir = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(root_dir, 'tools', 'gyp', 'pylib'))
from gyp.common import GetFlavor

# imports in tools/configure.d
sys.path.insert(0, os.path.join(root_dir, 'tools', 'configure.d'))
import nodedownload

# parse our options
parser = optparse.OptionParser()

Expand Down Expand Up @@ -236,16 +242,31 @@ parser.add_option('--with-etw',
dest='with_etw',
help='build with ETW (default is true on Windows)')

parser.add_option('--download',
action='store',
dest='download_list',
help=nodedownload.help())

parser.add_option('--with-icu-path',
action='store',
dest='with_icu_path',
help='Path to icu.gyp (ICU i18n, Chromium version only.)')

parser.add_option('--with-icu-locales',
action='store',
dest='with_icu_locales',
help='Comma-separated list of locales for "small-icu". Default: "root,en". "root" is assumed.')

parser.add_option('--with-intl',
action='store',
dest='with_intl',
help='Intl mode: none, full-icu, small-icu (default is none)')

parser.add_option('--with-icu-source',
action='store',
dest='with_icu_source',
help='Intl mode: optional local path to icu/ dir, or path/URL of icu source archive.')

parser.add_option('--with-perfctr',
action='store_true',
dest='with_perfctr',
Expand Down Expand Up @@ -294,6 +315,8 @@ parser.add_option('--xcode',

(options, args) = parser.parse_args()

# set up auto-download list
auto_downloads = nodedownload.parse(options.download_list)

def b(value):
"""Returns the string 'true' if value is truthy, 'false' otherwise."""
Expand Down Expand Up @@ -712,6 +735,35 @@ def glob_to_var(dir_base, dir_sub):
return list

def configure_intl(o):
icus = [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srl295 With these latest additions, configure_intl is starting to become a bit bloated, and I'm concerned that it is already difficult to read and debug. I would like to see it slightly refactored in smaller logical chunks (functions). I don't know yet when would be the right time to do this. Now would certainly be ideal, but I don't want to make it prevent us from merging the rest. At the same time, I'd rather have it done sooner than later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created srl295/node#20 to track this

{
'url': 'http://download.icu-project.org/files/icu4c/54.1/icu4c-54_1-src.zip',
# from https://ssl.icu-project.org/files/icu4c/54.1/icu4c-src-54_1.md5:
'md5': '6b89d60e2f0e140898ae4d7f72323bca',
},
]
def icu_download(path):
# download ICU, if needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here: is this the right place to do the download? should it at least be in a separate utility file called by configure or even not called automatically at all and have configure complain at you if you haven't downloaded it? Seems slightly code-smelly to have a download taking place in here, but perhaps there are precedents already set inside configure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg didn't see any precedents here. It certainly could be factored out into a separate utility or at least imported function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg @trevnorris I guess I could add import tools.downloader to configure and then create <node>/tools/downloader.py right? or maybe import downloader from tools?

Glad to do this. Just don't know what the tradeoffs are time wise as far as v0.12's runway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what "standard protocol" is for downloading external resources. V8 tells you to run make dependencies if you try to just make w/o having downloaded the icu bundle. Maybe something similar to that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg I split out a module file tools/configure.d/nodedownload.py ( kept general in case there's more factoring done on configure ? ) - let me know how that looks.

@trevnorris seems like if it is on by default we might as well auto download, it seems the most generally user-friendly. You could also include a copy of icu .zip in node's .tgz which would save folks a step.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjfontaine Thoughts on how we want to proceed and what solution we want to use?

for icu in icus:
url = icu['url']
md5 = icu['md5']
local = url.split('/')[-1]
targetfile = os.path.join(root_dir, 'deps', local)
if not os.path.isfile(targetfile):
if nodedownload.candownload(auto_downloads, "icu"):
nodedownload.retrievefile(url, targetfile)
else:
print ' Re-using existing %s' % targetfile
if os.path.isfile(targetfile):
sys.stdout.write(' Checking file integrity with MD5:\r')
gotmd5 = nodedownload.md5sum(targetfile)
print ' MD5: %s %s' % (gotmd5, targetfile)
if (md5 == gotmd5):
return targetfile
else:
print ' Expected: %s *MISMATCH*' % md5
print '\n ** Corrupted ZIP? Delete %s to retry download.\n' % targetfile
return None
icu_config = {
'variables': {}
}
Expand All @@ -723,11 +775,11 @@ def configure_intl(o):
write(icu_config_name, do_not_edit +
pprint.pformat(icu_config, indent=2) + '\n')

# small ICU is off by default.
# always set icu_small, node.gyp depends on it being defined.
o['variables']['icu_small'] = b(False)

with_intl = options.with_intl
with_icu_source = options.with_icu_source
have_icu_path = bool(options.with_icu_path)
if have_icu_path and with_intl:
print 'Error: Cannot specify both --with-icu-path and --with-intl'
Expand All @@ -739,13 +791,26 @@ def configure_intl(o):
o['variables']['icu_gyp_path'] = options.with_icu_path
return
# --with-intl=<with_intl>
# set the default
if with_intl is None:
with_intl = 'none' # The default mode of Intl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srl295 Same as above, do we really need to use a string to represent the default value for with_intl? Can't we just use None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make the logic more symmetrical for the none case, if non-none ever was the default.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.

# sanity check localelist
if options.with_icu_locales and (with_intl != 'small-icu'):
print 'Error: --with-icu-locales only makes sense with --with-intl=small-icu'
sys.exit(1)
if with_intl == 'none' or with_intl is None:
o['variables']['v8_enable_i18n_support'] = 0
return # no Intl
elif with_intl == 'small-icu':
# small ICU (English only)
o['variables']['v8_enable_i18n_support'] = 1
o['variables']['icu_small'] = b(True)
with_icu_locales = options.with_icu_locales
if not with_icu_locales:
with_icu_locales = 'root,en'
locs = set(with_icu_locales.split(','))
locs.add('root') # must have root
o['variables']['icu_locales'] = string.join(locs,',')
elif with_intl == 'full-icu':
# full ICU
o['variables']['v8_enable_i18n_support'] = 1
Expand All @@ -769,20 +834,78 @@ def configure_intl(o):
# Note: non-ICU implementations could use other 'with_intl'
# values.

# this is just the 'deps' dir. Used for unpacking.
icu_parent_path = os.path.join(root_dir, 'deps')

# The full path to the ICU source directory.
icu_full_path = os.path.join(icu_parent_path, 'icu')

# icu-tmp is used to download and unpack the ICU tarball.
icu_tmp_path = os.path.join(icu_parent_path, 'icu-tmp')

# --with-icu-source processing
# first, check that they didn't pass --with-icu-source=deps/icu
if with_icu_source and os.path.abspath(icu_full_path) == os.path.abspath(with_icu_source):
print 'Ignoring redundant --with-icu-source=%s' % (with_icu_source)
with_icu_source = None
# if with_icu_source is still set, try to use it.
if with_icu_source:
if os.path.isdir(icu_full_path):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srl295 Probably a dubious use case, but if someone uses --with-icu-source=deps/icu, then it seems that it would not fail gracefully. Maybe we should not delete icu_full_path if icu_full_path == with_icu_source?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • TODO good catch, it should detect that case. There's probably a pythonic way to check the paths.

print 'Deleting old ICU source: %s' % (icu_full_path)
shutil.rmtree(icu_full_path)
# now, what path was given?
if os.path.isdir(with_icu_source):
# it's a path. Copy it.
print '%s -> %s' % (with_icu_source, icu_full_path)
shutil.copytree(with_icu_source, icu_full_path)
else:
# could be file or URL.
# Set up temporary area
if os.path.isdir(icu_tmp_path):
shutil.rmtree(icu_tmp_path)
os.mkdir(icu_tmp_path)
icu_tarball = None
if os.path.isfile(with_icu_source):
# it's a file. Try to unpack it.
icu_tarball = with_icu_source
else:
# Can we download it?
local = os.path.join(icu_tmp_path, with_icu_source.split('/')[-1]) # local part
icu_tarball = nodedownload.retrievefile(with_icu_source, local)
# continue with "icu_tarball"
nodedownload.unpack(icu_tarball, icu_tmp_path)
# Did it unpack correctly? Should contain 'icu'
tmp_icu = os.path.join(icu_tmp_path, 'icu')
if os.path.isdir(tmp_icu):
os.rename(tmp_icu, icu_full_path)
shutil.rmtree(icu_tmp_path)
else:
print ' Error: --with-icu-source=%s did not result in an "icu" dir.' % with_icu_source
shutil.rmtree(icu_tmp_path)
sys.exit(1)

# ICU mode. (icu-generic.gyp)
byteorder = sys.byteorder
o['variables']['icu_gyp_path'] = 'tools/icu/icu-generic.gyp'
# ICU source dir relative to root
icu_full_path = os.path.join(root_dir, 'deps/icu')
o['variables']['icu_path'] = icu_full_path
if not os.path.isdir(icu_full_path):
print 'Error: ICU path is not a directory: %s' % (icu_full_path)
print '* ECMA-402 (Intl) support didn\'t find ICU in %s..' % (icu_full_path)
# can we download (or find) a zipfile?
localzip = icu_download(icu_full_path)
if localzip:
nodedownload.unpack(localzip, icu_parent_path)
if not os.path.isdir(icu_full_path):
print ' Cannot build Intl without ICU in %s.' % (icu_full_path)
print ' (Fix, or disable with "--with-intl=none" )'
sys.exit(1)
else:
print '* Using ICU in %s' % (icu_full_path)
# Now, what version of ICU is it? We just need the "major", such as 54.
# uvernum.h contains it as a #define.
uvernum_h = os.path.join(icu_full_path, 'source/common/unicode/uvernum.h')
if not os.path.isfile(uvernum_h):
print 'Error: could not load %s - is ICU installed?' % uvernum_h
print ' Error: could not load %s - is ICU installed?' % uvernum_h
sys.exit(1)
icu_ver_major = None
matchVerExp = r'^\s*#define\s+U_ICU_VERSION_SHORT\s+"([^"]*)".*'
Expand All @@ -792,7 +915,7 @@ def configure_intl(o):
if m:
icu_ver_major = m.group(1)
if not icu_ver_major:
print 'Could not read U_ICU_VERSION_SHORT version from %s' % uvernum_h
print ' Could not read U_ICU_VERSION_SHORT version from %s' % uvernum_h
sys.exit(1)
icu_endianness = sys.byteorder[0]; # TODO(srl295): EBCDIC should be 'e'
o['variables']['icu_ver_major'] = icu_ver_major
Expand All @@ -819,8 +942,8 @@ def configure_intl(o):
# this is the icudt*.dat file which node will be using (platform endianness)
o['variables']['icu_data_file'] = icu_data_file
if not os.path.isfile(icu_data_path):
print 'Error: ICU prebuilt data file %s does not exist.' % icu_data_path
print 'See the README.md.'
print ' Error: ICU prebuilt data file %s does not exist.' % icu_data_path
print ' See the README.md.'
# .. and we're not about to build it from .gyp!
sys.exit(1)
# map from variable name to subdirs
Expand Down
Loading