
test: add test to verify ErrnoException path #13958

Closed
wants to merge 2 commits into from


danbev
Contributor

@danbev danbev commented Jun 28, 2017

This commit adds a test to verify that the path argument to
ErrnoException can contain UTF-8 characters.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src

@nodejs-github-bot nodejs-github-bot added the c++ label Jun 28, 2017
@addaleax addaleax added the errors and semver-major labels Jun 28, 2017
@addaleax
Member

I don’t think this is a good idea. Always interpreting it as Latin-1 instead of always interpreting it as UTF-8 isn’t going to help anyone; and at least on Unices, it’s somewhat reasonable to assume UTF-8 as the default, even if there is no guarantee that the path is actually UTF-8-encoded. (Not sure about Windows but I think for consistency there would have to be something (C runtime?) that transcodes to UTF-8 for us as well.)

I’d suggest dropping the FIXME(bnoordhuis) bit in the comment and maybe adding that there isn’t really any better option.

/cc @bnoordhuis

@danbev
Contributor Author

danbev commented Jun 28, 2017

@addaleax Thanks, I wasn't sure about this. As you suggested, dropping the FIXME and adding a comment would probably be good here.

@addaleax
Member

@danbev It might also be good to have a test for this; if the tests passed with your current patch, then we don’t have one. Something like checking the error thrown by fs.stat[Sync]('nönexistent') might suffice.

@danbev
Contributor Author

danbev commented Jun 28, 2017

> Something like checking the error thrown by fs.stat[Sync]('nönexistent') might suffice.

I tried that, and it passes: it looks like that goes through the UVException function, which uses UTF-8. I'll look into how a test for this can be written.

src/node.cc Outdated
@@ -877,8 +877,7 @@ Local<Value> ErrnoException(Isolate* isolate,

   Local<String> path_string;
   if (path != nullptr) {
-    // FIXME(bnoordhuis) It's questionable to interpret the file path as UTF-8.
-    path_string = String::NewFromUtf8(env->isolate(), path);
+    path_string = FIXED_ONE_BYTE_STRING(env->isolate(), path);
Member


FWIW, there are edge cases where even this is going to be inadequate. Those are the same edge cases that led me to add the ability to pass Buffer instances to fs APIs. Specifically, on POSIX systems a single path can contain multiple encodings: for instance, a parent directory name in ASCII, a directory name in Windows-1250, and a UTF-8 filename. Unfortunately it's not theoretical either; I've actually had this happen.

It may be worthwhile to deprecate path_string and move to a Buffer instance.

Member


It's still an improvement of sorts. Any byte string is a valid Latin-1 string (although not per se a meaningful Latin-1 string) whereas conversion to UTF-8 is lossy - it's not bidirectional when the byte string contains invalid UTF-8 character sequences because those get replaced with U+FFFD.

Member


@bnoordhuis I still think this change would be more confusing than helpful. Node defaults to UTF-8 everywhere by now, and Latin-1-encoded paths are already pretty rare and won’t become more frequent over time.

Member


I don't disagree and to be clear, I don't advocate making changes haphazardly, that wouldn't help.

Aside (and for background): node's use of UTF-8 for file paths is accidental, it wasn't a design choice. V8 originally only supported UTF-8, one-byte strings weren't added until much later.

@danbev danbev force-pushed the error-fixed-one-byte-string branch from 0592a8a to 2fc1bb7 June 30, 2017 07:44
const err = binding.errno();
assert.strictEqual(err.syscall, 'syscall');
assert.strictEqual(err.errno, 10);
assert.strictEqual(err.path, 'päth');
Member


As a tiny suggestion, maybe check err.message as well :)

Contributor Author


Oh, yes :) Thanks

@danbev danbev force-pushed the error-fixed-one-byte-string branch from b83d30e to 8de6e76 August 9, 2017 06:28
@danbev danbev changed the title from "src: make path_string FIXED_ONE_BYTE_STRING" to "test: add test to verify ErrnoException path" Aug 9, 2017
@danbev
Contributor Author

danbev commented Aug 9, 2017

@danbev
Contributor Author

danbev commented Aug 9, 2017

test/arm failure looks unrelated

console output:

out/Release/cctest --gtest_output=tap:cctest.tap
[==========] Running 48 tests from 6 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from Base64Test
[ RUN      ] Base64Test.Encode
[       OK ] Base64Test.Encode (0 ms)
[ RUN      ] Base64Test.Decode
[       OK ] Base64Test.Decode (1 ms)
[----------] 2 tests from Base64Test (1 ms total)

[----------] 2 tests from EnvironmentTest
[ RUN      ] EnvironmentTest.AtExitWithEnvironment
[       OK ] EnvironmentTest.AtExitWithEnvironment (31 ms)
[ RUN      ] EnvironmentTest.AtExitWithArgument
Received signal 11 SEGV_MAPERR 000007e966dc

==== C stack trace ===============================

[end of stack trace]

@danbev
Contributor Author

danbev commented Aug 9, 2017

@addaleax Just wanted to ask whether your approval still holds? I've rebased and reworded the commit message to reflect that this commit now only adds a test and nothing else.

Regarding the arm failure, I'm going to take a closer look at it now.

Member

@tniessen tniessen left a comment


The code itself LGTM.

Member

@gibfahn gibfahn left a comment


LGTM modulo whitespace/comment nit.

const assert = require('assert');
const binding = require(`./build/${common.buildType}/binding`);
const err = binding.errno();
assert.strictEqual(err.syscall, 'syscall');
Member


Nit time (from the guide)...

'use strict';

const common = require('../../common');

// Verify that the path argument to node::ErrnoException() can contain UTF-8
// characters.

const assert = require('assert');
const binding = require(`./build/${common.buildType}/binding`);
const err = binding.errno();

assert.strictEqual(err.syscall, 'syscall');
assert.strictEqual(err.errno, 10);
assert.strictEqual(err.path, 'päth');

Contributor Author


Thanks, I'll update this.

@danbev danbev force-pushed the error-fixed-one-byte-string branch from 2248040 to 32a3027 August 13, 2017 06:53
@danbev
Contributor Author

danbev commented Aug 13, 2017

Member

@gibfahn gibfahn left a comment


Perfect, thanks

danbev added a commit to danbev/node that referenced this pull request Aug 14, 2017
This commit adds a test to verify that the path argument to
ErrnoException can contain UTF-8 characters.

PR-URL: nodejs#13958
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
@danbev
Contributor Author

danbev commented Aug 14, 2017

Landed in 95c8df1

@danbev danbev closed this Aug 14, 2017
@danbev danbev deleted the error-fixed-one-byte-string branch August 14, 2017 05:32
7 participants