Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Date.parse #289

Merged
merged 8 commits into from
Mar 10, 2024
Merged

Improve Date.parse #289

merged 8 commits into from
Mar 10, 2024

Conversation

chqrlie
Copy link
Collaborator

@chqrlie chqrlie commented Mar 3, 2024

Improve Date.parse

  • rewrite Date.parse() with separate parsers for ISO and other formats
  • return NaN for out of bounds field values as specified
  • accept up to 9 decimals for millisecond fraction but truncate at 3
  • accept many more alternative date/time formats
  • add test cases in tests/test_builtin.js
  • fix most v8 test cases

- produce readable output for Date objects in repl
- rewrite Date.parse() with separate parsers
- return `NaN` for out of bounds field values as specified
- accept up to 9 decimals for millisecond fraction but truncate at 3
- accept many more alternative date/time formats
- add test cases in tests/test_builtin.js
@chqrlie chqrlie changed the title Improve date parse Improve Date.parse Mar 3, 2024
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM but I believe the spec allows for some implementation-dependentness and we should decide whether we want to align with V8 (laxer) or SM (stricter.)

Although I don't generally subscribe to Postel's maxim, I'm inclined to side with V8 here because date strings are often entered by humans, not machines.

repl.js Outdated
@@ -938,6 +938,8 @@ import * as os from "os";
std.puts(a);
} else if (stack.indexOf(a) >= 0) {
std.puts("[circular]");
} else if (a instanceof Date) {
std.puts("Date " + a.toGMTString().__quote());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in my never-ceasing purge of everything and anything non-standard, this method had so far escaped my notice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a toSource, which albeit non-standard is available on some systems and less surprising. For the Date object output, I suggest std.puts(`Date "${a.toGMTString()}"`);

Copy link
Contributor

@bnoordhuis bnoordhuis Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not format the string here, in place? I'm not a huge fan of swapping one non-standard method for another. toSource() was a Mozilla-ism, as far as I'm aware, and even they removed it.

edit: doesn't need to hold up this pull request since __quote is a pre-existing thing.

Copy link
Collaborator Author

@chqrlie chqrlie Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not format the string here, in place?

I used std.puts(`Date "${a.toGMTString()}"`); in the latest commit. Is this what you meant?
How to you produce the source for a string reliably without some kind of toSource() method?
I guess I could write it in JS and add it in repl.js or another package loaded by repl.js.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I mean, doing it inside repl.js.

Copy link
Collaborator Author

@chqrlie chqrlie Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it in Javascript, then as I proceeded to remove String.prototype.__quote, I realized JSON.stringify does the job. Patch submitted.

quickjs.c Show resolved Hide resolved
quickjs.c Show resolved Hide resolved
quickjs.c Show resolved Hide resolved
quickjs.c Show resolved Hide resolved
}

/* parse toISOString format */
static BOOL js_date_parse_isostring(const uint8_t *sp, int fields[9], BOOL *is_local) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static BOOL js_date_parse_isostring(const uint8_t *sp, int fields[9], BOOL *is_local) {
static BOOL js_date_parse_isostring(const uint8_t *sp, int (*fields)[9], BOOL *is_local) {

That way callers must pass in an exactly nine-element array (and you have to dereference them here as (*fields)[i])

Another alternative is C99-style int fields[static 9] syntax. Then callers must pass in an at least nine-element array.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I don't like the C99 syntax too much. Does MSAN or ASAN catch out of bound accesses for arrays passed as pointers (without a length specified by either syntax)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time. Not 100% of the time though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a macro to encapsulate the C99 syntax and used that. It is less surprising to casual readers than the array pointer trick, and also allows for larger arrays if necessary, as is the case for get_date_fields() / set_date_fields() combinations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to disable the C99 syntax for MSVC that does not support it even though it does define __STDC_VERSION__.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int (*fields)[9] syntax works in msvc; it's a c89-ism that we use quite extensively in libuv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int (*fields)[9] syntax works in msvc; it's a c89-ism that we use quite extensively in libuv.

Of course, but this syntax would force the length to be 9 whereas get_date_fields needs at least 9 and set_date_fields at least 7. It is not a problem if MSVC cannot verify the minimum length as other compilers already do for 99,9% of the code.

quickjs.c Show resolved Hide resolved
return FALSE;
}
}
if (string_skip_char(sp, &p, 'T')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my other comment: V8 accepts lowercase 't', spidermonkey doesn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T and Z are specified in uppercase for the ISO date/time format. Matching these in lowercase too does not seem necessary, but could become a side effect of the string conversion if I were to uppercase the string unconditionally. I am not sure it is worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally lean on the side of acceptance and forgiveness here but you're not wrong it's per spec so 🤷

quickjs.c Outdated
Comment on lines 47587 to 47588
} else
if (c == '(') { /* skip parenthesized phrase */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else
if (c == '(') { /* skip parenthesized phrase */
} else if (c == '(') { /* skip parenthesized phrase */

Also, I feel slightly offended that this list leaves out CET and CEST :-)

Copy link
Collaborator Author

@chqrlie chqrlie Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of style... I like to align the if tests at the indentation level. I will check if the rest of the code is consistent with that... else if on the same line wins 8 to 1, yet for long lists of chained tests, I find it more readable to break the line between the else and the if.

I would gladly add more timezone abbreviations and use a table. Do V8 or SM support other ones?
I found this source: https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations but it is long and some of the abbreviations are ambiguous.

This site also has a trove of useful information: https://www.timeanddate.com/time/zones/

I wrote a more general parser and added the European zones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do V8 or SM support other ones?

Both SM's and V8's parser go well above and beyond the minimum functionality the spec mandates.

They both know how to parse e.g. (Central European Standard Time) (and in a case-insensitive manner); quite possibly they lean into ICU for the actual string parsing.

quickjs.c Outdated
Comment on lines 47599 to 47600
} else
if (c == ')') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else
if (c == ')') {
} else if (c == ')') {

- add `minimum_length` macro to specify argument array sizes
- improve `string_get_milliseconds` readability
- fix `upper_ascii`
- add `js_tzabbr` and `string_get_tzabbr` to handle timezone abbreviations
- v8.js: parse all environement variables and output them
- v8.txt: add environement variables when specified
- no longer use `String.prototype.__quote()` to output `Date` objects in repl
- disable C99 `minimum_length()` syntax for MSVC
v8.js Outdated
Comment on lines 39 to 46
let env = {}, envstr = "";
for (let s of source.matchAll(/environment variables:(.+)/ig)) {
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) {
env[m[1]] = m[2];
envstr += ` ${m[1]}=${m[2]}`;
}
}
print(`=== ${file}${envstr}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let env = {}, envstr = "";
for (let s of source.matchAll(/environment variables:(.+)/ig)) {
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) {
env[m[1]] = m[2];
envstr += ` ${m[1]}=${m[2]}`;
}
}
print(`=== ${file}${envstr}`);
let env = {}, envstr = ""
for (let s of source.matchAll(/environment variables:(.+)/ig)) {
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) {
env[m[1]] = m[2]
envstr += ` ${m[1]}=${m[2]}`
}
}
print(`=== ${file}${envstr}`)

No semicolons in my beautiful pristine JS code, please :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for consistency, but not my style :)

@saghul
Copy link
Contributor

saghul commented Mar 9, 2024

@chqrlie is this ready to go?

I'm planning on cutting a 0.4.0 release once this lands.

@chqrlie
Copy link
Collaborator Author

chqrlie commented Mar 9, 2024

@chqrlie is this ready to go?

I'm planning on cutting a 0.4.0 release once this lands.

Yes. just waiting for Ben's final approval

- use JSON.stringify to output Date and string values in repl.js
- remove String.prototype.__quote
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only briefly skimmed it this time because, well, it's 10 AM on a Sunday morning and I want to go outside and play with the kids, but LGTM :)

@chqrlie chqrlie merged commit 648a8f5 into quickjs-ng:master Mar 10, 2024
36 checks passed
@chqrlie chqrlie deleted the improve-date-parse branch March 10, 2024 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants