[Spec Deviation] Support StringNumericEscape #13180

Closed
MaryamZi opened this issue Jan 15, 2019 · 5 comments · Fixed by #20714
Labels: Area/Compiler, Points/1, Priority/High, Team/CompilerFE, Type/NewFeature, Type/SpecDeviation

Comments

@MaryamZi
Member

MaryamZi commented Jan 15, 2019

Description:

Updated 2020R1 Spec:

The syntax for Unicode escapes in strings has changed from \u[CodePoint] to \u{CodePoint} so as to align with ECMAScript. Although this is an incompatible change, the previous syntax was not implemented.

Related commit - ballerina-platform/ballerina-spec@eb2fa56

------Old Content-------

As per the 2019R1 spec:

...
string-literal :=
DoubleQuotedStringLiteral | symbolic-string-literal
DoubleQuotedStringLiteral := " (StringChar | StringEscape)* "
...
StringEscape := StringSingleEscape | StringNumericEscape
...
StringNumericEscape := \u[ CodePoint ]
CodePoint := HexDigit+

"In a StringNumericEscape, CodePoint must valid Unicode code point; more precisely, it
must be a hexadecimal numeral denoting an integer n where 0 <= n < 0xD800 or 0xDFFF <
n <= 0x10FFFF."
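
The range rule quoted above can be sketched as a small check. This is only an illustration, written in Java because the compiler is a Java program. The class and method names are hypothetical and are not taken from the Ballerina code base.

```java
final class CodePointCheck {

    // Returns true if hex is one or more ASCII hex digits denoting an integer n
    // with 0 <= n < 0xD800 or 0xDFFF < n <= 0x10FFFF (surrogates are excluded).
    static boolean isValidCodePoint(String hex) {
        if (hex == null || hex.isEmpty()) {
            return false;
        }
        int n = 0;
        for (int i = 0; i < hex.length(); i++) {
            int d = hexValue(hex.charAt(i));
            if (d < 0) {
                return false; // not a hex digit
            }
            n = n * 16 + d;
            if (n > 0x10FFFF) {
                return false; // stop early, which also prevents int overflow
            }
        }
        return n < 0xD800 || n > 0xDFFF;
    }

    // Value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Under this sketch, `isValidCodePoint("D799")` and `isValidCodePoint("10FFFF")` are true, while `isValidCodePoint("D800")` and `isValidCodePoint("110000")` are false.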

The following need to be supported:

string s1 = "\u0";
string s2 = "\uD799";
string s3 = "\uEFFF";
string s4 = "\u10FFFF";
@rdhananjaya
Member

The current compiler implementation uses Apache Commons' StringEscapeUtils.unescapeJava method to unescape Unicode literals, and it only supports \uXXXX (where X is a hex digit).

Since the range we currently support is a subset of the allowed range, and supporting it would not impede expanding the range later, shall we lower the priority of this task?

anupama-pathirage added the Team/CompilerFE label on Apr 30, 2019
gimantha added this to the Ballerina 1.1.0 milestone on Oct 17, 2019
hasithaa removed this from the Ballerina 1.1.0 milestone on Dec 16, 2019
@hasithaa
Contributor

Related to ballerina-platform/ballerina-spec#390

@jclark

jclark commented Jan 17, 2020

You have misunderstood the spec: there is a literal [ and ] in there. It's \u[D799] not \uD799. See https://ballerina.io/spec/lang/2019R3/#StringNumericEscape

Anyway, we decided to change to e.g. \u{D799} for compatibility with ECMAScript.
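
To make the \u{CodePoint} form concrete, here is a rough Java sketch of decoding such escapes in a string body. It is not the compiler's code. The class and method names are made up for illustration, other backslash escapes are passed through untouched, and error handling is reduced to throwing exceptions.

```java
final class NumericEscapeSketch {

    // Decodes brace-delimited numeric escapes: backslash, 'u', '{', hex digits, '}'.
    // Any other backslash escape is copied through unchanged.
    static String unescapeNumeric(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (s.startsWith("\\u{", i)) {
                int close = s.indexOf('}', i + 3);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated numeric escape at index " + i);
                }
                out.appendCodePoint(parseCodePoint(s.substring(i + 3, close)));
                i = close + 1;
            } else {
                // Some other escape: keep the backslash and the character after it.
                out.append(c);
                if (i + 1 < s.length()) {
                    out.append(s.charAt(i + 1));
                }
                i += 2;
            }
        }
        return out.toString();
    }

    // Parses one or more ASCII hex digits and rejects surrogates and values above 0x10FFFF.
    private static int parseCodePoint(String hex) {
        if (hex.isEmpty()) {
            throw new IllegalArgumentException("empty code point");
        }
        int n = 0;
        for (int k = 0; k < hex.length(); k++) {
            char c = hex.charAt(k);
            int d;
            if (c >= '0' && c <= '9') d = c - '0';
            else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
            else throw new IllegalArgumentException("not a hex digit: " + c);
            n = n * 16 + d;
            if (n > 0x10FFFF) {
                throw new IllegalArgumentException("code point out of range: " + hex);
            }
        }
        if (n >= 0xD800 && n <= 0xDFFF) {
            throw new IllegalArgumentException("surrogate code point: " + hex);
        }
        return n;
    }
}
```

With this sketch, the body `\u{41}` decodes to `A`, and `\u{D800}` is rejected because it falls in the surrogate range.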

@jclark

jclark commented Jan 17, 2020

If you are currently supporting \uXXXX syntax, you should up the priority of this, since that is not the right syntax.

@hasithaa
Contributor

@jclark yes. It seems like a bug, and we do support the \uXXXX syntax. I have increased the priority of this issue.

MaryamZi added the Points/1 label on Jan 24, 2020