From cc7e7c8b6787a8bc90a841e93a55793e221c1949 Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Sun, 18 Jun 2023 17:56:54 +0100 Subject: [PATCH 1/8] Propose code string literals --- text/0000-code-literals.md | 268 +++++++++++++++++++++++++++++++++++++ 1 file changed, 268 insertions(+) create mode 100644 text/0000-code-literals.md diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md new file mode 100644 index 00000000000..cdd988ea5cf --- /dev/null +++ b/text/0000-code-literals.md @@ -0,0 +1,268 @@ +- Feature Name: code_literals +- Start Date: 2023-06-18 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Add a new kind of multi-line string literal for embedding code which +plays nicely with `rustfmt`. + +# Motivation +[motivation]: #motivation + + - Embedding code as a literal string within a Rust program is often + necessary. A prominent example is the `sqlx` crate, which + has the user write SQL queries as string literals within the program. + - Rust already supports several kinds of multi-line string literal, + but none of them are well suited for embedding code. + + 1. Normal string literals, eg. `"a string literal"`. These can be + written over multiple lines, but require special characters + to be escaped. Whitespace is significant within the literal, + which means that `rustfmt` cannot fix the indentation of the + code block. For example, beginning with this code: + + ```rust + if some_condition { + do_something_with( + " + a nicely + indented code + string + " + ); + } + ``` + + If the indentation is changed, such as by removing the + conditional, then `rustfmt` must re-format the code like so: + + ```rust + do_something_with( + " + a nicely + indented code + string + " + ); + ``` + + To do otherwise would be to change thange the value of + the string literal. + + 2. 
Normal string literals with backslash escaping, eg. + ```rust + " + this way\ + whitespace at\ + the beginning\ + of lines can\ + be ignored\ + " + ``` + + This approach still suffers from the need to escape special + characters. The backslashes at the end of every line are + tedious to write, and are problematic if whitespace is + meaningful within the code. For example, if python code + was being embedded, then the indentation would be lost. + Finally, although `rustfmt` could in principle reformat + these strings, in practice doing so in a reasonable way + is complicated and so this has never been enabled. + + 3. Raw string literals, eg. `r#"I can use "s!"#` + + This solves the problem of special characters, but suffers + from the same inability to be reformatted, and the trick + of using an `\` at the end of each line cannot be applied + because escape characters are not recognised. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +In addition to string literals and raw string literals, a third type +of string literal exists: code string literals. + +```rust + let code = ``` + This is a code string literal + + I can use special characters like "" and \ freely. + + Indentation is preserved *relative* to the indentation level + of the first line. + + It is an error for a line to have "negative" indentation (ie. be + indented less than the indentation of the opening backticks) unless + the line is empty. + ```; +``` + +`rustfmt` will automatically adjust the indentation of the code string +literal as a whole to match the surrounding context, but will never +change the relative indentation within such a literal. + +Anything directly after the opening backticks is not considered +part of the string literal. It may be used as a language hint or +processed by macros (similar to the treatment of doc comments). 
+ +```rust +let sql = ```sql + SELECT * FROM table; + ```; +``` + +Similar to raw string literals, there is no way to escape characters +within a code string literal. It is expected that procedural macros +would build upon code string literals to add support for such +functionality as required. + +If it is necessary to include triple backticks within a code string +literal, more than three backticks may be used to enclose the +literal, eg. + +```rust +let code = ```` + ``` +````; +``` + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +A code string literal will begin and end with three or more backticks. +The number of backticks in the terminator must match the number used +to begin the literal. + +The value of the string literal will be determined using the following +steps: + +1. Start from the first newline after the opening backticks. +2. Take the string exactly as written until the closing backticks. +3. Remove equal numbers of spaces or tabs from every non-empty line + until the first character of the first non-empty line is neither + a space nor a tab, or until every line is empty. + Raise a compile error if this could not be done + due to a "negative" indent or inconsistent whitespace (eg. if + some lines are indented using tabs and some using spaces). 
+ +Here are some edge case examples: + +```rust + // Empty string + assert_eq!(```foo + ```, ""); + + // Newline + assert_eq!(``` + + ```, "\n"); + + // No terminating newline + assert_eq!(``` + bar```, "bar"); + + // Terminating newline + assert_eq!(``` + bar + ```, "bar\n"); + + // Preserved indent + assert_eq!(``` + if a: + print(42) + ```, "if a:\n print(42)\n"); + + // Relative indent + assert_eq!(``` + if a: + print(42) + ```, "if a:\n print(42)\n"); + + // Relative to first non-empty line + assert_eq!(``` + + + if a: + print(42) + ```, "\n\nif a:\n print(42)\n"); +``` + +The text between the opening backticks and the first newline is +preserved within the AST, but is otherwise unused. + +# Drawbacks +[drawbacks]: #drawbacks + +The main drawback is increased complexity of the language: + +1. It adds a new symbol to the language, which was not previously used. +2. It adds a third way of writing string literals. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +There is lots of room to bike-shed syntax. +If there is significant opposition to the backtick syntax, then an +alternative syntax such as: +``` +code" + string +" +``` +could be used. + +Similarly, the use of more than three backticks may be unpopular. +It's not clear how important it is to be able to nest backticks +within backticks, but a syntax mirroring raw string literals could +be used instead, eg. +``` +`# foo + string +#` +``` + +There is also the question of whether the backtick syntax would +interfere with the ability to paste Rust code snippets into such +blocks. Experimentally, markdown parsers do not seem to have any +problems with this (as demonstrated in this document). + +# Prior art +[prior-art]: #prior-art + +The proposed syntax is primarily based on markdown code block syntax, +which is widely used and should be familiar to most programmers. 
+ + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +- None + +# Future possibilities +[future-possibilities]: #future-possibilities + +- Macro authors could perform further processing + on code string literals. These macros could add support for string + interpolation, escaping, etc. without needing to further complicate + the language itself. + +- Procedural macros could look at the text following the opening triple + quotes and use that to influence code generation, eg. + + ```rust + query!(```postgresql + + ```) + ``` + + could parse the query in a PostgreSQL specific way. + +- Code literals could be used by crates like `html-macro` + or `quote` to provide better surface syntax and faster + compilation. + +- Code literals could be used with the `asm!` macro to avoid + needing a new string on every line. From a48ef56397015137e5a12a6358f9f95c9bf32ec9 Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Sun, 18 Jun 2023 19:58:04 +0100 Subject: [PATCH 2/8] Update text/0000-code-literals.md Co-authored-by: Caleb Cartwright --- text/0000-code-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md index cdd988ea5cf..6d24c33bbab 100644 --- a/text/0000-code-literals.md +++ b/text/0000-code-literals.md @@ -70,7 +70,7 @@ plays nicely with `rustfmt`. was being embedded, then the indentation would be lost. Finally, although `rustfmt` could in principle reformat these strings, in practice doing so in a reasonable way - is complicated and so this has never been enabled. + is complicated and so this has never been enabled by default. 3. Raw string literals, eg. 
`r#"I can use "s!"#` From f889c3eca9214aaf0b4a3c02bcef467ace0bd1dd Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Tue, 20 Jun 2023 22:10:26 +0100 Subject: [PATCH 3/8] Incorporate feedback and suggestions --- text/0000-code-literals.md | 182 ++++++++++++++++++++++++++++++++----- 1 file changed, 161 insertions(+), 21 deletions(-) diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md index cdd988ea5cf..ab3fc19429a 100644 --- a/text/0000-code-literals.md +++ b/text/0000-code-literals.md @@ -7,7 +7,8 @@ [summary]: #summary Add a new kind of multi-line string literal for embedding code which -plays nicely with `rustfmt`. +plays nicely with `rustfmt` and doesn't introduce unwanted whitespace +into multi-line string literals. # Motivation [motivation]: #motivation @@ -79,6 +80,12 @@ plays nicely with `rustfmt`. of using an `\` at the end of each line cannot be applied because escape characters are not recognised. + - The existing string literals introduce extra unwanted whitespace + into the literal value. Even if that extra whitespace does not + semantically affect the nested code, it results in ugly output + if the code is ever logged (such as might happen when logging + SQL query executions). + # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -92,10 +99,10 @@ of string literal exists: code string literals. I can use special characters like "" and \ freely. Indentation is preserved *relative* to the indentation level - of the first line. + of the terminating triple backticks. It is an error for a line to have "negative" indentation (ie. be - indented less than the indentation of the opening backticks) unless + indented less than the final triple backticks) unless the line is empty. ```; ``` @@ -129,6 +136,15 @@ let code = ```` ````; ``` +In order to suppress the final newline, the literal may instead be +closed with `!``` `, eg. 
+ +```rust +let code = ``` + Text with no final newline + !```; +``` + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -139,35 +155,37 @@ to begin the literal. The value of the string literal will be determined using the following steps: -1. Start from the first newline after the opening backticks. -2. Take the string exactly as written until the closing backticks. -3. Remove equal numbers of spaces or tabs from every non-empty line - until the first character of the first non-empty line is neither - a space nor a tab, or until every line is empty. - Raise a compile error if this could not be done - due to a "negative" indent or inconsistent whitespace (eg. if - some lines are indented using tabs and some using spaces). +1. Measure the whitespace indenting the closing backticks. If a + non-whitespace character (other than a single `!`) exists before + the closing backticks on the same line, then issue a compiler error. +2. Take the lines *between* (but not including) the opening and + closing backticks exactly as written. +3. Remove exactly the measured whitespace from each line. If this + cannot be done, then issue a compiler error. +4. If the string was terminated with `!``` `, then remove the + final newline. 
Here are some edge case examples: ```rust // Empty string assert_eq!(```foo - ```, ""); + ```, ""); // Newline assert_eq!(``` - ```, "\n"); + ```, "\n"); // No terminating newline assert_eq!(``` - bar```, "bar"); + bar + !```, "bar"); // Terminating newline assert_eq!(``` bar - ```, "bar\n"); + ```, "bar\n"); // Preserved indent assert_eq!(``` @@ -179,15 +197,15 @@ Here are some edge case examples: assert_eq!(``` if a: print(42) - ```, "if a:\n print(42)\n"); + ```, "if a:\n print(42)\n"); - // Relative to first non-empty line + // Relative to closing backticks assert_eq!(``` if a: print(42) - ```, "\n\nif a:\n print(42)\n"); + ```, "\n\n if a:\n print(42)\n"); ``` The text between the opening backticks and the first newline is @@ -229,11 +247,133 @@ interfere with the ability to paste Rust code snippets into such blocks. Experimentally, markdown parsers do not seem to have any problems with this (as demonstrated in this document). +## A list of all options regarding syntax + +### Quote style + + - **3+N backticks** + ```rust + let _ = ``` + some code + ```; + ``` + + - **3+N double-quotes** + ```rust + let _ = """ + some code + """; + ``` + + - **3+N single quotes** + ```rust + let _ = ''' + some code + '''; + ``` + + - **Word prefix + N+1 hashes** + ```rust + let _ = code#" + some code + "#; + ``` + + - **Single character prefix + N+1 hashes** + ```rust + let _ = m#" + some code + "#; + ``` + (note: `c` is already reserved for C strings) + +### Indentation rules + + - **Relative to closing quote + retain final newline** + + Benefits: + - Allows every possible indentation to be represented. + - Simple rule. + - The value of the string is obvious and intuitive. + + Drawbacks: + - It is not possible to represent strings without a trailing newline. + + - **Relative to closing quote + remove final newline** + + Benefits: + - Allows every possible indentation to be represented. + - Simple rule. + - Strings without a final newline can be represented. 
+ + Drawbacks: + - There are two ways to represent the empty string. + - It is unintuitive that two empty lines between quotes results in + a single newline. + + - **Relative to first non-empty line** + + Benefits: + - Simple rule. + - The value of the string is obvious and intuitive. + - Strings without a final newline can be represented. + + Drawbacks: + - Some indentations cannot be represented. + + - **Relative to least indented line** + + Benefits: + - Simple rule. + - The value of the string is obvious and intuitive. + - Strings without a final newline can be represented. + + Drawbacks: + - Some indentations cannot be represented. + +### Miscellaneous + + - **Language hint directly following opening quote** + + This is intended to allow extra information (eg. language) to be + conveyed by the programmer to macros and/or their IDE. For example: + ```rust + let _ = ```sql + SELECT * FROM table; + ```; + ``` + Here, an intelligent IDE could apply syntax highlighting to the nested + code block, knowing that the code is SQL. + + - **Annotation on closing quote to remove trailing newline** + + For indentation rules where the final quote must appear on + its own line and there is no way to represent a string without + a trailing newline, a modification character could be used. + + For example: + ```rust + let _ = ``` + no trailing newline + !```; + ``` + This could be used with any quote style and is unambiguous because + nothing can otherwise appear on the same line prior to the closing + quote. + # Prior art [prior-art]: #prior-art -The proposed syntax is primarily based on markdown code block syntax, -which is widely used and should be familiar to most programmers. +The proposed quote style is primarily based on markdown code block syntax, +which is widely used and should be familiar to most programmers. This is +also where the language hint comes from. 
+ +The indentation rules are borrowed from [Perl's "Indented Here-docs"](https://perldoc.perl.org/perlop#EOF) and [PHP's "Heredoc" syntax](https://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc) + +The [`indoc` crate](https://docs.rs/indoc/latest/indoc/) exists to remove +leading indentation from multiline string literals. However, it cannot +help with the reformatting done by `rustfmt`, and is generally not understood +by IDEs. # Unresolved questions @@ -255,7 +395,7 @@ which is widely used and should be familiar to most programmers. ```rust query!(```postgresql - ```) + ```) ``` could parse the query in a PostgreSQL specific way. From 827caa0c358256879af0b82f7e4a137ae6130c9c Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Wed, 21 Jun 2023 23:23:52 +0100 Subject: [PATCH 4/8] Alternative newline annotation... --- text/0000-code-literals.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md index a7056fc237d..0a31e4372ef 100644 --- a/text/0000-code-literals.md +++ b/text/0000-code-literals.md @@ -358,6 +358,14 @@ problems with this (as demonstrated in this document). no trailing newline !```; ``` + + Or (the less serious suggestion of)... + ```rust + let _ = ``` + no trailing newline + 🚫```; + ``` + This could be used with any quote style and is unambiguous because nothing can otherwise appear on the same line prior to the closing quote. 
From ec487c748923390b8b683c7222fbbeea6d7481ff Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Sat, 24 Jun 2023 12:45:15 +0100 Subject: [PATCH 5/8] Switch to single character prefix alternative --- text/0000-code-literals.md | 309 +++++++++++++++++++++++-------------- 1 file changed, 192 insertions(+), 117 deletions(-) diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md index 0a31e4372ef..be6814c903a 100644 --- a/text/0000-code-literals.md +++ b/text/0000-code-literals.md @@ -10,6 +10,15 @@ Add a new kind of multi-line string literal for embedding code which plays nicely with `rustfmt` and doesn't introduce unwanted whitespace into multi-line string literals. +--- + +**NOTE: The syntax presented here is *one possible syntax* +in a huge space. The purpose of this RFC is to gain consensus that +such a feature would be beneficial to the language, not to settle +every possible bike-shedding decision.** + +--- + # Motivation [motivation]: #motivation @@ -50,7 +59,7 @@ into multi-line string literals. ); ``` - To do otherwise would be to change thange the value of + To do otherwise would be to change the value of the string literal. 2. Normal string literals with backslash escaping, eg. @@ -89,126 +98,163 @@ into multi-line string literals. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -In addition to string literals and raw string literals, a third type -of string literal exists: code string literals. +A modifier `h` (for +[Here document](https://en.wikipedia.org/wiki/Here_document)) +may be added to a string literal prefix to change how the +string is interpreted by the compiler. The effect of the `h` +modifier causes all indentation to be relative to the +closing quote: ```rust - let code = ``` - This is a code string literal + let code = h" + This is a code string literal. - I can use special characters like "" and \ freely. 
+ I can use escape sequences like \n since the `h` + prefix was added to a normal string literal Indentation is preserved *relative* to the indentation level - of the terminating triple backticks. + of the terminating quote. - It is an error for a line to have "negative" indentation (ie. be - indented less than the final triple backticks) unless + It is an error for a line to have negative indentation (ie. be + indented less than the final quote) unless the line is empty. - ```; + "; ``` `rustfmt` will automatically adjust the indentation of the code string literal as a whole to match the surrounding context, but will never change the relative indentation within such a literal. -Anything directly after the opening backticks is not considered -part of the string literal. It may be used as a language hint or -processed by macros (similar to the treatment of doc comments). +The `h` modifier will often be combined with raw string literals to +embed sections of code such as SQL: ```rust -let sql = ```sql - SELECT * FROM table; - ```; + let code = hr#" + This is also a code string literal + + I can use special characters like "" and \ freely. + + Indentation is still *relative* to the indentation level + of the terminating quote. + "#; ``` -Similar to raw string literals, there is no way to escape characters -within a code string literal. It is expected that procedural macros -would build upon code string literals to add support for such -functionality as required. +For completeness, the `h` modifier may also be combined with byte +and raw byte string literals, eg. `hb"` and `hbr#"`. -If it is necessary to include triple backticks within a code string -literal, more than three backticks may be used to enclose the -literal, eg. +Anything directly after the opening quote is not considered +part of the string literal. It may be used as a language hint or +processed by macros (similar to the treatment of doc comments). 
```rust -let code = ```` - ``` -````; +let sql = hr#"sql + SELECT * FROM table; + "#; ``` +When the `h` modifier is used with a raw string literal, the same +rules as usual apply, where the number of `#` characters can be +increased if the sequence `"#` needs to appear inside the string. + In order to suppress the final newline, the literal may instead be -closed with `!``` `, eg. +closed with `-" ` or `-"#` depending on the opening quote, eg. ```rust -let code = ``` +let code = hr#" Text with no final newline - !```; + -"#; ``` +Aside from this `-` modifier, only whitespace may appear on the final +line prior to the closing quote. + +Together, these rules ensure that every possible string can be represented +in a single canonical way, while allowing the indentation of the string +as a whole to be changed freely. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -A code string literal will begin and end with three or more backticks. -The number of backticks in the terminator must match the number used -to begin the literal. +An `h` modifier may be added to the prefix of the following string +literal types: -The value of the string literal will be determined using the following -steps: +- String literals `h"` +- Raw string literals `hr#"` +- Byte string literals `hb"` +- Raw byte string literals `hbr#"` -1. Measure the whitespace indenting the closing backticks. If a - non-whitespace character (other than a single `!`) exists before - the closing backticks on the same line, then issue a compiler error. +The `h` modifier will appear before all characters in the prefix. + +The value of a string literal with the `h` modifier will be determined +using the following steps: + +1. Measure the whitespace indenting the closing quote. If a + non-whitespace character (other than a single `-`) exists before + the closing quote on the same line, then issue a compiler error. 2. 
Take the lines *between* (but not including) the opening and - closing backticks exactly as written. -3. Remove exactly the measured whitespace from each line. If this - cannot be done, then issue a compiler error. -4. If the string was terminated with `!``` `, then remove the - final newline. + closing quotes exactly as written. +3. Remove exactly the measured whitespace from each non-empty line. + If this cannot be done, then issue a compiler error. The + whitespace must match down to the exact character sequence. +4. If a `-` character was present immediately prior to the closing + quote, then remove the final newline. +5. Interpret any escape sequences and apply any pre-processing as + usual for the string literal type without an `h` modifier. + For example, newlines in the file are always treated as `\n` + even if the file is encoded with `\r\n` newlines. Here are some edge case examples: ```rust - // Empty string - assert_eq!(```foo - ```, ""); + // Empty string with language hint + assert_eq!(h"foo + ", ""); // Newline - assert_eq!(``` + assert_eq!(h" - ```, "\n"); + ", "\n"); // No terminating newline - assert_eq!(``` + assert_eq!(h" bar - !```, "bar"); + -", "bar"); // Terminating newline - assert_eq!(``` + assert_eq!(h" bar - ```, "bar\n"); + ", "bar\n"); // Preserved indent - assert_eq!(``` + assert_eq!(hr#" if a: print(42) - ```, "if a:\n print(42)\n"); + "#, "if a:\n print(42)\n"); // Relative indent - assert_eq!(``` + assert_eq!(hr#" if a: print(42) - ```, "if a:\n print(42)\n"); + "#, "if a:\n print(42)\n"); - // Relative to closing backticks - assert_eq!(``` + // Relative to closing quote + assert_eq!(hr#" if a: print(42) - ```, "\n\n if a:\n print(42)\n"); + "#, "\n\n if a:\n print(42)\n"); + + // Interactions with escaping rules + assert_eq!(h" + \"\ + foo\n + bar + \t + ", " \"foo\n\n bar\n\t\n"); ``` -The text between the opening backticks and the first newline is +Any text between the opening quote and the first newline is preserved within the 
AST, but is otherwise unused. # Drawbacks @@ -216,42 +262,31 @@ preserved within the AST, but is otherwise unused. The main drawback is increased complexity of the language: -1. It adds a new symbol to the language, which was not previously used. -2. It adds a third way of writing string literals. +1. It adds four new types of string literals given all + the combinations. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -There is lots of room to bike-shed syntax. -If there is significant opposition to the backtick syntax, then an -alternative syntax such as: -``` -code" - string -" -``` -could be used. - -Similarly, the use of more than three backticks may be unpopular. -It's not clear how important it is to be able to nest backticks -within backticks, but a syntax mirroring raw string literals could -be used instead, eg. -``` -`# foo - string -#` -``` - -There is also the question of whether the backtick syntax would -interfere with the ability to paste Rust code snippets into such -blocks. Experimentally, markdown parsers do not seem to have any -problems with this (as demonstrated in this document). +Many possible options regarding syntax have been explored during +the life of this RFC. This section will attempt to categorize +and enumerate every variation considered. The options marked +with a :heavy_check_mark: are the variations which were chosen +to form the syntax proposed above. ## A list of all options regarding syntax ### Quote style - - :heavy_check_mark: **3+N backticks** + - :heavy_check_mark: **Single character prefix + N hashes** + ```rust + let _ = hr#" + some code + "#; + ``` + (note: `c` is already reserved for C strings) + + - **3+N backticks** ```rust let _ = ``` some code @@ -272,21 +307,13 @@ problems with this (as demonstrated in this document). 
'''; ``` - - **Word prefix + N+1 hashes** + - **Word prefix + N hashes** ```rust let _ = code#" some code "#; ``` - - **Single character prefix + N+1 hashes** - ```rust - let _ = m#" - some code - "#; - ``` - (note: `c` is already reserved for C strings) - ### Indentation rules - :heavy_check_mark: **Relative to closing quote + retain final newline** @@ -297,7 +324,8 @@ problems with this (as demonstrated in this document). - The value of the string is obvious and intuitive. Drawbacks: - - It is not possible to represent strings without a trailing newline. + - Requires an additional syntax to allow representing strings + without a trailing newline. - **Relative to closing quote + remove final newline** @@ -307,9 +335,47 @@ problems with this (as demonstrated in this document). - Strings without a final newline can be represented. Drawbacks: - - There are two ways to represent the empty string. - - It is unintuitive that two empty lines between quotes results in - a single newline. + - There are two ways to represent the empty string. For example: + ```rust + let _ = h" + " + ``` + And + ```rust + let _ = h" + + " + ``` + Would need to both represent the empty string. This is + unintuitive. It also means that *two* empty lines are + necessary to represent a single newline. + + - The common case (where the final newline does not need + to be suppressed) is ugly and wastes vertical space: + ```rust + let _ = h" + some code + + "; + ``` + + - Forgetting to add this ugly blank line at the end is a footgun + when concatenating two strings: + ```rust + let a = h" + if a == 1: + return True + "; + let b = h" + if b == 1: + return False + " + format!("{a}{b}") == h" + if a == 1: + return Trueif b == 1: + return False + " + ``` - **Relative to first non-empty line** @@ -319,7 +385,8 @@ problems with this (as demonstrated in this document). - Strings without a final newline can be represented. Drawbacks: - - Some indentations cannot be represented. 
+ - Some indentations cannot be represented (those + where the first line should be indented). - **Relative to least indented line** @@ -329,7 +396,8 @@ problems with this (as demonstrated in this document). - Strings without a final newline can be represented. Drawbacks: - - Some indentations cannot be represented. + - Some indentations cannot be represented (those + where every line should be indented). ### Modifications @@ -338,12 +406,14 @@ problems with this (as demonstrated in this document). This is intended to allow extra information (eg. language) to be conveyed by the programmer to macros and/or their IDE. For example: ```rust - let _ = ```sql + let _ = h"sql SELECT * FROM table; - ```; + "; ``` Here, an intelligent IDE could apply syntax highlighting to the nested - code block, knowing that the code is SQL. + code block, knowing that the code is SQL. The string is not treated + any differently by the compiler, it's purely there for IDEs and + optionally procedural macros. - :heavy_check_mark: **Annotation on closing quote to remove trailing newline** @@ -354,35 +424,40 @@ problems with this (as demonstrated in this document). For example: ```rust - let _ = ``` + let _ = h" no trailing newline - !```; + -"; ``` Or (the less serious suggestion of)... ```rust - let _ = ``` + let _ = h" no trailing newline - 🚫```; + 🚫"; ``` This could be used with any quote style and is unambiguous because nothing can otherwise appear on the same line prior to the closing quote. + Having the annotation be in the string prefix is also possible + (such as `hn"`) but this is worse because it is non-local (the + only effect is on the last line of the string) it "uses up" a + letter for a possible string prefix, and it makes the string + prefix even longer than it already is. + # Prior art [prior-art]: #prior-art -The proposed quote style is primarily based on markdown code block syntax, -which is widely used and should be familiar to most programmers. 
This is -also where the language hint comes from. - The indentation rules are borrowed from [Perl's "Indented Here-docs"](https://perldoc.perl.org/perlop#EOF) and [PHP's "Heredoc" syntax](https://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc) The [`indoc` crate](https://docs.rs/indoc/latest/indoc/) exists to remove leading indentation from multiline string literals. However, it cannot -help with the reformatting done by `rustfmt`, and is generally not understood -by IDEs. +help with the reformatting done by `rustfmt`, and is generally not +understood by IDEs. It also cannot distinguish between "real" whitespace +in the final, and whitespace introduced by escape sequences. + +The "language hint" is based on markdown code block syntax. # Unresolved questions @@ -398,13 +473,13 @@ by IDEs. interpolation, escaping, etc. without needing to further complicate the language itself. -- Procedural macros could look at the text following the opening triple +- Procedural macros could look at the text following the opening quotes and use that to influence code generation, eg. ```rust - query!(```postgresql + query!(h"postgresql - ```) + ") ``` could parse the query in a PostgreSQL specific way. From a00a4a9a38bd320bae26d68a3dc0aedd3a808fda Mon Sep 17 00:00:00 2001 From: Diggory Blake Date: Fri, 30 Jun 2023 00:01:39 +0100 Subject: [PATCH 6/8] Add another alternative --- text/0000-code-literals.md | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md index be6814c903a..5d4def9c11b 100644 --- a/text/0000-code-literals.md +++ b/text/0000-code-literals.md @@ -386,7 +386,8 @@ to form the syntax proposed above. Drawbacks: - Some indentations cannot be represented (those - where the first line should be indented). + where the first line should be indented). At least + not without further extensions. 
   - **Relative to least indented line**

@@ -397,7 +398,8 @@ to form the syntax proposed above.

     Drawbacks:
       - Some indentations cannot be represented (those
-        where every line should be indented).
+        where every line should be indented). At least
+        not without further extensions.

 ### Modifications

@@ -446,6 +448,21 @@ to form the syntax proposed above.
     letter for a possible string prefix, and it makes the string
     prefix even longer than it already is.

+  - **Explicit indentation markers on the closing quote**
+
+    This modification would be useful for indentation rules which
+    otherwise would not allow every possible indentation to be
+    represented:
+
+    ```rust
+    let _ = h"
+        This line will retain 4 characters of indentation.
+        ____";
+    ```
+
+    Note that this would not be needed in the currently proposed
+    scheme, since it can already represent every indentation level.
+
 # Prior art
 [prior-art]: #prior-art

@@ -459,6 +476,8 @@ in the final, and whitespace introduced by escape sequences.

 The "language hint" is based on markdown code block syntax.

+See also https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#raw-string-literals .
+
 # Unresolved questions
 [unresolved-questions]: #unresolved-questions

From 90ff817111d4100b5ebee3618b83b2cfb4e9f0c5 Mon Sep 17 00:00:00 2001
From: Diggory Blake
Date: Sun, 9 Jul 2023 10:59:46 +0100
Subject: [PATCH 7/8] Add note about editions

---
 text/0000-code-literals.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md
index 5d4def9c11b..72435c62737 100644
--- a/text/0000-code-literals.md
+++ b/text/0000-code-literals.md
@@ -257,6 +257,12 @@ Here are some edge case examples:

 Any text between the opening quote and the first newline is
 preserved within the AST, but is otherwise unused.
+This is a backwards compatible change for editions 2021 onwards, since
+edition 2021 reserved prefixes for this kind of feature:
+https://doc.rust-lang.org/reference/tokens.html#reserved-prefixes.
+
+Editions prior to 2021 will not benefit from this feature.
+
 # Drawbacks
 [drawbacks]: #drawbacks

From ceca328b1471f67ffeeabf3979ca25a84436e517 Mon Sep 17 00:00:00 2001
From: Diggory Blake
Date: Wed, 12 Jul 2023 19:02:36 +0100
Subject: [PATCH 8/8] Tweaks

---
 text/0000-code-literals.md | 56 +++++++++++++++++++++++++++++++++++---
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/text/0000-code-literals.md b/text/0000-code-literals.md
index 72435c62737..924173e68e3 100644
--- a/text/0000-code-literals.md
+++ b/text/0000-code-literals.md
@@ -182,8 +182,13 @@ literal types:
 - Raw string literals `hr#"`
 - Byte string literals `hb"`
 - Raw byte string literals `hbr#"`
+- *C string literals `hc"`*
+- *Raw C string literals `hcr#"`*

 The `h` modifier will appear before all characters in the prefix.
+This rule exists for consistency with raw byte strings, which must
+be written as `br""` and not `rb""`. The choice to
+have `h` come first is otherwise arbitrary and was chosen for simplicity.

 The value of a string literal with the `h` modifier will be determined
 using the following steps:
@@ -197,7 +202,8 @@ using the following steps:
    If this cannot be done, then issue a compiler error. The whitespace
    must match down to the exact character sequence.
 4. If a `-` character was present immediately prior to the closing
-   quote, then remove the final newline.
+   quote, then remove the final newline. If there was no final newline
+   to remove (because the string was empty) then issue a compiler error.
 5. Interpret any escape sequences and apply any pre-processing as usual
    for the string literal type without an `h` modifier.
    For example, newlines in the file are always treated as `\n`
@@ -255,10 +261,17 @@ Here are some edge case examples:
 ```

 Any text between the opening quote and the first newline is
-preserved within the AST, but is otherwise unused.
+preserved within the AST, but is otherwise unused. It will be
+referred to as a "language hint", although it may also be used for other
+purposes.

-This is a backwards compatible change for editions 2021 onwards, since
-edition 2021 reserved prefixes for this kind of feature:
+The "language hint" (if present) must not begin with a whitespace
+character. It is recommended that editors distinguish the language hint
+from the rest of the string in some way, such as by highlighting it in
+a different colour.
+
+Overall this is a backwards compatible change for editions 2021 onwards,
+since edition 2021 reserved prefixes for this kind of feature:
 https://doc.rust-lang.org/reference/tokens.html#reserved-prefixes.

 Editions prior to 2021 will not benefit from this feature.
@@ -423,6 +436,41 @@ to form the syntax proposed above.
     any differently by the compiler, it's purely there for IDEs and
     optionally procedural macros.

+  - **Language hint prior to opening quote**
+
+    Similar to above, but using syntax like the following:
+
+    ```rust
+    let _ = h_sql"
+        SELECT * FROM table;
+        ";
+    ```
+    If combined with a raw string it might look like:
+    ```rust
+    let _ = h_sql_r#"
+        SELECT * FROM table;
+        "#;
+    ```
+    The choice of `_` as a separator is unsatisfactory, as it is normally
+    used as a *joining* character.
+
+  - **Language hint via an expression attribute**
+
+    Similar to above, but using syntax like the following:
+
+    ```rust
+    let _ = #[lang(sql)] h"
+        SELECT * FROM table;
+        ";
+    ```
+
+    This gets very symbol heavy when combined with raw strings:
+    ```rust
+    let _ = #[lang(sql)] hr#"
+        SELECT * FROM table;
+        "#;
+    ```
+
   - :heavy_check_mark: **Annotation on closing quote to remove trailing newline**
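
The evaluation steps touched by the hunks above (strip the closing line's whitespace from every line, then honour the `-` marker) can be sketched roughly as follows. This is only an illustration and not part of the patch. The names `CodeLiteral` and `eval_code_literal` are hypothetical, the handling of the first two steps (which these hunks do not show) is assumed, empty lines are assumed to be exempt from the indentation check, and escape processing (step 5) is left out.

```rust
/// Hypothetical result of evaluating the text between the quotes of an
/// `h"..."` literal. Not an API from the RFC.
#[derive(Debug, PartialEq)]
struct CodeLiteral {
    /// Text between the opening quote and the first newline.
    hint: String,
    /// The de-indented string value, before escape processing.
    value: String,
}

/// Sketch of the steps, given the raw text between the quotes.
fn eval_code_literal(raw: &str) -> Result<CodeLiteral, String> {
    // Assumption: the language hint is everything before the first newline.
    let (hint, rest) = raw
        .split_once('\n')
        .ok_or_else(|| "expected a newline after the opening quote".to_string())?;
    if hint.starts_with(char::is_whitespace) {
        return Err("language hint must not begin with whitespace".to_string());
    }

    // Assumption: the text after the last newline is the closing line.
    let (body, closing) = match rest.rsplit_once('\n') {
        Some((body, closing)) => (Some(body), closing),
        None => (None, rest), // empty literal: only the closing line exists
    };

    // An optional `-` directly before the closing quote asks for the final
    // newline to be removed; what precedes it is the indentation to strip.
    let (indent, strip_newline) = match closing.strip_suffix('-') {
        Some(ws) => (ws, true),
        None => (closing, false),
    };
    if !indent.chars().all(char::is_whitespace) {
        return Err("only whitespace and `-` may precede the closing quote".to_string());
    }

    // Step 3: remove the exact indentation sequence from every line.
    let mut value = String::new();
    if let Some(body) = body {
        for line in body.split('\n') {
            // Assumption: completely empty lines need no indentation.
            if !line.is_empty() {
                let stripped = line
                    .strip_prefix(indent)
                    .ok_or_else(|| format!("line {line:?} is indented less than the closing quote"))?;
                value.push_str(stripped);
            }
            value.push('\n');
        }
    }

    // Step 4: `-` removes the final newline, and errors if there is none.
    if strip_newline && value.pop().is_none() {
        return Err("`-` used but there is no final newline to remove".to_string());
    }

    Ok(CodeLiteral { hint: hint.to_string(), value })
}
```

Under these assumptions, `eval_code_literal("sql\n    SELECT 1;\n    ")` would yield the hint `sql` and the value `"SELECT 1;\n"`, while a literal consisting only of a closing line with `-` is rejected, matching the new wording of step 4.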