Add support for PEP 701 #7376
Commits on Sep 29, 2023
Add support for the new f-string tokens per PEP 701 (#6659)

This PR adds support in the lexer for the newly added f-string tokens as per PEP 701. The following new tokens are added:

* `FStringStart`: Token value for the start of an f-string. This includes the `f`/`F`/`fr` prefix and the opening quote(s).
* `FStringMiddle`: Token value that includes the portion of text inside the f-string that's not part of the expression part and isn't an opening or closing brace.
* `FStringEnd`: Token value for the end of an f-string. This includes the closing quote.

Additionally, a new `Exclamation` token is added for the conversion (`f"{foo!s}"`) as that's part of an expression.

New test cases are added for various possibilities using snapshot testing. The output has been verified using python/cpython@f2cc00527e.

_I've put the number of f-strings for each of the following files after the file name:_

```
lexer/large/dataset.py (1)     1.05   612.6±91.60µs    66.4 MB/sec   1.00   584.7±33.72µs    69.6 MB/sec
lexer/numpy/ctypeslib.py (0)   1.01   131.8±3.31µs    126.3 MB/sec   1.00   130.9±5.37µs    127.2 MB/sec
lexer/numpy/globals.py (1)     1.02    13.2±0.43µs    222.7 MB/sec   1.00    13.0±0.41µs    226.8 MB/sec
lexer/pydantic/types.py (8)    1.13   285.0±11.72µs    89.5 MB/sec   1.00   252.9±10.13µs   100.8 MB/sec
lexer/unicode/pypinyin.py (0)  1.03    32.9±1.92µs    127.5 MB/sec   1.00    31.8±1.25µs    132.0 MB/sec
```

It seems that overall the lexer has regressed. I profiled every file mentioned above and saw one improvement, which is done in (098ee5d), but otherwise I don't see anything else. A few notes from isolating the f-string part in the profile:

* As we're adding new tokens and the functionality to emit them, I expect the lexer to take more time because there's more code.
* `lex_fstring_middle_or_end` takes the most time, followed by the `current_mut` line when lexing the `:` token. The latter checks whether we're at the start of a format spec or not.
* In an f-string-heavy file such as https://github.com/python/cpython/blob/main/Lib/test/test_fstring.py [^1] (293), most of the time in `lex_fstring_middle_or_end` is accounted for by the string allocation for the string literal part of the `FStringMiddle` token (https://share.firefox.dev/3ErEa1W).

I don't see anything out of the ordinary in the `pydantic/types` profile (https://share.firefox.dev/45XcLRq).

fixes: #7042

[^1]: We could add this to the lexer and parser benchmarks.
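The same three-way split can be observed with CPython's own tokenizer: from Python 3.12, the `tokenize` module emits `FSTRING_START`/`FSTRING_MIDDLE`/`FSTRING_END`, which correspond to the tokens added here. A minimal sketch (assumes a Python 3.12+ interpreter; this exercises CPython's tokenizer, not Ruff's lexer):

```python
# Print the PEP 701 token stream for an f-string using CPython 3.12's
# `tokenize` module; the FSTRING_* kinds mirror the FStringStart /
# FStringMiddle / FStringEnd tokens described above.
import io
import tokenize

source = 'f"foo {bar!s:>10} baz"\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# The literal pieces ("foo ", the format spec text, " baz") arrive as
# FSTRING_MIDDLE; the prefix plus opening quote is FSTRING_START and the
# closing quote is FSTRING_END; everything between `{` and `}` is lexed
# as regular tokens, including the `!` before the conversion.
```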
Commit: `0ce342b`
Add support for parsing f-string as per PEP 701 (#7041)

This PR adds support for PEP 701 in the parser to use the new tokens emitted by the lexer to construct the f-string node. Without an official grammar, f-strings used to be parsed manually; now that we have the specification, it is used in the LALRPOP grammar to parse f-strings. This file includes the logic for parsing string literals and joining implicit string concatenations. Now that we don't need to parse f-strings manually, a lot of the code involved in doing so is removed.

Earlier, there were 2 entry points to this module:

* `parse_string`: Used to parse a single string literal
* `parse_strings`: Used to parse strings which were implicitly concatenated

Now, there are 3 entry points:

* `parse_string_literal`: Renamed from `parse_string`
* `parse_fstring_middle`: Used to parse a `FStringMiddle` token, which is basically a string literal without the quotes
* `concatenate_strings`: Renamed from `parse_strings`, but it now takes the parsed nodes instead, so we just need to concatenate them into a single node

> A short primer on the `FStringMiddle` token: this includes the portion of text inside the f-string that's not part of the expression and isn't an opening or closing brace. For example, in `f"foo {bar:.3f{x}} bar"`, the `foo `, `.3f` and ` bar` parts are `FStringMiddle` token content.

Discussion in the official implementation: python/cpython#102855 (comment)

This change in the AST occurs when unicode strings (prefixed with `u`) and f-strings are used in an implicitly concatenated string value. For example:

```python
u"foo" f"{bar}" "baz" " some"
```

Pre Python 3.12, the `kind` field would be assigned only if the prefix was on the first string. So, taking the above example, both `"foo"` and `"baz some"` (implicit concatenation) would be given the `u` kind.

Pre 3.12 AST:

```python
Constant(value='foo', kind='u'),
FormattedValue(
    value=Name(id='bar', ctx=Load()),
    conversion=-1),
Constant(value='baz some', kind='u')
```

But, post Python 3.12, only the string with the `u` prefix will be assigned the kind.

3.12 AST:

```python
Constant(value='foo', kind='u'),
FormattedValue(
    value=Name(id='bar', ctx=Load()),
    conversion=-1),
Constant(value='baz some')
```

Here are some more iterations around the change:

1. `"foo" f"{bar}" u"baz" "no"`

   Pre 3.12:

   ```python
   Constant(value='foo'),
   FormattedValue(
       value=Name(id='bar', ctx=Load()),
       conversion=-1),
   Constant(value='bazno')
   ```

   3.12:

   ```python
   Constant(value='foo'),
   FormattedValue(
       value=Name(id='bar', ctx=Load()),
       conversion=-1),
   Constant(value='bazno', kind='u')
   ```

2. `"foo" f"{bar}" "baz" u"no"`

   Pre 3.12:

   ```python
   Constant(value='foo'),
   FormattedValue(
       value=Name(id='bar', ctx=Load()),
       conversion=-1),
   Constant(value='bazno')
   ```

   3.12:

   ```python
   Constant(value='foo'),
   FormattedValue(
       value=Name(id='bar', ctx=Load()),
       conversion=-1),
   Constant(value='bazno')
   ```

3. `u"foo" f"bar {baz} realy" u"bar" "no"`

   Pre 3.12:

   ```python
   Constant(value='foobar ', kind='u'),
   FormattedValue(
       value=Name(id='baz', ctx=Load()),
       conversion=-1),
   Constant(value=' realybarno', kind='u')
   ```

   3.12:

   ```python
   Constant(value='foobar ', kind='u'),
   FormattedValue(
       value=Name(id='baz', ctx=Load()),
       conversion=-1),
   Constant(value=' realybarno')
   ```

With the hand-written parser, we were able to provide better error messages in cases like the following, but now they are all removed and in those cases an "unexpected token" error will be thrown by LALRPOP:

* A closing delimiter was not opened properly
* An opening delimiter was not closed properly
* Empty expression not allowed

The "Too many nested expressions in an f-string" error was removed; we can instead create a lint rule for that. And "The f-string expression cannot include the given character" was removed because f-strings now support those characters, which are mainly the same quotes as the outer ones, escape sequences, comments, etc.

1. Refactor existing test cases to use `parse_suite` instead of `parse_fstrings` (which doesn't exist anymore)
2. Additional test cases are added as required

Updated the snapshots. The change from `parse_fstrings` to `parse_suite` means that the snapshots produce the module node instead of just a list of f-string parts. I've manually verified that the parts are still the same, along with the node ranges. #7263 (comment)

fixes: #7043
fixes: #6835
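The AST dumps above are easy to reproduce on your own interpreter; a quick sketch (the `kind` placement in the output depends on whether the interpreter is pre- or post-3.12):

```python
# Dump the AST of an implicitly concatenated u-string / f-string / plain
# string to see which Constant gets kind='u' on this Python version.
import ast

tree = ast.parse('u"foo" f"{bar}" "baz" " some"')
print(ast.dump(tree.body[0].value, indent=2))  # the JoinedStr node
```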
Commit: `2e9ea6f`
Use narrow type for string parsing patterns (#7211)

This PR adds a new enum type `StringType` which is either a string literal, a bytes literal, or an f-string. The motivation behind this is to have a narrow type which is accepted in `concatenate_strings`, as that function is only applicable to the mentioned 3 types. This makes the code more readable and easier to reason about.

A future improvement (which was prototyped here and removed) is to split the current string literal pattern in the LALRPOP definition into two parts:

1. A single string literal or f-string: this means no checking for bytes / non-bytes and other unnecessary computation
2. Two or more of string/bytes/f-string: this will call the `concatenate_strings` function

The reason for removing the second change is because of how ranges work. The range for an individual string/bytes literal is the entire range, which includes the quotes as well, but if the same string/bytes literal is part of an f-string, then it only includes the range for the content (without the quotes, i.e., the inner range). The current string parser returns the former range. To give an example, for `"foo"`, the range of the string would be `0..5`, but for `f"foo"` the range of the string would be `2..5` while the range for the f-string expression would be `0..6`. The ranges are correct, but they differ depending on the context the string constant itself is being used in: is it part of an f-string or is it a standalone string?

`cargo test`
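The `0..5` / `2..5` / `0..6` ranges quoted above line up with the column offsets CPython reports; an illustrative check (assumes a 3.12 interpreter, where constants inside f-strings carry precise locations):

```python
import ast

plain = ast.parse('"foo"').body[0].value     # Constant node
fstring = ast.parse('f"foo"').body[0].value  # JoinedStr node
inner = fstring.values[0]                    # Constant inside the f-string

print(plain.col_offset, plain.end_col_offset)      # 0 5 -> range 0..5 (with quotes)
print(inner.col_offset, inner.end_col_offset)      # 2 5 -> range 2..5 (content only)
print(fstring.col_offset, fstring.end_col_offset)  # 0 6 -> range 0..6 (whole f-string)
```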
Commit: `d9876fc`
Disallow non-parenthesized lambda expr in f-string (#7263)

This PR updates the handling of disallowing non-parenthesized lambda expressions in f-strings. Previously, the lexer was used to emit an empty `FStringMiddle` token in certain cases for which there's no pattern in the parser to match; that would then raise an unexpected token error while parsing. This PR adds a new f-string error type, `LambdaWithoutParentheses`. In cases where the parser still can't detect the error, it's guaranteed to be caught by the fact that there's no `FStringMiddle` token in the pattern.

Add test cases wherever we throw the `LambdaWithoutParentheses` error.

As this is the final PR for the parser, I'm putting the parser benchmarks here:

```
group                       fstring-parser                          main
-----                       --------------                          ----
parser/large/dataset.py     1.00      4.7±0.24ms     8.7 MB/sec     1.03      4.8±0.25ms     8.4 MB/sec
parser/numpy/ctypeslib.py   1.03   921.8±39.00µs    18.1 MB/sec     1.00   897.6±39.03µs    18.6 MB/sec
parser/numpy/globals.py     1.01     90.4±5.23µs    32.6 MB/sec     1.00     89.6±6.24µs    32.9 MB/sec
parser/pydantic/types.py    1.00   1899.5±94.78µs   13.4 MB/sec     1.03  1954.4±105.88µs   13.0 MB/sec
parser/unicode/pypinyin.py  1.03   292.3±21.14µs    14.4 MB/sec     1.00   283.2±13.16µs    14.8 MB/sec
```
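For reference, CPython enforces the same restriction, so the behavior is easy to check (the error message wording below is CPython's, not this parser's):

```python
# A bare lambda is ambiguous with the `:` that starts a format spec, so it
# must be parenthesized inside an f-string replacement field.
try:
    compile('f"{lambda x: x}"', "<test>", "eval")
except SyntaxError as exc:
    print("rejected:", exc.msg)

print(eval('f"{(lambda x: x)(2)}"'))  # parenthesized form is fine -> prints 2
```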
Commit: `e249840`
Fix curly brace escape handling in f-strings (#7331)

## Summary

This PR fixes the escape handling of curly braces inside an f-string. There are 2 main changes:

### Lexer

The lexer change was actually a bug fix. Instead of breaking as soon as we find a curly brace after the `\` character, we'll continue and let the next iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token, which is valid in this context[^1]. For example:

```python
f"\{foo} \{bar:\}"
# ^      ^^    ^
# The marked characters are part of 3 different `FStringMiddle` tokens
```

Here, the `FStringMiddle` token contents will be `"\"` and `" \"`, which would be invalid in a regular string literal. However, it's valid here because it's a substring of an f-string. Even though curly braces cannot be escaped, it's still valid syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and added new test cases for the lexer and parser.
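To make the escaping rules above concrete, a small demonstration (note that on Python 3.12 the `\{` sequence additionally triggers a `SyntaxWarning`, which the `W605` change later in this PR mirrors):

```python
foo = 1
print(f"{{foo}}")   # -> {foo}   doubling is the only brace escape
print(f"\{{foo}}")  # -> \{foo}  the backslash stays literal; `{{` still escapes
print(f"\{foo}")    # -> \1      backslash stays literal; {foo} is an expression
```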
Commit: `fc28174`
Allow `NUL` character in f-strings (#7378)

## Summary

This PR fixes a bug to allow the `NUL` (`\0`) character inside f-strings.

## Test Plan

Add a test case with the `NUL` character inside an f-string.
Commit: `3f81cb1`
Update `Stylist` quote detection with new f-string token (#7328)

## Summary

This PR updates the `Stylist` quote detection to include the f-string tokens. As f-strings cannot be used as docstrings, we'll skip the check for triple-quoted f-strings.

## Test Plan

Add new test cases with f-strings.

fixes: #7293
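The "f-strings cannot be used as docstrings" point is easy to verify in plain Python; a quick sketch:

```python
# Only a plain string literal in docstring position becomes `__doc__`;
# an f-string there is just an expression statement.
def plain():
    "I am a docstring"

def formatted():
    f"I am not a docstring"

print(plain.__doc__)      # -> I am a docstring
print(formatted.__doc__)  # -> None
```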
Commit: `deee2df`
Update `PLE2510`, `PLE2512-2515` to check in f-strings (#7387)

This PR updates `PLE2510` and `PLE2512-2515` to check in f-strings.

> ### Reference:
> * `PLE2510`: Invalid unescaped character backspace, use "\b" instead
> * `PLE2512`: Invalid unescaped character SUB, use "\x1A" instead
> * `PLE2513`: Invalid unescaped character ESC, use "\x1B" instead
> * `PLE2514`: Invalid unescaped character NUL, use "\0" instead
> * `PLE2515`: Invalid unescaped character zero-width-space, use "\u200B" instead

Add test cases for f-strings.
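As a sanity check on the suggested replacements, each recommended escape equals the raw control character the rule flags:

```python
# The escape sequences recommended by each rule, checked against the
# code points of the raw characters they replace.
assert "\b" == chr(0x08)        # backspace        (PLE2510)
assert "\x1A" == chr(0x1A)      # SUB              (PLE2512)
assert "\x1B" == chr(0x1B)      # ESC              (PLE2513)
assert "\0" == chr(0x00)        # NUL              (PLE2514)
assert "\u200B" == chr(0x200B)  # zero-width space (PLE2515)
print("all escapes match")
```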
Commit: `37b8a93`
Update `F541` to use new f-string tokens (#7327)

## Summary

This PR updates the `F541` rule to use the new f-string tokens.

## Test Plan

Add a new test case and uncomment a broken test case.

fixes: #7292
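For context, `F541` flags f-strings without any placeholders; an illustrative snippet (the variable names are made up):

```python
name = "world"
greeting = f"hello"        # F541: no {...} placeholder, the `f` prefix is useless
greeting = "hello"         # fixed: drop the prefix
message = f"hello {name}"  # fine: contains a placeholder
print(message)
```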
Commit: `049c8d5`
Update `Indexer` to use new f-string tokens (#7325)

## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute the `f_string_ranges` for f-strings. It adds a new abstraction which exposes two methods to support extracting the range of the surrounding innermost and outermost f-string. It uses the builder pattern to build the f-string ranges, similar to how the comment ranges are built.

## Test Plan

Add new test cases for f-strings for:

* The tab indentation rule
* Line continuation detection in the indexer
* Getting the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
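A minimal Python sketch of the same builder idea (not Ruff's Rust implementation; it leans on CPython 3.12's `FSTRING_*` tokens and records only outermost ranges by tracking nesting with a depth counter):

```python
# Collect the (start, end) positions of outermost f-strings by pairing
# FSTRING_START / FSTRING_END tokens; nested starts only bump the depth.
# Requires Python >= 3.12 for the FSTRING_* token kinds.
import io
import tokenize

def outermost_fstring_ranges(source: str):
    ranges, depth, start = [], 0, None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.FSTRING_START:
            if depth == 0:
                start = tok.start  # (row, col) of the outermost f-string prefix
            depth += 1
        elif tok.type == tokenize.FSTRING_END:
            depth -= 1
            if depth == 0:
                ranges.append((start, tok.end))
    return ranges

# A nested f-string reusing the same quote: only the outer range is kept.
print(outermost_fstring_ranges('x = f"outer {f"inner {y}"} end"\n'))
```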
Commit: `57fac75`
Update `RUF001`, `RUF003` to check in f-strings (#7477)

## Summary

This PR updates the `RUF001` and `RUF003` rules to check in f-strings using the `FStringMiddle` token, which contains the non-expression part of an f-string. For reference:

| Code | Name | Message |
| --- | --- | --- |
| RUF001 | ambiguous-unicode-character-string | String contains ambiguous {}. Did you mean {}? |
| RUF003 | ambiguous-unicode-character-comment | Comment contains ambiguous {}. Did you mean {}? |

## Test Plan

`cargo test`
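An example of the kind of ambiguity these rules target, using a classic confusable pair (the f-string itself is purely illustrative):

```python
# GREEK SMALL LETTER OMICRON renders like LATIN SMALL LETTER O but is a
# different code point; in an f-string it sits in the literal
# (FStringMiddle) part, which is where RUF001 now looks.
import unicodedata

ambiguous, intended = "ο", "o"
print(hex(ord(ambiguous)), unicodedata.name(ambiguous))  # 0x3bf GREEK SMALL LETTER OMICRON
print(hex(ord(intended)), unicodedata.name(intended))    # 0x6f LATIN SMALL LETTER O
print(f"hellο {40 + 2}")  # the `ο` is in the literal part, not the expression
```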
Commit: `21ca907`
Update `W605` to check in f-strings (#7329)

This PR updates `W605` (invalid-escape-sequence) to check inside f-strings. It also adds support for reporting violations for invalid escape sequences involving curly braces within f-strings. So, the following cases will be identified:

```python
f"\{1}"
f"\{{1}}"
f"{1:\}"
```

The new CPython parser also gives out a syntax warning for such cases:

```
fstring.py:1: SyntaxWarning: invalid escape sequence '\{'
  f"\{1}"
fstring.py:2: SyntaxWarning: invalid escape sequence '\{'
  f"\{{1}}"
fstring.py:3: SyntaxWarning: invalid escape sequence '\}'
  f"{1:\}"
```

Nested f-strings are supported here, so the generated fix is aware of that and will create an edit for the proper f-string.

Add new test cases for f-strings.

fixes: #7295
Commit: `19dc285`
Update `ISC001`, `ISC002` to check in f-strings (#7515)

## Summary

This PR updates the implicit string concatenation rules, specifically `ISC001` and `ISC002`, to account for the new f-string tokens. `ISC003` checks for explicit string concatenation and is not affected by PEP 701 because it is based on the AST.

### Implementation

The implementation is based on the boundary tokens of the f-string, which are `FStringStart` and `FStringEnd`. There are 4 cases to look for (illustrated in the example below):

1. `String` followed by `FStringStart`
2. `FStringEnd` followed by `String`
3. `FStringEnd` followed by `FStringStart`
4. `String` followed by `String`

For f-string tokens, we use the `Indexer` to get the entire range of the f-string. This is the range of the innermost f-string.

## Test Plan

Add new test cases for nested f-strings.
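The four adjacencies as source code (each assignment is a single-line implicit concatenation of the kind `ISC001` targets; the names are illustrative):

```python
b = 1
s1 = "plain" f"{b}"   # 1. String     followed by FStringStart
s2 = f"{b}" "plain"   # 2. FStringEnd followed by String
s3 = f"{b}" f"{b}"    # 3. FStringEnd followed by FStringStart
s4 = "plain" "plain"  # 4. String     followed by String
print(s1, s2, s3, s4)
```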
Commit: `363aeb9`
Detect `noqa` directives for multi-line f-strings (#7326)

## Summary

This PR updates the NoQA directive detection to consider the new f-string tokens. The reason is that there can now be multi-line f-strings without triple quotes:

```python
f"{
    x * y
}"
```

Here, the `noqa` directive should go at the end of the last line.

## Test Plan

* Add new test cases for f-strings
* Tested with `--add-noqa` using the following command with the above code snippet:

```console
$ cargo run --bin ruff -- check --select=F821 --no-cache --isolated ~/playground/ruff/fstring.py --add-noqa
Added 1 noqa directive.
```

Output:

```python
f"{
    x * y
}"  # noqa: F821
```

Running the same command again doesn't add another `noqa` directive, and without the `--add-noqa` flag, the violation isn't reported.

fixes: #7291
Commit: `3aceb6b`
Use the new f-string tokens in string formatting (#7586)

## Summary

This PR updates the string formatter to account for the new f-string tokens. The formatter uses the full lexer to handle comments around implicitly concatenated strings; the reason it uses the lexer is that the AST merges them into a single node, so the boundaries aren't preserved. For f-strings, this creates some complexity now that an f-string isn't represented as a single `String` token: a single f-string will emit at least 3 tokens (`FStringStart`, `FStringMiddle`, `FStringEnd`), and if it contains expressions, it'll emit the respective tokens for them as well. In our case, we're currently only interested in the outermost f-string range, for which I've introduced a new `FStringRangeBuilder` that builds the outermost f-string range by considering the start and end tokens and the nesting level.

Note that this doesn't support nested f-strings in any way, which is out of scope for this PR. This means that if there are nested f-strings, especially ones using the same quote, the formatter will escape the inner quotes:

```python
f"hello world { x + f\"nested {y}\" }"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
Commit: `97a5e35`
Ignore quote escapes in expression part of f-string (#7597)

This PR fixes the following issues w.r.t. the PEP 701 changes:

1. Mark all unformatted comments inside f-strings as formatted only _after_ the f-string has been formatted.
2. Do not escape or remove the quote escape when normalizing the expression part of an f-string.

This PR also updates the `--files-with-errors` number to be 1 less. This is because we can now parse the [`test_fstring.py`](https://discord.com/channels/1039017663004942429/1082324263199064206/1154633274887516254) file in the CPython repository, which contains the new f-string syntax. This is also the file which updates the similarity index for CPython compared to main.

`cargo test -p ruff_python_formatter`

| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76051 | 1789 | 1632 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 323 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99979 | 3496 | 22 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |

| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1631 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 323 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99979 | 3496 | 22 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |
Commit: `01123d5`
Commit: `e9a2595`
Separate `Q003` to accommodate f-string context (#7588)

This PR updates the `Q003` rule to accommodate the new f-string context. The logic here takes into consideration nested f-strings and the configured target version.

The rule checks for escaped quotes within a string and determines whether they are avoidable or not (examples below). An escape is avoidable if:

1. The outer quote matches the user-preferred quote
2. It's not a raw string
3. It's not a triple-quoted string
4. The string content contains the same quote as the outer one
5. The string content _doesn't_ contain the opposite quote

For f-strings, the rule works by using a context stack to keep track of certain things, mainly the text ranges (`FStringMiddle`) where the escapes exist. It contains the following:

1. Do we want to check for escaped quotes in the current f-string? This is required to:
   * Preserve the context for `FStringMiddle` tokens where we need to check for escaped quotes. The answer to whether we need to check or not lies with the `FStringStart` token, which contains the quotes, so when the context starts, we'll store this information.
   * Disallow nesting for pre-3.12 target versions
2. Store the `FStringStart` token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s).
3. All the `FStringMiddle` ranges where there are escaped quote(s).

* Add new test cases for nested f-strings.
* Write new tests for old Python versions, as existing ones test on the latest version by default, which is 3.12 as of this writing.
* Verify the snapshots
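To ground the avoidability conditions, a few examples with double quotes as the preferred style (illustrative only):

```python
x = 1
s1 = 'It\'s'          # avoidable: flipping the outer quote removes the escape -> "It's"
s2 = 'He said "hi"'   # not avoidable: the content contains the opposite quote
s3 = f'{x} o\'clock'  # avoidable inside the f-string's literal part -> f"{x} o'clock"
print(s1, s2, s3)
```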
Commit: `2ba96ee`
Update `Q000`, `Q001` with the new f-string tokens (#7589)

## Summary

This PR updates the `Q000` and `Q001` rules to consider the new f-string tokens. The docstring rule (`Q002`) doesn't need to be updated because f-strings cannot be used as docstrings.

I tried implementing nested f-string support, but there are still some edge cases in my current implementation, so I've decided to pause it for now and pick it up sometime soon. So, for now, this doesn't support nested f-strings.

### Implementation

The implementation uses the same `FStringRangeBuilder` introduced in #7586 to build up the outermost f-string range. The reason for using the same implementation is that this is a temporary solution until we add support for nested f-strings.

## Test Plan

`cargo test`
Commit: `2d8270f`
Commit: `0563dcb`
Commit: `6114f61`
Commit: `57501e2`