Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix formatter instability for lines only consisting of zero-width characters #11748

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jun 5, 2024

Summary

Fixes #11724 (comment)

Code from the linked issue that produces instable formatting:

'''


'''

This turned out to be something entirely different than I expected.
I initially thought that it was an issue with the docstring rendering, which uses trim instead of PythonWhitespace::trim.
But I learned the better when I compared the generated IR for the above snippet and the same snippet but with \u{15} (the non visible character in the snippet above) with a.
Both snippets generate the same IR! That means this is a bug in the Printer and not in the Python formatter.

It turns out that the fix is relatively simple.

The Printer omits hard_line_break IR elements or only renders a single \n for a empty_line IR elements when the
current line is all empty. The problem is that the Printer uses the line width to determine if the line is empty.
However, using the line width here isn't correct because a line can be non-empty but has a zero width if it only consists of zero-width characters, which is exactly what we have in the reported instability.

I considered removing this behavior but we rely heavily on it in suite.rs and I'm not going to touch this to fix this bug!
So what I did instead is to change the Printer to keep track of the start of the current line and test if the current formatted line is empty.

I don't suspect that this changes performance much because it only adds a single write after a newline character. Testing whether the line gets a bit more expensive
because it now needs to dereference into the buffer, but I think in total, this is just noise.

And I was wrong lol. THis has a huge perf impact. Let's see if I can do better

Backwards compatibility

It's hard to be a 100% sure, but I'm confident that it doesn't have widespread impact because zero-width characters are rare, even more so zero-width lines only consisting of zero-width characters.

The only zero-width character that Python allows outside strings and comments is the form-feed character. Ruff does not preserve line-feed characters except in suppressed ranges. The existing implementation used a work-around to avoid this very specific printer behavior. This PR allows to remove the work around and the tests keep passing.

For strings, we write the entire string content as a single text, preserving line breaks as is.

For comments: That's the issue discovered here. Existing code would already have the trailing line stripped and reformatting the same code wouldn't insert the trailing line after the zero-width characters line.

So we should be good

Test Plan

Added test. All existing tests keep passing.

Thanks

Thanks to @jasikpark for finding the instability and helping me to reproduce the issue.

@MichaReiser MichaReiser added bug Something isn't working formatter Related to the formatter labels Jun 5, 2024
Copy link

codspeed-hq bot commented Jun 5, 2024

CodSpeed Performance Report

Merging #11748 will degrade performances by 12.18%

Comparing fix-docstring-non-visible-whitespace-content (945b033) with main (b021b5b)

Summary

❌ 1 (👁 1) regressions
✅ 29 untouched benchmarks

Benchmarks breakdown

Benchmark main fix-docstring-non-visible-whitespace-content Change
👁 linter/all-with-preview-rules[numpy/globals.py] 802.8 µs 914.1 µs -12.18%

@MichaReiser MichaReiser marked this pull request as draft June 5, 2024 08:14
@MichaReiser MichaReiser force-pushed the fix-docstring-non-visible-whitespace-content branch from a67edb4 to b0fc11d Compare June 5, 2024 08:17
@MichaReiser MichaReiser force-pushed the fix-docstring-non-visible-whitespace-content branch from b0fc11d to 674d4df Compare June 5, 2024 08:18
@MichaReiser MichaReiser marked this pull request as ready for review June 5, 2024 08:27
Copy link
Contributor

github-actions bot commented Jun 5, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@charliermarsh
Copy link
Member

Nice write-up.

@MichaReiser MichaReiser merged commit 5806bc9 into main Jun 5, 2024
20 checks passed
@MichaReiser MichaReiser deleted the fix-docstring-non-visible-whitespace-content branch June 5, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: formatting a multiline string twice gets changing results
2 participants