Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.1.13 breaks semver #55

Closed
joshka opened this issue Jun 5, 2024 · 25 comments
Closed

v0.1.13 breaks semver #55

joshka opened this issue Jun 5, 2024 · 25 comments

Comments

@joshka
Copy link

joshka commented Jun 5, 2024

In ratatui, running cargo update from v0.1.12 to v0.1.13 results in the latter failing (something has changed in the way that unicode-width handles zero width strings

Error from our tests

--- STDOUT: ratatui buffer::buffer::tests::set_string_zero_width ---

running 1 test
test buffer::buffer::tests::set_string_zero_width ... FAILED

failures:

failures:
buffer::buffer::tests::set_string_zero_width

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1557 filtered out; finished in 0.00s

--- STDERR: ratatui buffer::buffer::tests::set_string_zero_width ---
thread 'buffer::buffer::tests::set_string_zero_width' panicked at src/buffer/buffer.rs:618:9:
assertion left == right failed
left: Buffer {
area: Rect { x: 0, y: 0, width: 1, height: 1 },
content: [
"",
],
styles: [
x: 0, y: 0, fg: Reset, bg: Reset, modifier: NONE,
]
}
right: Buffer {
area: Rect { x: 0, y: 0, width: 1, height: 1 },
content: [
"a",
],
styles: [
x: 0, y: 0, fg: Reset, bg: Reset, modifier: NONE,
]
}

I'd suggest yanking 0.1.13 and publishing the latest release as 0.2.0 instead.

joshka added a commit to ratatui/ratatui that referenced this issue Jun 5, 2024
@Manishearth
Copy link
Member

This is a behavior change and not considered breaking by this crate.

@Manishearth Manishearth closed this as not planned Won't fix, can't repro, duplicate, stale Jun 5, 2024
@joshka
Copy link
Author

joshka commented Jun 5, 2024

This is a behavior change

That's literally what semver suggests. You broke the implied contract that 0.1.12 should be compatible with 0.1.13 and in doing so broke our tests, which implies that downstream user's apps will be rendered differently.

@Manishearth
Copy link
Member

"should be compatible with" has an extremely broad range of meanings. I'm careful to make sure that code will compile; but changing the code to have more accurate answers is not a breakage.

The Rust stdlib's semver policy (which is roughly what the ecosystem follows as well) talks about behavioral changes as a notion of contract. It really depends on the situation, and I am thoroughly unconvinced that this crate already has or needs such a strong stability contract.

This crate says it attempts to approximate the displayed width of characters. That's not a guarantee of stability; and we've (well, @Jules-Bertholet) been trying to improve the accuracy of these heuristics.

You are free to rely on such behavior in tests, but you do have to expect that tests will break some time due to this kind of thing. If you want to pin the behavior, pin the dependency.


There is an argument to be made for an more stable (not 100% stable: the unicode standard and data this relies on itself changes!) API that uses the east asian width properties only (@Jules-Bertholet: would this be easy to add again? I was somewhat uncomfortable about expanding the scope of the crate, but UAX 11 is pretty vague about this stuff anyway)

@joshka
Copy link
Author

joshka commented Jun 5, 2024

The Rust stdlib's semver policy (which is roughly what the ecosystem follows as well) talks about behavioral changes as a notion of contract. It really depends on the situation, and I am thoroughly unconvinced that this crate already has or needs such a strong stability contract.

from the link:

In general, APIs are expected to provide explicit contracts for their behavior via documentation, and behavior that is not part of this contract is permitted to change in minor revisions.

A changelog is the exact form of documentation that would have helped notify this change in behavior, but you've rejected adding one in #56

the behavior change here is that the width of `"\u{1}" is now 1 instead of 0. It is however rendered as taking up no space as evidenced by the test output.

Working out what the change is caused by and why is next to impossible with the current approach (I have to look at every single commit that was made to work out relevance, and then backtrack to find a related PR). This is not ideal and there are easy ways that help consumers of this library work understand the changes more easily.

@Manishearth
Copy link
Member

It is however rendered as taking up no space as evidenced by the test output.

This is highly highly dependent on the exact software stack used and is why this crate is explicitly documented as not necessarily matching actual rendered output.

Yes, the documentation could be clearer about how it approximates "displayed width", and why it is an ultimately futile effort to try and accurately predict display width from just codepoints. You've been lucky so far that your tests were dealing with cases that matched up.

If you want such an operation, this is not the crate for you, and furthermore, that operation does not exist.

(I've been meaning to write documentation on why that is the case for a while, but haven't had the time)

A changelog is the exact form of documentation that would have helped notify this change in behavior, but you've rejected adding one in #56

I don't think that's what the RFC means by "documentation".

But seriously, this is absolutely not my experience with semver in the Rust community: this kind of behavior change is almost always fine. Of course making a function do something entirely different would be breaking, but changing functions to still continue to do what they are documented as doing. This function still does what it is documented as doing.

(I have to look at every single commit that was made to work out relevance, and then backtrack to find a related PR)

That's ... what you have to do with the changelog too? I rejected the changelog because it produces more or less the same output as scrolling through commits.

joshka added a commit to ratatui/ratatui that referenced this issue Jun 6, 2024
semver breaking change in 0.1.13
<unicode-rs/unicode-width#55>

<!-- Please read CONTRIBUTING.md before submitting any pull request. -->
joshka added a commit to ratatui/ratatui that referenced this issue Jun 6, 2024
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1.
Our tests assumed that \u{1} had a width of 0, so this change replaces
the \u{1} character with \u{200B} (zero width space) in the tests.

Upstream issue (closed as won't fix):
unicode-rs/unicode-width#55
@workingjubilee
Copy link

That's literally what semver suggests.

SemVer disagrees with the Rust community on what SemVer means..

@joshka
Copy link
Author

joshka commented Jun 6, 2024

To be clear - I'm using the Rust definition of semver, which treats 0.x.y the same way semver would treat x.y.0

https://doc.rust-lang.org/cargo/reference/resolver.html#semver-compatibility

This notion of “compatibility” is important because Cargo assumes it should be safe to update a dependency within a compatibility range without breaking the build.

Versions are considered compatible if their left-most non-zero major/minor/patch component is the same. ... This convention also applies to versions with leading zeros. For example, 0.1.0 and 0.1.2 are compatible, but 0.1.0 and 0.2.0 are not. Similarly, 0.0.1 and 0.0.2 are not compatible.

https://doc.rust-lang.org/cargo/reference/resolver.html#semver-breaking-patch-release-breaks-the-build

Sometimes a project may inadvertently publish a point release with a SemVer-breaking change. When users update with cargo update, they will pick up this new release, and then their build may break. In this situation, it is recommended that the project should yank the release, and either remove the SemVer-breaking change, or publish it as a new SemVer-major version increase.

...

If it looks like the third-party project is unable or unwilling to yank the release, then one option is to update your code to be compatible with the changes, and update the dependency requirement to set the minimum version to the new release.

This is the path that we are taking in ratatui/ratatui#1171, but it's a poor one given the blast radius of this sort of breaking change. The changes here are:

  • silent - cargo update brings the changes in as 0.1.12 is rust semver compatible with 0.1.13
  • undocumented - there is no changelog
  • potentially affect millions of consumers

The right thing to do is yank and publish 0.2, and document breaking changes like this. But as this issue is closed and we've worked around it for our particular use case, I'll leave pushing this issue forward to any of the 584 other direct dependencies of unicode-width that have actual breaks cause by this.

Our reliance on \0 and \u{1} were just as being known zero-width characters to use in rendering tests that had to deal with how to render zero-width characters at various places in terminal UI. These particular characters changed width inbetween 0.1.12 and 0.1.13 for some reason.

@Manishearth
Copy link
Member

Yeah, it changed because the Unicode standard recommends they be rendered unless they are interpretable in that context (which these days is almost never).

#45 (comment)

This crate implements Unicode standards.

@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Jun 6, 2024

There is an argument to be made for an more stable (not 100% stable: the unicode standard and data this relies on itself changes!) API that uses the east asian width properties only (@Jules-Bertholet: would this be easy to add again? I was somewhat uncomfortable about expanding the scope of the crate, but UAX 11 is pretty vague about this stuff anyway)

An API that uses the East_Asian_Width property only would assign \u{1} width 1 (and thus also break the offending Ratatui test); the change in the width of the control characters actually brought things closer to UAX 11. That being said if you want an API based on "simple" rules that don't change much, UnicodeWidthChar is likely the best choice:

use unicode_width::UnicodeWidthChar;

pub fn dumb_width(s: &str) -> usize {
    s.chars().map(|c| c.width().unwrap_or(0)).sum()
}

This crate implements Unicode standards.

This is the core of the issue. Terminal emulators assign semantics to the control characters that go beyond what Unicode specifies. This is totally correct and fine (extra semantics introduced by higher-level protocols is what the control characters are for). But it means that if your application interprets control characters according to such a higher-level protocol, you need to do that interpretation and processing before you pass your data to a Unicode library like unicode-width.

@Manishearth
Copy link
Member

Manishearth commented Jun 6, 2024

@Jules-Bertholet since it's taken me so long to get around to writing this, would you be interested in adding documentation to the crate and readme that explain how and why this crate can only ever be an approximation, and then reference that in the individual APIs as a caveat when we mention "displayed width".

@workingjubilee
Copy link

Given a lot of people interpret unicode-width as "terminal width for this Unicode data (or however many cells it should take up in a cellular grid), or near enough" it seems like it would be germane to offer specific examples that address that exact use-case.

@joshka
Copy link
Author

joshka commented Jun 6, 2024

We have a pretty long issue that doesn't really have a resolution at ratatui/ratatui#75 about what exactly to do with working out the character widths. It seems at least where the terminal is concerned there's no real solution that solves all the possible issues (and incompatible terminal implementations). As such until someone invests enough time in working through those issues, we're sticking with unicode-width for the purpose of working out where to draw cells (even if it's the wrong thing sometimes). I suspect this will be probably be forever the case.

I note there's also https://crates.io/crates/unicode-display-width, but even that has some problems that would take time to investigate to work out whether it's "better"

@Manishearth
Copy link
Member

As such until someone invests enough time in working through those issues

yeah, people are working on this upstream at Unicode but progress is slow (https://www.unicode.org/L2/L2023/23194-text-terminal-wg-report.pdf), and it would still not result in an algorithm that solves all the problems since terminals are inconsistent (it would however give terminals better guidance on how to be consistnt).

we're sticking with unicode-width for the purpose of working out where to draw cells (even if it's the wrong thing sometimes). I suspect this will be probably be forever the case.

Sure, but please recognize that if this crate isn't precisely what you're looking for, this crate may do things that diverge from your expectations, and your tests may occasionally break.

It is definitely not the intent of this crate to provide some stable displayed width algorithm because such a thing does not exist.

@joshka
Copy link
Author

joshka commented Jun 6, 2024

Also - despite my comment about the solution being a poor one overall, it's definitely the right solution for Ratatui. Our tests should have used u+200B when dealing with zero-width character handling instead of \0 and u+0001, so despite the breakage, it was a breakage that caught an assumption of ours that was incorrect.

I can definitely see how that sort of thing makes sense to just release as a point release. In Ratatui's case have done something similar for a rendering change, but in doing so, we communicated the change in our changelog and breaking changes docs, and these flow through to the release notes of the release on GitHub, and from there also to any PRs raised by dependabot in downstream repos. These wouldn't be hidden away in git commits and PR comments like they are here.

@Jules-Bertholet
Copy link
Contributor

documentation to the crate and readme that explain how and why this crate can only ever be an approximation, and then reference that in the individual APIs as a caveat when we mention "displayed width".

The README already has such a warning, which #53 modifies, but we should probably add something to the Rustdoc yes. And also some notes about terminal emulators

@Jules-Bertholet
Copy link
Contributor

I note there's also https://crates.io/crates/unicode-display-width, but even that has some problems that would take time to investigate to work out whether it's "better"

Here are the important differences:

  • unicode-display-width properly handles emoji skintone modifiers and ZWJ sequences, this crate doesn't yet (but Support lots of ligatures #53 changes that)
  • unicode-display-width will treat spacing combining marks as non-advancing in most cases, while unicode-width usually gives them non-zero width. For some scripts this makes unicode-display-width unambiguously worse, for others both approaches are differently bad.
  • unicode-display-width doesn't handle ligatures that extend beyond a single grapheme cluster; unicode-width currently handles one, and Support lots of ligatures #53 adds a lot more (though it's impossible to be "perfect" here).
  • unicode-display-width will assign non-zero width to defective combining character sequences and nonstandard Korean jamo sequences, while unicode-width usually assigns width 0 to these. unicode-display-width's behavior is unambiguously more correct according to the text of the Unicode Standard. However, these are edge cases that should never occur in "properly formatted" text; and the unicode-width behavior is more performant, and also leads to generally better results when calculating width character-by-character with UnicodeWidthChar. (Terminal emulators will likely behave closer to the unicode-width behavior)

@joshka
Copy link
Author

joshka commented Jun 6, 2024

I note there's also https://crates.io/crates/unicode-display-width, but even that has some problems that would take time to investigate to work out whether it's "better"

Here are the important differences:

Thanks for documenting those. I think this probably makes unicode-width the right choice at least for now.

@decathorpe
Copy link

It looks like the changes in v0.1.13 are breaking a number of crates, not only ratatui. The CI we have in Fedora Linux started showing red for some of our Rust packages after we updated unicode-width to version 0.1.13:

  • bwrap
  • cursive_core
  • git-delta
  • just
  • lsd
  • prettytable-rs
  • tui

There might be more, the CI hasn't yet checked all 3000 Rust packages we have, and quite a large number of them depend on unicode-width (~300).

This probably won't change your mind about "this is not a breaking change", but it should at least give some food for thought that the change here might have affected more projects than initially assumed.

@Manishearth
Copy link
Member

good to know, thanks. In general yes, I maintain that changing behavior in such a way is not a breaking change, and I absolutely expect such tests to break; that's what tests are for. A fair number of packages use this crate for estimating displayed terminal width and if they test this behavior they can expect certain tests to not work as the crate is improved.

@joshka
Copy link
Author

joshka commented Jun 7, 2024

tui is definitely EOL - it's why ratatui exists. That's a pretty big blast radius as it effects every app built with the library (341 reverse deps on crates.io, 480 packages / 11,416 repos on github).

Regardless of whether this is a "breaking change", it's a change that breaks things. I'd urge you to please consider relaxing your focus on technical correctness here and focus on the actual impact.

joshka added a commit to ratatui/ratatui that referenced this issue Jun 7, 2024
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1.
Our tests assumed that \u{1} had a width of 0, so this change replaces
the \u{1} character with \u{200B} (zero width space) in the tests.

Upstream issue (closed as won't fix):
unicode-rs/unicode-width#55
@decathorpe
Copy link

I found an example where this change definitely results in a bug - the width of the \n character is now considered to be 1 instead of 0. Giving control characters a width of 1 instead of 0 might make sense for most control characters, but it definitely doesn't for characters like \n, which are handled by all terminals, and definitely not printed as width-one "garbage" characters or whatever.

@joshka
Copy link
Author

joshka commented Jun 7, 2024

I found an example where this change definitely results in a bug - the width of the \n character is now considered to be 1 instead of 0. Giving control characters a width of 1 instead of 0 might make sense for most control characters, but it definitely doesn't for characters like \n, which are handled by all terminals, and definitely not printed as width-one "garbage" characters or whatever.

To be clear, this is based on the spec - see #45 and read the parts of the unicode spec that are relevant (Those bits are not as dense as some). Unicode 0.1.12 used a canonical display definition of what a control character's width was, unicode 0.1.13 uses the unicode spec display fallback definition and states that this is a compatible non-breaking change with 0.1.12. Regardless of whether this is correct or not, it's a change that significantly breaks code that relies on the previous behavior.

More succinctly stated:

  • The issue isn't the changes made in 0.1.13
  • The issue is that 0.1.13 was released as compatible with existing code using 0.1.12 when it's not.
  • v0.1.13 should be v0.2.0.

good to know, thanks. In general yes, I maintain that changing behavior in such a way is not a breaking change, and I absolutely expect such tests to break; that's what tests are for. A fair number of packages use this crate for estimating displayed terminal width and if they test this behavior they can expect certain tests to not work as the crate is improved.

Crates should not generally be expected to test the behavior of a dependency like this. The broken tests you're seeing are likely just a small portion of broken behavior. In our case, our tests were wrong(-ish), and have since been fixed. But this means now we don't have any tests that would have picked up the behavior change and so the change is felt by the users of our library. The changes will affect the behavior of every single package that transitively relies on this package. With 300k daily downloads are you really sure that breaking this behavior is a net positive without notifying any of the packages that this changed?

@Manishearth
Copy link
Member

Manishearth commented Jun 7, 2024

I'd urge you to please consider relaxing your focus on technical correctness here and focus on the actual impact.

This misunderstands my position. I think the "impact"; "some people's tests are breaking", is quite acceptable, and even to some extent desired. If crates are testing for this kind of behavior, I want these tests to break, I want people to notice. A broken test does not automatically mean it is the upstream crate's problem, in this case the existence of the test may imply a certain model about this crate that is incorrect, and even if it isn't incorrect it implies the crates care about this in a way that is incompatible with what the crate does.

Ironically, when @Jules-Bertholet started making these changes I was somewhat against them because I wanted to stick to the original model that this crate only did East Asian Width. However, it was pretty clear to me over years issues filed on this repo by people buiding terminal stuff, that most users of this crate wanted this to handle more than just East Asian Width. It's clear from ratatui's usage that that's the way y'all use this crate too!

So honestly it's baffling to me that such a big deal is being made about what boils down to tests breaking. I am interested in knowing about it, and understanding the problems involved, but I so far have not been convinced of the severity of the issue, and feel that in general these changes serve the terminal crate community better.

As I've said before we are definitely lacking documentation on this kind of thing and I've wanted to improve it for a while, a lot of this is good signal for that. But I'm not convinced this is a large or undesirable impact.

Crates should not generally be expected to test the behavior of a dependency like this.

I mean, no, I think testing this kind of behavior is typically incorrect, but if you choose to do so, then the tests being broken seems like a fine signal.

(typically incorrect, with the caveat that I have definitely in the past found value for tests for behavior like this with the foreknowledge that such behavior may change, where I want to make sure the test broke for a "good reason" not a "bad reason")


The changes will affect the behavior of every single package that transitively relies on this package. With 300k daily downloads are you really sure that breaking this behavior is a net positive without notifying any of the packages that this changed?

yes. and I'm going to state this clearly: at this point you have been quite patronizing, preferring to strongly advocate for specific solutions rather than talk about problems, and repeatedly come off as using shame as a motivator. I have very little desire to deal with this and will not engage if this continues, potentially blocking you.

@Manishearth
Copy link
Member

Manishearth commented Jun 7, 2024

@decathorpe

I found an example where this change definitely results in a bug - the width of the \n character is now considered to be 1 instead of 0. Giving control characters a width of 1 instead of 0 might make sense for most control characters, but it definitely doesn't for characters like \n, which are handled by all terminals, and definitely not printed as width-one "garbage" characters or whatever.

So, this is kinda nuanced. The standard says to treat ones that do not have a different representation as width 1. \n does have a different representation, so it is not a slam dunk from the standard that this must be width 1.

However, \n has a different representation that does not fit within the "what width is this?" ontology in the first place! Any application handling these would have a higher level protocol breaking the string here1 and never feeding this character directly to this algorithm. This is going to be true for other line breaks too.

I think we should definitely make a deliberate choice as to behavior here for all newlines, and be consistent (and document it). My guess is that they should go back to being 0 but I would like to see some discussion of that first. I'll file an issue. (edit: filed #60)

Footnotes

  1. this crate cannot provide such an algorithm, since there are reasons to break a string that do not come from the contents of its text stream, like wrapping. Plus you have the Windows vs Unix newline issue.

@joshka
Copy link
Author

joshka commented Jun 7, 2024

yes. and I'm going to state this clearly: at this point you have been quite patronizing, preferring to strongly advocate for specific solutions rather than talk about problems, and repeatedly come off as using shame as a motivator. I have very little desire to deal with this and will not engage if this continues, potentially blocking you.

If I have offended in any way, please accept my apologies. That was not my intent. In the interest of finding a resolution on this problem, I'm going to bow out and leave it to other interested parties to continue. Thank you for investing your time and energy on this.

itsjunetime pushed a commit to itsjunetime/ratatui that referenced this issue Jun 23, 2024
semver breaking change in 0.1.13
<unicode-rs/unicode-width#55>

<!-- Please read CONTRIBUTING.md before submitting any pull request. -->
itsjunetime pushed a commit to itsjunetime/ratatui that referenced this issue Jun 23, 2024
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1.
Our tests assumed that \u{1} had a width of 0, so this change replaces
the \u{1} character with \u{200B} (zero width space) in the tests.

Upstream issue (closed as won't fix):
unicode-rs/unicode-width#55
aswild added a commit to aswild/skim that referenced this issue Jul 30, 2024
unicode-width's change "Control characters have width 1" between
versions 0.1.12 and 0.1.13 changed the logic so that control characters
are asumed to have width of 1 rather than 0 width [1]. This breaks
skim's rendering pretty badly unfortunately, mainly by not erasing
cells that should be erased, so the output looks garbled.

Apparently this change broke other parts of the ecosystem too, such as
the (actually-maintained, unlike tuikit) ratatui crate. There's some
good discussion in [2] about how fundamentally unicode-width is not
intended to be "how wide does this draw in a terminal emulator", despite
many people using it that way.

Other options should be explored longer term (if I decide to
meaningfully pick up maintenance of skim), such as the
unicode-display-width crated which is mentioned in [2] but still might
not be the right algorithm. I think really the proper fix is to strip
control characters within skim (or the TUI library) before doing
anything to determine the unicode width.

But for now, take the easy way out and just pin the unicode-width
version to something below 0.1.13 and forget about it until I have more
time to put into caring about improving skim.

P.S. Change .width_cjk() to .width() in a couple places for consistency.

[1] unicode-rs/unicode-width@3063422
[2] unicode-rs/unicode-width#55
th1000s added a commit to th1000s/delta that referenced this issue Sep 10, 2024
v0.1.13 calculates the width of some characters differently, which
causes delta to panic

See unicode-rs/unicode-width#55 and unicode-rs/unicode-width#66
dandavison pushed a commit to dandavison/delta that referenced this issue Sep 11, 2024
v0.1.13 calculates the width of some characters differently, which
causes delta to panic

See unicode-rs/unicode-width#55 and unicode-rs/unicode-width#66
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants