Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement TryFrom<&OsStr> for &str #98202

Merged
merged 1 commit into from
Jun 14, 2023

Conversation

aticu
Copy link
Contributor

@aticu aticu commented Jun 17, 2022

Recently when trying to work with &OsStr I was surprised to find this impl missing.

Since the to_str method already existed the actual implementation is fairly non-controversial, except for maybe the choice of the error type. I chose an opaque error here instead of something like std::str::Utf8Error, since that would already make a number of assumption about the underlying implementation of OsStr.

As this is a trait implementation, it is insta-stable, if I'm not mistaken?
Either way this will need an FCP.
I chose "1.64.0" as the version, since this is unlikely to land before the beta cut-off.

@rustbot modify labels: +T-libs-api

API Change Proposal: #99031 (accepted)

@rust-highfive
Copy link
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jun 17, 2022
@rust-highfive
Copy link
Collaborator

r? @thomcc

(rust-highfive has picked a reviewer for you, use r? to override)

@rustbot rustbot added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 17, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 17, 2022
@thomcc
Copy link
Member

thomcc commented Jun 17, 2022

We have a new process for unstable APIs, you need to file a change proposal first. Quoting the bot:

If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already

The implementation (and the API) look fine to me, but must go through the process.

#[stable(feature = "str_tryfrom_osstr_impl", since = "1.64.0")]
#[derive(Debug)]
#[non_exhaustive]
pub struct StrFromOsStrError {}
Copy link
Member

@yaahc yaahc Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of error information density, I wonder if we shouldn't set the bar higher. There is a lot more information in a UTF8Error, and we already create and discard one on at least some platforms with the current implementation. Maybe we should skip relying on OsStr::to_str and build something new on top of the more fundamental APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally considered that as well, but I think might make too many assumptions about the internal representation of OsStr. Some future platform might choose a non-UTF-8 like encoding. On such a platform it may not make much sense to talk about specific byte locations of errors. It might not even on current platforms since (as far as I'm aware) there's no way to look at the byte representation of an OsStr. Therefore I chose the more conservative approach at first.
However if the consensus is to implement it with Utf8Error as the error type, I'd be open to implementing that as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should still be an opaque struct like you've done here, and we should probably document that the internal representation is going to be inherently unstable and is kept intentionally opaque but we can still have that internally be a UTF8Error when we're on unix and give nice utf8 related error messages, or we could do some windows specific wtf8 error about it containing surrogate pairs or w/e level of context we want to expose there, and so on.

Copy link
Member

@ChrisDenton ChrisDenton Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're pretty constrained in what OsStr can be. The impl AsRef<OsStr> for str implies that every valid str must also be (bit-for-bit) a valid OsStr. So if nothing else, surely this TryFrom can report the valid UTF-8 span and the offset and length of the non-UTF8 span, without providing details of why a particular sequence of bytes is not UTF-8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This basically means that an OsStr must always have something like [u8] as the underlying representation and therefore it talking about sub-spans of the string should always make sense.
So should the error type simply be Utf8Error or some opaque error wrapping a Utf8Error and exposing some of it's details?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, did the Error API not get considered as part of the FCP?

Copy link
Member

@thomcc thomcc Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could extend this as further work. It seems very reasonable to either expose the utf8 error, or something equivalent.

That said, prior to it being available as bytes, doing so may leak details about our internal representation (on windows platforms for example I believe this would be the case), so personally I think we should wait until #95290 is more settled.

This also avoids needing to do further API team work on the design of the error type's API, which I think means it could just land as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, I'm not suggesting that blocks landing this, just that it would block exposing information about in which way the UTF 8 bytes are invalid)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomcc No the error type was not part of the FCP. In retrospective it would have been useful to make that part of it. Do I understand your suggestion correctly that you want to keep the UTF8Error type as the error type but with the methods on that type returning dummy values instead, so that users cannot rely on the internal representation? In that case that should probably be documented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think that would be good. I think exposing an opaque struct with no methods aside from trait impls would be better. I'm not t-libs-api though.

@aticu aticu force-pushed the impl_tryfrom_osstr_for_str branch from b7fe5dc to 2a2b29b Compare July 2, 2022 23:31
@rustbot
Copy link
Collaborator

rustbot commented Jul 2, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@aticu
Copy link
Contributor Author

aticu commented Jul 2, 2022

I've updated the code to use the Utf8Error type instead. I'm not quite sure if the code of Wtf8::as_str has exactly the same semantics now as before, but I think at least performance should be somewhat comparable. It might be better to use the previous code and construct the Utf8Error with the information available, since it's all there, but that's not possible as far as I know, because the fields are private to the core crate.

@thomcc thomcc added S-waiting-on-ACP Status: PR has an ACP and is waiting for the ACP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 22, 2022
@aticu
Copy link
Contributor Author

aticu commented Jul 26, 2022

@thomcc With the FCP completed, what do you think of the current implementation?

@thomcc
Copy link
Member

thomcc commented Jul 26, 2022

Impl looks fine. Unsure how settled we are on the error API, r=me if it doesn't need further discussion (certainly this would be my recommendation for now, since we can always expose more later).

@thomcc
Copy link
Member

thomcc commented Jul 29, 2022

I'm going to reassign to @yaahc to figure out the situation with the error type and presumably r+.

r? @yaahc

@rust-highfive rust-highfive assigned yaahc and unassigned thomcc Jul 29, 2022
@dtolnay
Copy link
Member

dtolnay commented Dec 23, 2022

Reassigning yaahc's reviews as she has stepped down from the review rotation.

r? rust-lang/libs-api

@rustbot rustbot assigned joshtriplett and unassigned yaahc Dec 23, 2022
@Dylan-DPC Dylan-DPC added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-ACP Status: PR has an ACP and is waiting for the ACP to complete. labels Jan 19, 2023
@ChrisDenton ChrisDenton added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Apr 14, 2023
@ChrisDenton
Copy link
Member

I've marked this as waiting on team, due to the (unresolved?) question of the error type. Reusing Utf8Error does make some sense to me assuming we decide its ok to slice OsStr in some way (perhaps via transmute).

@m-ou-se m-ou-se removed the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Apr 19, 2023
@Amanieu
Copy link
Member

Amanieu commented May 30, 2023

We discussed this in the libs-api meeting today. We're happy to just use Utf8Error as the error type. Restarting FCP since this is a (minor) API change.

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented May 30, 2023

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label May 30, 2023
@rfcbot rfcbot added disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels May 30, 2023
@rfcbot
Copy link

rfcbot commented Jun 4, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@m-ou-se m-ou-se removed the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jun 6, 2023
@aticu aticu force-pushed the impl_tryfrom_osstr_for_str branch from 2a2b29b to e3a1a11 Compare June 12, 2023 08:47
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Jun 14, 2023
@rfcbot
Copy link

rfcbot commented Jun 14, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@Amanieu
Copy link
Member

Amanieu commented Jun 14, 2023

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Jun 14, 2023

📌 Commit e3a1a11 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 14, 2023
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#98202 (Implement `TryFrom<&OsStr>` for `&str`)
 - rust-lang#107619 (Specify behavior of HashSet::insert)
 - rust-lang#109814 (Stabilize String::leak)
 - rust-lang#111974 (Update runtime guarantee for `select_nth_unstable`)
 - rust-lang#112109 (Don't print unsupported split-debuginfo modes with `-Zunstable-options`)
 - rust-lang#112506 (Properly check associated consts for infer placeholders)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 4efdb5c into rust-lang:master Jun 14, 2023
@rustbot rustbot added this to the 1.72.0 milestone Jun 14, 2023
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.