Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe lifetime #1918

Closed
wants to merge 17 commits into from
Closed

Unsafe lifetime #1918

wants to merge 17 commits into from

Conversation

jpernst
Copy link

@jpernst jpernst commented Feb 22, 2017

Add a new special lifetime, 'unsafe, that implicitly satisfies any constraint, but may only be instantiated within an unsafe context. This lifetime is used in situations where the true lifetime of a value is otherwise inexpressible, and additionally serves as a warning to readers that handling of the value requires special care. As a result, unsafe code is no longer ever required to use misleading "false" or redundant lifetimes, instead clearly stating that the invariants are maintained by custom logic instead of borrowck.

Rendered

@krdln
Copy link
Contributor

krdln commented Feb 22, 2017

I think it's worth considering how this feature interacts with unsafe code guidelines and affects the compiler optimizations (btw, is there any way to mention all the [unsafe code/memory model] group members?). I remember that @nikomatsakis showed that Ref snippet as an example of risk that sufficiently optimizing compiler could move dereference of the value after the drop of the borrow guard. Because of assumption that all the references are valid if they're accessible, they have a dereferencable attribute passed to LLVM (or at least they should). Now, since the 'unsafe lifetime could be shorter than a lifetime of the containing struct, it would make sense for &'unsafe not to have this dereferencable attribute.

I have also a few questions about the detailed design: In this code, does the r and slice variables have the &'unsafe _ type?

let raw: *const u8;
let r = unsafe { &*raw };
let slice = unsafe { std::slice::from_raw_parts(raw, 5) };

If yes, then what about copy variable in the following code? Does it have the 'unsafe lifetime which is only coerced on the return of the function, or does the lifetime get inferred to be 'a?

pub fn split_at_mut<'a>(&'a mut self, mid: usize) -> (&'a mut [T], &'a mut [T]) {
    let copy: &mut [T] = unsafe { &mut *(self as *mut _) }; 
    let left = &mut self[0..mid];
    let right = &mut copy[mid..];
    (left, right)
}

And one more snippet:

let uns: &'unsafe u8;
fn choose<'a>(_: &'a u8, _: &'a u8) -> &'a u8;

let x = 5;
let xref = &x; // xref: &'x u8
let r = choose(xref, uns);

What's the lifetime of r? Both 'unsafe and 'x seem to be a bad choice ('x because of erasing unsafety, 'unsafe because of forgetting that x was borrowed). It seems that a good choice would be to have someting like 'unsafe + 'x. Fortunately, that seems totally like an implementation detail, programmer doesn't have to be able to express such a lifetime.

@Storyyeller
Copy link

I don't think there's any need to omit dereferenceable. The use case for this is situations like Ref/rental/owning_ref where the logical lifetime is dynamic and hence can't be specified in Rust's type system, but the code ensures that the reference is valid whenever it is dereferenced at runtime.

@jpernst
Copy link
Author

jpernst commented Feb 22, 2017

@krdln good food for thought. I'll ponder these some more and update my draft soon, but in the meantime, here are my initial thoughts:

Per snippet 1, I'd say that, no, slice would NOT have 'unsafe lifetime. 'unsafe could never be inferred as a lifetime unless the target of the assignment has been explicitly stated to be of unsafe lifetime. In the absence of such an explicit declaration, standard lifetime inference would ensue, resulting in a compiler error if no suitable concrete lifetime can be found.

For snippet 2, if we assume that the input slice is explicitly of unsafe lifetime, then the compiler is free to choose any lifetime for 'a and coerce it to that, as is currently the case with unbounded lifetimes. Only if the output is also explicitly 'unsafe would it remain unsafe. In short, 'unsafe is eager to decay into a concrete lifetime, as an unbounded lifetime is. I'm not sure if this makes implementation easier or harder though, so perhaps there are overriding considerations that those more familiar with borrowck are aware of.

As for the final example, the chosen lifetime should be 'x. Objects of unsafe lifetime can readily be coerced into objects of any other lifetime, which is both the power and responsibility of it. Instantiating an 'unsafe lifetime is dangerous and must be treated with great care, which is why they would typically be relegated to private struct fields within an unsafe implementation, and exposed only through safe API.

As for optimization, I'm nowhere near an expert on this, so I'll defer to those with experience, but it seems reasonable to me that 'unsafe would inhibit such optimization, since the compiler cannot know the logic that determines its true valid scope. However, remembering that 'unsafe readily decays into a concrete lifetime when it is reborrowed, optimization would still take place within those scopes, since using an unsafe value in such a way is a promise that it will remain safe and valid for the duration of the concrete lifetime into which it was coerced.

Add details about:
* inference
* coercion
* drop
* optimization
@jpernst
Copy link
Author

jpernst commented Feb 23, 2017

Added some detail about what precisely requires unsafe scope. I believe this should enable the scenarios it's intended to without any soundness holes, but it's entirely possible I've missed something.

The previous rules were a bit too permissive and could allow some unsoundness to get through. This iteration should be more thorough.
remove confusing language
@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 23, 2017
@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2017

It seems like what you're looking for is closer to 'self than 'unsafe. If a lifetime is called 'unsafe, it doesn't make much sense to allow it to be dereferenced in safe Rust. Simply requiring unsafe to create the value seems like a weak constraint. If the value were verifiably safe to use, it wouldn't need the 'unsafe name. For instance, in your example where there is RefCell<T> and Ref<'unsafe, T>, the RefCell would be dropped before the reference to it. Safe rust could trivially violate memory safety in the destructor there. As another example, if I created a struct with Vec<T> and &'unsafe T pointing to one of its elements, any &mut self could trivially violate memory safety.

If a lifetime called 'unsafe were to exist, it should not be possible to dereference it in safe Rust, which makes it only marginally different than NonZero<*const T>. Something like 'self could give more semantic information to the compiler to safely express the use cases laid out in this RFC, though there are probably some significant questions around how it would affect drop.

@ticki
Copy link
Contributor

ticki commented Feb 26, 2017

This rfc is somewhat weird, but it seems pretty neat anyway.

I'm slightly divided. On one side, it seems to violate the invariants of the reference type. On the other, it cannot be created safely anyway.

But well, I suspect the discussion will be dominated of discussion about invariants.

@krdln
Copy link
Contributor

krdln commented Feb 26, 2017

@sgrif Currently in Rust, you can easily get a reference which requires unsafe to create, but not to dereference. It's just & unsafe { *raw_ptr }. This happily coerces to 'static and can cause violations of invariants in other places in code, just as your &mut Vec example. So this RFC doesn't change the status quo regarding unsafe creation of “safe” references.

If a lifetime called 'unsafe were to exist, it should not be possible to dereference it in safe Rust, which makes it only marginally different than NonZero<*const T>.

If you're proposing 'unsafe to require unsafe block to dereference, it's a totally reasonable alternative (although since it's really simple to create such an „unsafe” reference in Rust, it doesn't really give you more protection). What's important though is that 'unsafe gives you much more than just NonZero<*const T> – it allows you to say: “I have this Struct<'a> from some library and I want all the &'a references inside this struct to behave like NonZero<*const T>”. That's currently impossible to express in Rust (unless you're going to copy all the code and manually change &s to *s).

You're right that in some cases 'self and proper self-referential structs would be a good solution (it's listed as an alternative), but, quoting the RFC:

However, that is a far more complex feature. In the meantime, the presence of this feature enables the implementation of self-referential structs at the library level today, if not with ideal ergonomics.

Also, I guess there are some usecases for 'unsafe that won't be covered by self-referential structs.

@jpernst
Copy link
Author

jpernst commented Feb 26, 2017

@sgrif I went back and forth a bit on whether interacting with a 'unsafe type in any way should require unsafe, instead of just creating one. As @krdln points out, creating an unbounded lifetime from a pointer dereference already establishes precedent here, but I'm open to the compromise of strengthening the constraint. However, it should still be the case that binding a lifetime parameter with 'unsafe should not effect the semantics of the body of the item being instantiated, as outlined in the RFC. Further, interacting with a struct that has a private field of unsafe lifetime should not automatically be unsafe either.

But yes, the driving force behind this proposal is to be able to achieve a better API in rental. Currently this is impossible because the struct members I need to store are inexpressible. If there is a simpler, more minimal feature that will allow me to solve this, I'm absolutely open to it, but this proposal is the most limited I could come up with. I didn't just propose 'self in the first place since that opens questions about existential lifetimes and has a much larger design scope. In the long term I do hope that happens, but this is a problem I've been running into for a while now, and I'm strongly motivated to arrive at a solution.

And as a more general point, I do think it's an important ability to be able to explicitly bypass borrowck if it's absolutely necessary to do so, and it's currently not possible to do that in cases other than reference types.

@Rufflewind
Copy link

The concept of 'unsafe seems very similar to 'static (which itself is isomorphic to unbounded lifetimes), but also weirdly different in a few ways. It does not feel particularly orthogonal to what already exists in the language.

I think the real problem here is that there is no way to tell the borrow checker to just ignore T: 'a constraints. Adding 'unsafe feels more like a bandaid fix that ends up reinventing most of what 'static already does.

For this, it would be nice to have some way of deferring T: 'a constraints, e.g.

struct WrappedRef<'a, T>(Ref<'a, T> where T: 'a);

This is very much like the implied bounds idea (see also rust-lang/rust#20671). To an outsider, the T parameter in WrappedRef appears unconstrained. But when you try to construct it, the compiler does enforce T: 'a as usual:

fn new<'a, 'b, T: 'a>(r: Ref<'a, T>) -> WrappedRef<'a, T> {
    WrappedRef(r) // checks to make sure T: 'a
}

After that, you can transmute WrappedRef<'a, T> to Wrapped<'static, T> freely, because the T: 'a constraint is hidden from sight. You can then plug this into SelfRefStruct<T> without contaminating T with the T: 'static constraint.

This can obviously allow very dangerous violations of the trait system, but that’s what you get with transmute!

@jpernst
Copy link
Author

jpernst commented Feb 27, 2017

@Rufflewind That's a fascinating idea and worth exploring, but I have reservations. My primary concern is that it involves creating intermediate wrapper types for any type that needs its constraints deferred, since it can't be done generically. I'll have to think more about the implications this would have for rental and how I'd incorporate it, but on first blush it feels like it would generate awkward boilerplate that somewhat obscures the intent. Extracting the proper bounds to attach to the inner field would also be challenging and would almost certainly require a proc-macro. Although, using a proc-macro is on the roadmap anyway.

Personally I feel like 'static and 'unsafe being so similar makes them two sides of the same coin. They both represent the safe and unsafe dual of the same concept, as references and pointers are the safe and unsafe dual of indirection. 'unsafe also still serves well as a flashing red warning sign that the type has deferred enforcement of its invariants to logic, while using 'static in that place still represents a "false type" that can't actually exist. One of the goals of this RFC is to eliminate such false types, and allow the field to truly express its nature, instead of relying on a transmute to shoehorn it into a signature that doesn't truly represent its invariants.

@kennytm
Copy link
Member

kennytm commented Feb 27, 2017

In motivation, the workaround of SelfRefStruct makes it sounds like what you want is not 'unsafe, but a "lifetime-of" operator.

struct SelfRefStruct<T> {
    owner: RefCell<T>,
    borrower: Ref<'lifetime_of(T), T>,
}

(Yes I understand that 'self is more appropriate here.)

@eddyb
Copy link
Member

eddyb commented Feb 27, 2017

@kennytm That's exactly what for<'a> Ref<'a, T> would mean, which is what this RFC wants a simpler version of, AFAICT. Ideally, indeed, 'self (or 'owner, if we did it per-field) would be more appropriate.

@jpernst
Copy link
Author

jpernst commented Feb 27, 2017

@eddyb I did consider suggesting for<'a> as an alternative syntax, but it seemed a little odd to see it outside of trait bounds. If consensus is that it would be more appropriate though, I'm perfectly alright with that. I suppose my point of hesitation was that for<'a> doesn't imply any unsafety in the contexts in which it appears now, so having it do so in other contexts felt misleading, so I went for a more distinct form.

@kennytm
Copy link
Member

kennytm commented Feb 27, 2017

@eddyb I assume for<'a> Ref<'a, T> is just an alternative syntax of Ref<'unsafe, T>. (It looks like an existential lifetime that OP wants to avoid though #1918 (comment).)

The lifetime_of example is different from this, it extracts the concrete lifetime upper bound of T at monomorphization time, e.g. you cannot assign a &'lifetime_of(&'a u8) T to a &'static T, unlike &'unsafe T.

@Rufflewind
Copy link

Rufflewind commented Feb 27, 2017

I would say for<'a> is clearer, because the meaning of 'unsafe can become ambiguous/confusing in some contexts. Consider say fn(R<'unsafe>): would that mean

  • for<'a> fn(R<'a>) or
  • fn(for<'a> R<'a>)?

Whatever choice you pick, it's clear that 'unsafe is inherently a less flexible solution than just enabling for<'a> in types.

Edit: This also becomes a problem when you have a type with multiple 'unsafe lifetimes. Does F<'unsafe, 'unsafe> mean for<'a> F<'a, 'a> or for<'a, 'b> F<'a, 'b>?

@Storyyeller
Copy link

Storyyeller commented Feb 27, 2017 via email

@eddyb
Copy link
Member

eddyb commented Feb 27, 2017

you cannot assign a &'lifetime_of(&'a u8) T to a &'static T, unlike &'unsafe T.

That's the same for for<'a>: for<'a> &'a &'b u8 would be &'b &'b u8.

@eddyb
Copy link
Member

eddyb commented Feb 27, 2017

@krdln That's the point, it's unsafe to lift to that (always longer or equal than reality) lifetime.
But I agree it could be misleading as a real valid lifetime.

As for existential lifetimes, they're just as unsafe unless you can link references to their allocations.

@jpernst
Copy link
Author

jpernst commented Feb 27, 2017

'unsafe also has a minor advantage in that it doesn't involve any changes to the type grammar. I'm not sure how big of a deal that would actually be though; likely not too much.

@Rufflewind
Copy link

@krdln Although I think existential lifetimes are a great addition by themselves, their semantics are more like raw pointers (safe to construct, unsafe to use). In fact, I think *const T is approximately equivalent to exists<'a> &'a T. However, this proposal seems to want "unsafe to construct, safe to use" semantics instead, which would require for<'a>.

(I personally think existential semantics are more appropriate here, as they prevent the dangerous possibility of accidental unification with 'static.)

However, they are still allowed for unbounded lifetimes.
@jpernst
Copy link
Author

jpernst commented Feb 27, 2017

Removed the section about general coercions into 'unsafe, since regardless of which way this plays out, I don't think that section offers real value. They are still allowed for unbounded lifetimes.

@jpernst
Copy link
Author

jpernst commented Feb 28, 2017

@Rufflewind I'm not opposed to existential semantics in principle, but I'm concerned about the practical design implications. IIRC they've been floated and postponed before, so I'm trying to avoid retreading that ground by sticking as closely to established semantics and precedent as possible. If those familiar with borrowck's implementation and the implications of existential semantics think it's doable, then I'm on board, but for now I'm trying to blaze the shortest, narrowest path to the expressiveness that I'm blocked on.

@jpernst
Copy link
Author

jpernst commented Feb 28, 2017

Just as a note on the impetus behind this RFC: Just a few minutes ago I've fielded another support request for someone using one of my crates who wanted to store some of the structs it provides beside eachother in their own struct. This is currently inhibited by their lifetime parameters, and even rental can't yet solve it for that case, so my motivation remains strong.

I've felt pressure and been tempted to relent and just use reference counting to eliminate the lifetime params, but I really don't want to just toss away a core benefit of rust to fix this issue. I am concerned, however, that this might be silently happening in the ecosystem already. Several APIs either provide RCed alternatives, or are simply just RCed to begin with, to avoid this issue.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 3, 2017

@rfcbot fcp postpone

So, I think this RFC is addressing a genuine problem, but I feel like I would rather revisit this syntax when the unsafe-code-guidelines effort has progress a little further. In particular, I feel like some aspects of how unsafe code is written may change, and there may be additional semantics we would like from an &'unsafe T reference that we cannot know at the moment. Therefore, I propose that we postpone the RFC.

I teetered a bit on this proposal, in that I also considered that it might be useful to merge the 'unsafe lifetime but keep it unstable. But I actually don't see what we will gain from experimentation that is not already clear -- and libraries that used it would be nightly only -- hence I think I'd rather just wait.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 3, 2017

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jpernst
Copy link
Author

jpernst commented Mar 3, 2017

Waiting for the unsafe code guidelines before stabilization is perfectly reasonable and I support that, but I have one final plea for feature gating this sooner and allowing it to evolve.

The immediate use of this feature will be to prototype a better library-level solution to the self-referential struct problem, and I have a few ideas on that front that I'd like to begin exploring. At this stage I can't fully envision the implications of the API for all scenarios or how it will interact with real world codebases, and there's no way to test it currently since the implementation for the API I'd like to expose cannot be written. I'm willing to follow along with the unsafe guidelines as they evolve and ensure my solution is fully conformant, in the interest of having a viable day 1 solution at such time as whatever final semantic form stabilizes.

Either way, thanks to all for the consideration and discussion of this RFC.

@Storyyeller
Copy link

Storyyeller commented Mar 6, 2017

I found another crate that could use this: shawshank

Currently, it uses the type signature struct ArenaSet<O: StableAddress, I = usize, M = HashMap<&'static O::Target, I>>, but the 'static should be an 'unsafe.

@strega-nil
Copy link

strega-nil commented Mar 14, 2017

My first reaction is "just use 'static to mean this, that's what I do".

My second reaction is "oh, that's kind of gross, I see what the point of this is now".

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 19, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Apr 19, 2017
@Storyyeller
Copy link

Would it make sense to at least reserve 'unsafe in the mean time to reduce the risk of future breakage?

@eddyb
Copy link
Member

eddyb commented Apr 22, 2017

@Storyyeller I thought all keywords were reserved in lifetime names, is that not the case?

@Storyyeller
Copy link

I didn't realize that. In that case, there's no problem.

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 29, 2017

The final comment period is now complete.

@aturon
Copy link
Member

aturon commented May 1, 2017

Closing as postponed. Thanks @jpernst!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.