Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC - Zero-Sized References #2040

Closed
wants to merge 4 commits into from

Conversation

EpicatSupercell
Copy link

@EpicatSupercell EpicatSupercell commented Jun 23, 2017

@SimonSapin
Copy link
Contributor

Is it possible to do this in a library with PhantomData<&'a ()>?

@EpicatSupercell
Copy link
Author

The main purpose is to help with polymorphic code, that usually doesn't deal with ZST.

@bugaevc
Copy link

bugaevc commented Jun 23, 2017

Rendered

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

IIRC we haven't pursued this idea in the past because of the potential for breakage involved.

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

For most rust users, this change will be invisible. Thier code will just become a tiny bit smaller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thier -> Their

Users of unsafe rust might encounter this case. Therefore there should probably be a note in the nomicon that references might be optimized away for ZST.

# Drawbacks
[drawbacks]: #drawbacks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also mess with monomorphization, because you can't know the size of a reference anymore without knowing the type. I'm not sure of the implications for generic functions. Currently transmute<&T, usize> works inside of generic functions. But we had the same issue with transmuting fn pointers, so there's some precedent for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's already the case: &mut Trait has a different size than &mut i32.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do it based on sized-ness and the kind of "maybe-DST tail" when that is unknown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is sized-ness any easier to track than zero-sizedness?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sized-ness is "easier" to track than zero-sizedness, because a type parameter is assumed sized unless one explicitly declares otherwise via the Sized? marker. Once you have <T: Sized?>, then the generic code cannot make assumptions about the sized-ness of T.

other_ffi_function(x);
```
We might want to represent a pointer we got from FFI and checked it's not null as a reference to an object.
Since that object is not accessable directly we represent it as an empty struct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc that is not how external structures should be represented, so breaking that at compile-time sounds like a win to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik that's currently the best way to represent external structures, see the discussion on the extern type RFC


### Mitigating breakage

Safe rust code should be affected positively by this change. However unsafe code might break.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a dangerous statement. Unsafe code isn't any less stable than safe code. I think there should be some review of existing code that might break before deciding whether to do this. If all the unsafe code that breaks is some code that should never have been written in the first place because there's better ways to do it, then imo that would be ok.


### Specific examples - Pro

The RFC isn't well-justified until it has at least one detailed use case where it helps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that it's enough justification that this will solve all the issues around addresses and zsts once and for all.

@strega-nil
Copy link

strega-nil commented Jun 23, 2017

I disagree completely with this change. & pointers are still pointers, and people trust that they don't change address. Also, Option<&T> and &T are always exactly layout-compatible with *const T - this would change that, and that's not okay, imo.

@strega-nil
Copy link

This would also make references not extern "C" compatible, which is a major breaking change; if someone uses a ZST to represent an opaque type, for example.

@RalfJung
Copy link
Member

Shouldn't extern "C" use pointers anyway? C doesn't have references the way Rust does...

@dlight
Copy link

dlight commented Jun 23, 2017

"Breaking code" is a drawback big enough to make this Rust 2.0 material

@SimonSapin
Copy link
Contributor

Even in an hypothetical Rust 2.0 (which by the way I hope won’t happen for many years!), breaking code that uses &ZST as a way to implement &Opaque is disruptive enough that it shouldn’t be done.

A zero-size reference-like thing can be useful, but I think it should be a separate type. Something like:

// Does not implement Deref
pub struct ZeroSizedRef<'a, T>(std::marker::PhantomData<&'a T>);

@EpicatSupercell
Copy link
Author

A couple points I want to add (a bit meta):

  • I am not as much pushing this RFC, as opening it for debate. I have heard strong opinions in both directions, and would like to have a discussion to come to an agreement. We could then point people towards this discussion whenever questions arise.

  • The discussion of this RFC is expected to drift towards the definition of & and it's valid / invalid uses. That is intended and should be counted as on-topic.


Also, clarification about breaking code:

I want to find a solution that will not break valid unsafe code. I have no strong opinion about breaking non-valid unsafe code. However, since I have little knowledge about much of the unsafe code in the wild, the specifics of these should be discussed here.

@glaebhoerl
Copy link
Contributor

AFAICT there's two basic use cases here, which are both valid, but in tension:

  • If data structures with custom allocators store references to their allocators, and you define some global allocator with a ZST just to identify it, it would be unfortunate if data structures instantiated with it as their allocator had to physically store pointers which don't actually point to anything useful. (N.B. I'm not sure I have the details of this right, but there are probably analogous cases even if I don't.)

  • FFI code wants to use ZSTs (or similar) as stand-ins for foreign types, with &ZST containing no useful information on the Rust side, but faithfully preserving the memory address for when it is passed back into foreign code. (The extern type or OpaqueData RFCs might provide a not-technically-ZST option, but in any case we can't break code using the currently-existing one.)

How about an opt-in approach where authors can tag specific zero-sized types with "references to this type should also be ZSTs"? That satisfies the allocator use case (with a bit of manual annotation) without affecting any existing code. (My suspicion is that generic code would not pose a problem either because Rust implements it exclusively with monomorphization, but I haven't thought about it very deeply!)

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

You can transmute from/to &T with a generic T. So you'd need a trait that types opt out of and is present as a bound by default (i.e. like Sized), or some code will still break.

@glaebhoerl
Copy link
Contributor

@eddyb Is there actually generic code in the wild which does it? (I'm not doubtful, simply ignorant.)

@eddyb
Copy link
Member

eddyb commented Jun 23, 2017

Yes, we had to add a special-case because there were about a dozen (IIRC) crates which transmuted between pointers to T: ?Sized. I never actually checked how many would break if we pretended we didn't know the layout of pointers to T: Sized, but if the standard library doesn't abuse it then we could experiment with a small check for "does this type contain any type parameters?" (instead of "is it sized?").

@strega-nil
Copy link

strega-nil commented Jun 24, 2017

@RalfJung no. If you want more info, hit me up on irc, I don't want to write it up here. Basically, it's to do with correct types being the best documentation. If you expect a non-null pointer that doesn't alias anything else, take an &T, don't take a *const T and write up what's necessary in documentation. Similarly,

extern "C" {
  fn strdup(s: *const c_char) -> *mut c_char;
}

is much less informative than the (should eventually be equivalent):

extern "C" {
  fn strdup(s: &CStr) -> CBox<CStr>; // CBox means "using the C allocator"
}

@RalfJung
Copy link
Member

RalfJung commented Jun 24, 2017

@ubsan I would argue that the extern "C" function should have C-level types (raw pointers, no Box etc.), and then one writes a wrapper around it using more Rustic types. This avoids making assumptions about the layout of the Rust types.

But maybe that doesn't actually work so well, and anyway it's enough for some people to write libraries the way you propose to make this a problem.

@glaebhoerl
Copy link
Contributor

@eddyb Hmm, what if both pointers and references to such opted-in types were ZSTs, with a fixed address of 0xAlign? Would that solve the transmute problem, and would it introduce any others? It seems like it would avoid any kind of FFI issues unless people deliberately aim at their own feet.

@eddyb
Copy link
Member

eddyb commented Jun 24, 2017

ZSTs, with a fixed address of 0xAlign

What does that mean? A by-value coercion? Then this will still silently break code which doesn't use transmute.

@glaebhoerl
Copy link
Contributor

@eddyb Can you give an example of what kind of code you're thinking of? I just mean that both *const ZST and &ZST would themselves be ZSTs, and whenever the actual value of the address is required (e.g. *const ZST as usize), it would be 0x1 (alignment was a silly thought, given that ZSTs don't really have differing alignments).

Hmm... that would break ptr::null() though. And making the value always be 0x0 instead would probably break &T, which is supposed to be non-zero. So I guess this doesn't really work out...

@EpicatSupercell
Copy link
Author

EpicatSupercell commented Jun 24, 2017

@glaebhoerl We really shouldn't touch *const ZST, since it carries extra data (null or non-null) it can't be ZST itself.


there were about a dozen (IIRC) crates which transmuted between pointers to T: ?Sized.

@eddyb Pointers to T: ?Sized are not affected by this though? Since we only affect the references to structs of known size, references to trait objects are unaffected. References transmuted between each other will be affected only if the target is of different sizes... which IIRC would be an error anyway? And no sort of pointer would be affected. And I don't think people transmute between *const ZST and &ZST since they could just use as for that. Are there other cases I didn't think of?


@ubsan The specific examples you showed didn't use a ZST. For examples with ZST, the next comes to mind:

struct OpaqueStruct;

extern "C" {
  fn some_fn(s: &OpaqueStruct);
}

but I think it could be enough to allow writing the next

unsafe impl !Sized for OpaqueStruct {}

and give an error that hints of the fix, something like Can't use references to ZST in extern "C", consider adding "unsafe impl !Sized for OpaqueStruct {}".
Would this much be enough? The code will not silently break, and the fix is given + straightforward.

Note that writing the unsafe impl !Sized would probably disallow instantiating the struct on rust's side, but I can't even come up with a single case where you would send a pointer to a REAL ZST to C code.

@eddyb
Copy link
Member

eddyb commented Jun 24, 2017

@EpicatSupercell My point was that there were even transmutes between T: ?Sized pointers, so we had to support those, on top of T: Sized pointers (which have a known size).

@EpicatSupercell
Copy link
Author

EpicatSupercell commented Jun 24, 2017

@eddyb Is it valid to transmute between pointers to different known sizes? So transmute &u8 to &u16 for example.

Edit: Guess it is... Think it's gonna be a tough one. Is it invalid to transmute between u8 and u16 at least?

@eddyb
Copy link
Member

eddyb commented Jun 24, 2017

@EpicatSupercell It has always been valid. Transmute only cares about immediate size, which is the same for all pointers to sized types.

And I don't think people transmute between *const ZST and &ZST since they could just use as for that.

Strictly speaking, transmute was a mistake for almost everything it's commonly used for nowadays and I wish I hadn't seen some of the usage in the wild - the compiler should complain more.

Sadly rust-lang/rust#32939 doesn't seem to link to the original crater report, so I can't show you the examples for T: ?Sized that I was mentioning earlier.

@glaebhoerl
Copy link
Contributor

We really shouldn't touch *const ZST, since it carries extra data (null or non-null) it can't be ZST itself.

...Yes. That's what I said.

@EpicatSupercell
Copy link
Author

@eddyb Actually since &ZST and *const ZST ARE of different immediate size, the real question is whether we care about breaking the code that uses it like that. Honestly I'd hate to do that, but I feel like if we care TOO much about maintaining backwards compatibility, we'll end up worse than C++ in no time. If the use was never correct, AND it's in unsafe block, AND there are better solutions anyway, AND we give an error message instead of silently breaking, it might be fine to break it?

@le-jzr
Copy link

le-jzr commented Jul 24, 2017

There should probably at least be a better way to represent opaque types even if this particular optimization is never implemented.

#1861 is already being merged, thankfully.

@EpicatSupercell
Copy link
Author

EpicatSupercell commented Jul 24, 2017

With the approaching resolution of #1861 we get a tool for FFI to solve some concerns. The other concerns are about unspecified behavior. I am not in favor of breaking code, so right now this RFC is inapplicable.

However, with the additions of epochs, a new possibility opens for us - allowing only references from a newer epoch to be ZST, while keeping all current code working. This RFC sounds fitting for a test run of epochs (assuming it's accepted at the time in the first place). I'm not sure if now is a fitting time for it or not.

On a side note, due to personal circumstances, I didn't have the time to manage this RFC at all this past month :(

@RalfJung
Copy link
Member

RalfJung commented Jul 24, 2017

A requirement for epoch's is that "if your code compiles in the previous epoch without warnings, then it will keep working in the next epoch". There cannot be silent breakage. So we'd need to be able to reliably warn for any code that relies on the data layout of &ZST. That seems like a tough call...

@EpicatSupercell
Copy link
Author

It sounds possible to me. You make &ZST be Zero-Sized in the new epoch (= new code) and non Zero-Sized in the old epoch. When calling a function from old epoch or using a struct from old epoch you generate a value for the reference so it's no longer Zero-Sized, and when you receive a value from one you drop it so it's Zero-Sized again. Old code keeps working because it's never ZST, new code will not allow you to do illegal operations anyway.

@Manishearth
Copy link
Member

That's still not epoch-compatible -- you can't just "generate a value for the reference", the value has to still be correct if you don't want to break backwards compatibility. Remember, using &ZST for holding references to opaque C-managed types is a common, recommended pattern for Rust FFI. Generating fake values will not fix this situation, it will only make it compile.

And the struct thing won't work at all, we can't have different representations for the same struct across crates; it would not allow them to be shared.

Epoches allow for breaking changes that are not in the core language, as defined in the RFC. I'd argue that this is definitely a change to the core language since it observably changes representations.

@EpicatSupercell
Copy link
Author

EpicatSupercell commented Jul 25, 2017

different representations for the same struct

Different representation depending on where it is defined, not used. Therefore a use of an epoch 1 struct in epoch 2 code will follow conventions of epoch 1 struct, while checking the size of an epoch 2 struct containing &ZST will return size 0 even in epoch 1 code.

using &ZST for holding references to opaque C-managed types is a common, recommended pattern for Rust FFI

I come with the assumption that #1861 becomes the standard and & extern type becomes the recommended pattern. In that case &ZST stops being recommended and becomes an anti-pattern (and possibly warning/error).

@le-jzr
Copy link

le-jzr commented Jul 25, 2017

I come with the assumption that #1861 becomes the standard and & extern type becomes the recommended pattern. In that case &ZST stops being recommended and becomes an anti-pattern (and possibly warning/error).

The compiler would have to emit warnings wherever code would not work for ZST references. This includes FFI, casts to/from usize, various transmutes, raw pointers of ZST types, etc.

The problematic part, however, is that generic code couldn't do those things on any generic references at all, unless a new auto-trait for non-ZSTs is added first (or ZSTs were removed from the Sized bound, which again, wouldn't be backwards compatible).

@le-jzr
Copy link

le-jzr commented Jul 25, 2017

With respect to interaction between epochs, imagine a specific case where epoch 2 code uses an epoch 1 FFI library that, for whatever irrelevant reason, passes around opaque payload as type &(). Epoch 2 code cannot just handle that &() from epoch 1 as a zero-sized reference, because it would conflict with the requirement that epoch 1 crates can be used in epoch 2 projects.

So for one thing, epoch 2 would need to understand its own &() as a different type from &() in epoch 1 functions. It would also need a way to use epoch 1 &() values, and name that legacy type in function signatures etc.

In other words, it would be possible, but it would be no small feat.

@EpicatSupercell
Copy link
Author

EpicatSupercell commented Jul 25, 2017

Guess it could be possible in that case to create a type called Epoch_1::LegacyZST (which is equal to legacy &()) and have the compiler complain that you try to assign a value of type LegacyZST to a type (epoch 2)&(). Can't say it's very elegant, but is relatively simple.

My main reason I try and find a way to make this work is because I fear that if we decide a feature as simple (in my opinion) as this can't be used with Epochs, we will never be able to add any serious features without a major version (Rust 2.0). And with the collective desire to not go that route, I fear the language will stagnate even faster than C.

@rkjnsn
Copy link
Contributor

rkjnsn commented Jul 29, 2017

I have some thoughts. I'm afraid I have no conclusions. It seems to me there are two main ways to try to remove the overhead of references to ZSTs: make them ZSTs themselves (as proposed here), or try to eliminate them through optimization.

Making them ZSTs

The former option is a breaking change to the language. I agree that it could probably be accomplished in a linking-compatible way by using something like a pair of V1ZstRef<()> and V2ZstRef<()> types. I imagine both would be usable from code in any version, with the first being equivalent to the original &(), and the latter to the new zero-sized &().

The next question is, is it worth it? Even with a mechanism for making breaking changes to the language, the desire is still to keep them small and rare. We would also be committing to keeping both types of references in the compiler for the foreseeable future. Therefore, I would want to see concrete benefits demonstrated before going down that road, such a real work application where it saves a significant amount of memory or code size. I do agree that it seems reasonable to assume that using ZST references for FFI would become deprecated once we have a better alternative.

When I first came to Rust, I actually expected ZST references to also be ZSTs, and was surprised when they weren't. My preference would still be for them to be, I'm just not sure it's worth the breaking change. Aside from backward compatibility, I would like to understand better what generic patterns would be complicated, given that containers et al already have to special case ZSTs in most cases. Are there examples of code that currently works with ZSTs without special casing, but would have to deal with them specially after this change? Is there any code that would silently break?

Eliminating ZST references as an optimization

This can be done either conservatively and aggressively.

Conservatively

This would work by eliminating ZST references when the compiler can prove the address isn't observed, such as when a function never converts the reference to a raw pointer. This could be accomplished through metadata flags, and, like impl Trait, would mean that some changes to a function would require all users of it to also get recompiled. The easiest target for this optimization would be eliminating bare reference, such as &self, but with extended analysis, it would be possible to eliminate them from crate-private structs as well.

I imagine this approach would be able to avoid passing/storing ZST references here and there, but it wouldn't be reliable, and a change to a leaf function could force a ZST reference to be stored through most of the call graph.

Aggressively

I see this approach as kind of a hybrid. ZST references would take up no space in a lot more cases, but code could still treat ZST references as pointer-sized in generic code. To make this work, the language would specify that for any reference: &ZST, reference as *const ZST as usize == std::mem::align_of<ZST>. That is to say, all instances of a given ZST are specified to reside at the lowest non-zero address consistent with its alignment. The language would further specify that attempting to create/modify a ZST reference to point at a different location would be undefined behavior. Importantly, this undefined behavior can easily be tested for by a sanitizer, which is desired for any UB in Rust.

This would allow the compiler to always eliminate passing and returning a bare &ZST, and would also enable compiler to avoid storing them in repr(rust) structs. Structs would then just take up the space necessary for there other members. The exception (to avoid unexpected ZSTs in generic code), is that structs containing nothing except ZST references and ZSTs would act just like bare &ZSTs: they would act pointer sized, but the compiler could eliminate passing them or storing them.

This approach obviously requires that using &ZSTs in FFI and other usages of ZSTs and opaque types to be deprecated and eventually removed in favor of #1861. I think this is desirable anyway, because using ZSTs as opaque types have a number of other disadvantages (which, indeed, served as the motivation for that RFC). This approach also wouldn't eliminate space usage in all places that making &ZST truly zero-sized would. E.g., unless optimized to take advantage of the new rules, Vec<&ZST> would still store a pointer-sized value for each reference, even though they're guaranteed to be the same.

Finally, this could break some non-FFI code. The most obvious example I can think of is something like using https://github.com/Diggsey/rust-field-offset with a zero-sized field. In debug mode, calculating the offset would underflow and panic. In release mode, applying the offset to a given struct would result in a pointer to a random location. Now, dereferencing a random point to a ZST is unlikely to actually cause any trouble, since there will never be an actual read and write to that location. Further the address would effectively get normalized back to the proper one when apply returned, since the compiler would have optimized out actually returning anything.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 9, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Aug 9, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Aug 19, 2017

The final comment period is now complete.

@aturon
Copy link
Member

aturon commented Aug 23, 2017

Thanks all for the discussion, and @EpicatSupercell for the RFC! I'm going to close as per discussion and FCP, but there are still alternative avenues to explore here!

@kevincox
Copy link

kevincox commented Nov 6, 2017

E.g., unless optimized to take advantage of the new rules, Vec<&ZST> would still store a pointer-sized value for each reference

Can someone explain why this is? What is preventing &ZST from being considered a ZST itself in this context?

@SimonSapin
Copy link
Contributor

@kevincox &T has been “non-null *const T with a lifetime” forever and some programs rely on it carrying a memory address.

For example they might use ZST’s to represent opaque types that are only manipulated through pointers or references (say, C++ objects whose memory layout might change across platforms). Using references over pointers is useful to get lifetime tracking. Then when it’s time to do something with that reference, it is cast back to a pointer and passed to some FFI function and needs to have preserved its address.

@kevincox
Copy link

kevincox commented Nov 6, 2017

@SimonSapin That sounds like the general reason this RFC was denied, but I don't see how it translates to the Vec case with the "proposed" changes in that comment. That "proposal" would set reference as *const ZST as usize == std::mem::align_of<ZST> which would remove the ability of storing data in the reference.

@oli-obk oli-obk mentioned this pull request Apr 17, 2018
yvt added a commit to yvt/Stella2 that referenced this pull request May 6, 2019
- Saves one or more characters in parameter declarations.
- Reduces the memory consumption, run-time overhead, and code size
  because `WM` can be a zero-sized type while `&WM` cannot.
  (rust-lang/rfcs#2040
@vporton
Copy link

vporton commented Mar 13, 2023

It is indeed possible to do zero-sized references without breaking existing code:

struct S<'a> {
    #[allow_zero_sized]
    pub e: &'a Empty,
}

Then S should be zero-sized.

Please implement this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.