Validity of references: Bit-related properties #76

Closed
RalfJung opened this issue Jan 10, 2019 · 17 comments
Labels
A-validity Topic: Related to validity invariants

Comments

@RalfJung
Member

Discussing the "bit-pattern validity" of references: the part that can be defined without referring to memory.

Certainly, references are non-NULL. Following the current lowering to LLVM, they also must be aligned. This is in conflict with creating references to fields of packed structs, see RFC 2582 for a proposed solution.
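A minimal sketch of what these two bit-level requirements look like from the language side, assuming a stable toolchain with `ptr::addr_of!` (the macro that came out of the RFC 2582 line of work). The names `Packed`, `option_ref_is_pointer_sized`, `read_packed_value`, and `is_aligned_for` are hypothetical helpers for illustration, not part of any proposal in this thread.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

/// Hypothetical packed struct: `value` sits at offset 1, so it is
/// generally not aligned for `u32`.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32,
}

/// Non-null niche: `None` is encoded as the null pointer, so
/// `Option<&u32>` needs no more space than `&u32`.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
}

/// Reads the packed field through a raw pointer, never creating a `&u32`
/// (which would have to be aligned).
pub fn read_packed_value(p: &Packed) -> u32 {
    let field: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: `field` points into a live `Packed`, and `read_unaligned`
    // has no alignment requirement.
    unsafe { field.read_unaligned() }
}

/// Checks whether an address satisfies the alignment that `&T` requires.
pub fn is_aligned_for<T>(addr: usize) -> bool {
    addr % align_of::<T>() == 0
}
```

Writing `&p.value` instead of `ptr::addr_of!(p.value)` would try to create a possibly misaligned reference, which is exactly the conflict described above.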

Do we want to allow uninitialized bits? Theoretically we could allow something like 0xUU000001 (on 32-bit, where U represents 4 uninitialized bits) for &(), but there seems to be little point in doing so.

RalfJung added the "active discussion topic" and "A-validity" (Topic: Related to validity invariants) labels Jan 10, 2019
@RalfJung
Member Author

What does "aligned" mean for unsized types, where we might have to use the metadata to actually determine the alignment? For unsized types, the "actual" alignment isn't just a bit-level property. But we still have a "least required alignment" (whatever layout.align says for such types), and we could just use that?

For example, &(u32, dyn Trait) would have to be 4-aligned if we just look at bit-related properties, even though the actual type implementing Trait could impose higher requirements.
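A sketch of the gap between the two notions of alignment, under some assumptions: a struct `Pair` stands in for the tuple `(u32, dyn Trait)`, `Debug` stands in for `Trait`, and `Overaligned` is a hypothetical implementor. Taking the alignment of the sized prefix as the "least required alignment" is an illustrative assumption here, not a statement of what `layout.align` reports.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val};

/// Stand-in for `(u32, dyn Trait)`: a sized head followed by a possibly
/// unsized tail.
#[repr(C)]
pub struct Pair<T: ?Sized> {
    pub head: u32,
    pub tail: T,
}

/// Hypothetical implementor with a stricter alignment than `u32`.
#[derive(Debug)]
#[repr(align(16))]
pub struct Overaligned(pub u8);

/// Alignment that can be known without looking at the metadata
/// (assumed here to be the alignment of the sized prefix).
pub fn least_required_align() -> usize {
    align_of::<u32>()
}

/// Alignment of the value actually behind the reference; this consults
/// the vtable stored in the wide pointer's metadata.
pub fn actual_align(r: &Pair<dyn Debug>) -> usize {
    align_of_val(r)
}
```

For a `Pair<Overaligned>` coerced to `&Pair<dyn Debug>`, `actual_align` reports 16, while a purely bit-level check could only insist on the smaller prefix alignment.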

@nikomatsakis
Contributor

> Certainly, references are non-NULL. Following the current lowering to LLVM, they also must be aligned.

Yes.

> Do we want to allow uninitialized bits?

Absent a compelling reason to do so, I think we should forbid them for now.

@RalfJung
Member Author

Also see prior discussion at rust-lang/rust-memory-model#10 and rust-lang/rust-memory-model#12.

@the8472
Member

the8472 commented May 31, 2020

Summary from #77 (comment) and previous comments:

On some platforms (e.g. x86_64-unknown-linux-gnu), the virtual address space available to userspace is limited such that several of the high bits will always be zero. Exactly where that limit lies is a bit fuzzy due to advancements in processor technology and kernel features (the SysV ABI and kernel documentation say 47 bits for backwards compatibility; anything higher is opt-in), but saying that the upper half belongs to the kernel should be fairly conservative and forwards-compatible. So applying a platform-specific restriction to the bit patterns of references could make an additional 2^63 niche values available.
These niches would be much easier to use compared to alignment-based ones since they do not depend on the pointee type.
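A rough sketch of the arithmetic behind that figure, assuming the rule "reference addresses lie in the lower half of the address space". Both `in_lower_half` and `excluded_values` are hypothetical names; nothing like this check exists in the language today.

```rust
/// Hypothetical platform rule: a reference's address must not exceed
/// `isize::MAX`, i.e. its top bit is zero.
pub fn in_lower_half(addr: usize) -> bool {
    addr <= isize::MAX as usize
}

/// Number of address values the rule would exclude, and which could
/// therefore serve as niches: exactly half of all `usize` values.
pub fn excluded_values() -> u128 {
    (usize::MAX as u128 + 1) / 2
}
```

On a 64-bit target `excluded_values()` is 2^63, and the check does not mention the pointee type at all, which is the advantage over alignment-based niches.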

The only open question here is if this could also be applied to &ZST since they're not subject to virtual address space limitations, i.e. if there's anything safe today that would generate references to ZSTs that are > isize::MAX. #102 seems relevant.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 1, 2020

> The only open question here is if this could also be applied to &ZST since they're not subject to virtual address space limitations, i.e. if there's anything safe today that would generate references to ZSTs that are > isize::MAX.

I am not sure about "anything safe", but as of right now, the following code is certainly sound, so your proposal would be a breaking change:

fn mk_unit() -> &'static () { unsafe {
  &*(usize::MAX as *const ())
} }
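To make the consequence concrete, here is a small sketch that reuses `mk_unit` and inspects the resulting address. The helper `address_of` is a hypothetical name, and the soundness claim is the one made in this comment rather than something the sketch establishes.

```rust
/// The function from the comment above: a `&()` whose address is
/// `usize::MAX`.
pub fn mk_unit() -> &'static () {
    // SAFETY (as argued in the comment): the pointer is non-null, `()`
    // has alignment 1, and `()` is zero-sized, so no memory is accessed.
    unsafe { &*(usize::MAX as *const ()) }
}

/// Returns the address stored in a `&()`.
pub fn address_of(r: &()) -> usize {
    r as *const () as usize
}
```

`address_of(mk_unit())` is `usize::MAX`, which is above `isize::MAX`, so a rule reserving the upper half of the address space would reject a reference that is accepted today.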

@the8472
Member

the8472 commented Jun 1, 2020

Yeah, the question is whether that would be acceptable. If not, then the optimization can only be applied to non-ZST references, which loses some of the simplicity, but it's still more powerful than the alignment niches since it would also apply to align(1) types.

@chorman0773
Contributor

Regarding uninitialized references: I've mentioned lccc on issues here before, so I'll skip the details. Two things apply here. First, if you create a value with an uninitialized byte, the entire value is uninitialized. Second, types with validity requirements, or pointers with non-trivial validity attributes, cause uninit to become invalid (which is UB to even create) on reads and writes. Because references have validity requirements (both trivial and non-trivial), they are never allowed to be uninitialized, at the very least on read or write. For these reasons, unless there is a compelling reason to allow uninitialized values or bits, I would strongly oppose allowing uninitialized values (including partially uninitialized values) of reference types. This is particularly the case because of the non-trivial validity requirements (dereferenceable/dereference_write, readonly, and noalias, in the case of references): a trivially valid representation may not be a valid one (and there is no way to prevent that, because the representation of any completely valid reference also represents a non-trivially valid reference of the same type).

@RalfJung
Member Author

RalfJung commented Nov 2, 2020

Yeah, I don't think anyone would suggest that uninit memory should be valid for references -- as you noted, that is in conflict with the requirement of being non-null and well-aligned. (I don't know what "non-trivially valid" etc mean, but I think I am getting the gist of what you are saying.)

Note that even for types where all initialized bit patterns are valid, there are good reasons not to make uninitialized memory valid. I view uninitialized memory as a separate possible value memory can have, so the decision whether it is valid ought to be made separately (with the one restriction that if uninit is valid, then everything should be valid).
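As a small illustration of treating uninit as its own state, here is a sketch using `MaybeUninit`, the standard library type for holding possibly uninitialized memory. The function name `init_then_read` is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Holds a `u32` slot in the uninitialized state, initializes it, and only
/// then treats it as a `u32`.
pub fn init_then_read(value: u32) -> u32 {
    // Every initialized bit pattern is a valid `u32`, yet the slot is kept
    // behind `MaybeUninit` until it has been written.
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(value);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```

Calling `assume_init` before the `write` would be undefined behavior even though `u32` has no invalid initialized bit patterns, which matches the point that uninit validity is a separate decision.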

@chorman0773
Contributor

Non-trivially valid refers to attributes like dereferenceable: things that cannot be proven by a bit-level inspection of the type, and are therefore not trivial to prove. Likewise, trivially valid means bitwise valid. This also extends to uninit bytes: as I said, lccc takes an all-or-nothing approach to uninit (but see C for an exception for non-scalar types).

In the case of the high bytes, I think it may be reasonable to have an implementation-defined region of high bytes or bits that must be 0. On 65816, I have imposed a requirement that all pointers have a 0 in the most significant byte (previously the byte was padding and thus indeterminate) for compatibility with extensions that extend the effective address space from 24-bit to 32-bit (using, for example, bank switching). Allowing the invention of large pointers like that would be incompatible with such future extensions, as it would cause a breaking change (suddenly accessing these pointers causes a bank switch). Technically, this would extend to ZST references because they are pointers, according to the ABI. (Though this wouldn't necessarily cause an issue if usize is allowed to be 24-bit, zero-extended to 32-bit.)

@RalfJung
Member Author

RalfJung commented Nov 7, 2020

Opened a new issue: #255

@RalfJung
Member Author

RalfJung commented Jun 6, 2023

We have consensus on most of this issue (references are aligned and non-null), the other parts are tracked in:

@RalfJung RalfJung closed this as completed Jun 6, 2023