Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added representation of pointer types. #51

Merged
merged 10 commits into from
Dec 20, 2018
43 changes: 43 additions & 0 deletions reference/src/representation/pointers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Representation of reference and pointer types

### Terminology

Reference types are types of the form `&T` or `&mut T`.

Raw pointer types are types of the form `*const T` or `*mut T`.

### Representation

The alignment of reference and raw pointer types is the word size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this might not actually be true for custom DSTs (e.g., consider a custom DST with u64 metadata on a 32 bit target). Since the rest of this text is already careful to be forward compatible with custom DSTs, it would be a shame if this part wasn't.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, worth thinking about. On 32-bit targets, does u64 have 64-bit alignment? Are there architectures with alignment larger than the word size?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, u64 is not a good example, because on most (all?) 32 bit targets it has alignment 4. But there are definitely tons of types with super-word alignment -- anyone can define them with #[repr(align(N))], and vector types such as __m128 usually require natural alignment (i.e., size = align).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the question is do we want to tie our hands and say that metadata of custom DSTs is at most word-aligned? The trade-offs don't seem obvious to me, I'm trying to come up with an example and the best I can do is something like:

  let a: Box<T> = Box::new(T::new());
  let b: Box<(*T, usize, usize)> = Box::new(Box::Into_raw(a), 0, 0);
  let c: &&U = unsafe { mem::transmute(Box::into_raw(b)) };

The question is can this be insta-UB if Us metadata is super-word aligned?

This is the simplest example I can think of.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing metadata with higher alignment comes naturally, as pointers to custom DSTs are just (<untyped data pointer>, T::Metadata). Is there any reason to artifically prohibit it? All of the code I know of that can be generic over all DSTs doesn't have to care about alignment or can trivially be made compatible with higher alignments by strategically using mem::align_of::<&T> (and T has to be known anyway if you're manipulating pointers, because it dictates the size of the pointer too).

Copy link
Contributor

@gnzlbg gnzlbg Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents is that we shouldn't say anything about the size and alignment of references here.

We should just say that the layout of references (&T and &mut T) and pointers (*const T and *mut T) is the same as the layout of the following type:

struct Layout<Metadata> {
    ptr: *mut (), 
    metadata: Metadata,
}

We can then say that:

note: Metadata === X is to be read as Metadata has the same layout as X

  • if T: Sized, then Metadata === (),
  • if T is a slice, then Metadata === usize,
  • if T is a trait, then Metadata === *const (),
  • if T is a multi-trait object, then Metadata === <implementation-defined>,
  • if T is a custom DST, then Metadata === T::Metadata,
  • etc.

and the layout of pointers and references just follows from the struct representation rules, e.g., that they are at least one "word" wide follows from Layout always being at least one word wide.

Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of it's the same as their size.

Copy link

@hanna-kruppe hanna-kruppe Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have custom DSTs in the language, then all pointers are ... { ptr: *mut (), metadata: T::Metadata }, as custom DSTs generalize slices, trait objects, and sized types (we can still spell this out for clarity but there's no need to have two things called Metadata or to introduce this === relation). But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized trait.

Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of their size.

Wait, what? If the alignment is not described yet, it should be described if we can at all get consensus for it. As far as I know, that alignment should be exactly the size. But even if that is controversial, it's the other way around, size has to be a multiple of the alignment (because element i of an array is at byte offset i * sizeof(T)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized trait.

Even without the concept of custom DSTs, we do have the concept of generic structs, and their layout, and those can be used to specify the layout of slices and trait objects.

Copy link

@hanna-kruppe hanna-kruppe Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but unifying these two layouts into one generic struct, and the naming of the type parameter, gestures heavily at the concept of custom DSTs (would anyone make this suggestion if they didn't know of the generalization that custom DSTs bring?), without really buying us anything right now. If anything, it makes the presentation slightly more complex.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I reworded this to say that reference types have at least word alignment, and have exactly word alignment in the case of references to sized types, &[T], &str or &dyn Trait.


The sizes of `&T`, `&mut T`, `*const T` and `*mut T` are the same,
and are at least one word.

* If `T` is a trait, then the size of `&T` is two words.
gnzlbg marked this conversation as resolved.
Show resolved Hide resolved
asajeffrey marked this conversation as resolved.
Show resolved Hide resolved
* If `T` is a sized type then the size of `&T` is one word.
* The size of `&[T]` is two words.

Note that we do not make any guarantees about the sizes of
multi-trait objects `&(dyn T + U)`or references to dynamically sized types.
gnzlbg marked this conversation as resolved.
Show resolved Hide resolved

### Notes

The representations of `&T` and `&mut T` are the same.

The representation of `&T` when `T` is a trait is the same as that of:
```rust
#[repr(C)]
struct DynObject {
data: &u8,
vtable: &u8,
asajeffrey marked this conversation as resolved.
Show resolved Hide resolved
}
```

The representation of `&[T]` is the same as that of:
```rust
#[repr(C)]
struct Slice<T> {
ptr: &T,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: if the length is zero, then what is ptr? Perhaps *const T is a better choice -- certainly it will fit the "validity invariants" better (eg, I don't think there is any requirement that it be "dereferenceable" in the event that length is zero). However, from_raw_parts does state:

data must be non-null and aligned, even for zero-length slices. One reason for this is that enum layout optimizations may rely on references (including slices of any length) being aligned and non-null to distinguish them from other data. You can obtain a pointer that is usable as data for zero-length slices using NonNull::dangling().

It seems worth linking to slice::from_raw_parts and reproducing some of that text (although arguably those are things best left for the next discussion?).

Copy link
Contributor

@gnzlbg gnzlbg Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't like the use of references here much [0] the guarantee here is that "the representation of &[T] is the same as of Slice<T>", - that is, the Slice<T> struct only applies to the representation / layout of &[T] and not to its validity.

Ideally, we would use the same examples to show validity and representation, but we haven't settled the validity of these yet (it might well be that we need different "examples" for the &[T] and the *[T] cases). I think it is worth it to wait until the validity of these is settled before trying to "unify" all the examples.


[0] Personally, I don't like the use of &T here much, it has too many connotations, e.g., w.r.t. validity and safety, and we don't really care about them when just talking about the representation. It might be worth it to just say:

The representation of &[T] is the same as that of:

struct Slice {
    ptr: *const (),
    len: usize,
}

AFAICT there is no need to make Slice<T> generic since T: Sized, but maybe there is something else going on here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said above, we could also use

struct Slice<T> {
  ptr: &[T;N],
  len: usize,
}

and in practice, N and len are the same.

All these defns are equivalent, it's just a case of which is the more human-readable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment that *T means *const T or *mut T when the mutability is unimportant, and used *T rather than &T. On zulip, @gnzlbg was okay with this, have we finally closed the last issue?

len: usize,
}
```