Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added representation of pointer types. #51

Merged
merged 10 commits into from
Dec 20, 2018

Conversation

asajeffrey
Copy link

Fixes #16

@asajeffrey
Copy link
Author

On thinking about it, now that the non-zero and alignment requirements are punted to validity, we're just left with size and alignment. Everything else can be part of the notes.

@asajeffrey
Copy link
Author

Do we need to add text about the ABI?

#[repr(C)]
struct DynObject {
data: &u8,
vtable: &usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this vtable: &usize ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To emphasize the fact that it's non-zero and word-aligned. Not part of the representation, but I believe we're going to be requiring this when we come to validity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a very confusing way to make that point, and since it's not even really part of what this write-up is supposed to be about at this point, nor particularly useful, I'd really prefer not doing that. Just *const () is good enough for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made them both &u8.

@asajeffrey
Copy link
Author

For slices, we could say something like the representation of &[T] is the same as that of:

#[repr(C)]
struct Slice {
  ptr: &[T;N],
  len: usize,
}

and then when we come to validity, say that N has got to be the value in the len field. This would cope with nasty cases, e.g. when T is uninhabited (and hence &T is uninhabited, but &[T] is inhabited).

@asajeffrey
Copy link
Author

I think I dealt with a ll the outstanding comments. Can we squash and merge?


### Representation

The alignment of reference and raw pointer types is the word size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this might not actually be true for custom DSTs (e.g., consider a custom DST with u64 metadata on a 32 bit target). Since the rest of this text is already careful to be forward compatible with custom DSTs, it would be a shame if this part wasn't.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, worth thinking about. On 32-bit targets, does u64 have 64-bit alignment? Are there architectures with alignment larger than the word size?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, u64 is not a good example, because on most (all?) 32 bit targets it has alignment 4. But there are definitely tons of types with super-word alignment -- anyone can define them with #[repr(align(N))], and vector types such as __m128 usually require natural alignment (i.e., size = align).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the question is do we want to tie our hands and say that metadata of custom DSTs is at most word-aligned? The trade-offs don't seem obvious to me, I'm trying to come up with an example and the best I can do is something like:

  let a: Box<T> = Box::new(T::new());
  let b: Box<(*T, usize, usize)> = Box::new(Box::Into_raw(a), 0, 0);
  let c: &&U = unsafe { mem::transmute(Box::into_raw(b)) };

The question is can this be insta-UB if Us metadata is super-word aligned?

This is the simplest example I can think of.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing metadata with higher alignment comes naturally, as pointers to custom DSTs are just (<untyped data pointer>, T::Metadata). Is there any reason to artifically prohibit it? All of the code I know of that can be generic over all DSTs doesn't have to care about alignment or can trivially be made compatible with higher alignments by strategically using mem::align_of::<&T> (and T has to be known anyway if you're manipulating pointers, because it dictates the size of the pointer too).

Copy link
Contributor

@gnzlbg gnzlbg Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents is that we shouldn't say anything about the size and alignment of references here.

We should just say that the layout of references (&T and &mut T) and pointers (*const T and *mut T) is the same as the layout of the following type:

struct Layout<Metadata> {
    ptr: *mut (), 
    metadata: Metadata,
}

We can then say that:

note: Metadata === X is to be read as Metadata has the same layout as X

  • if T: Sized, then Metadata === (),
  • if T is a slice, then Metadata === usize,
  • if T is a trait, then Metadata === *const (),
  • if T is a multi-trait object, then Metadata === <implementation-defined>,
  • if T is a custom DST, then Metadata === T::Metadata,
  • etc.

and the layout of pointers and references just follows from the struct representation rules, e.g., that they are at least one "word" wide follows from Layout always being at least one word wide.

Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of it's the same as their size.

Copy link

@hanna-kruppe hanna-kruppe Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have custom DSTs in the language, then all pointers are ... { ptr: *mut (), metadata: T::Metadata }, as custom DSTs generalize slices, trait objects, and sized types (we can still spell this out for clarity but there's no need to have two things called Metadata or to introduce this === relation). But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized trait.

Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of their size.

Wait, what? If the alignment is not described yet, it should be described if we can at all get consensus for it. As far as I know, that alignment should be exactly the size. But even if that is controversial, it's the other way around, size has to be a multiple of the alignment (because element i of an array is at byte offset i * sizeof(T)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized trait.

Even without the concept of custom DSTs, we do have the concept of generic structs, and their layout, and those can be used to specify the layout of slices and trait objects.

Copy link

@hanna-kruppe hanna-kruppe Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but unifying these two layouts into one generic struct, and the naming of the type parameter, gestures heavily at the concept of custom DSTs (would anyone make this suggestion if they didn't know of the generalization that custom DSTs bring?), without really buying us anything right now. If anything, it makes the presentation slightly more complex.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I reworded this to say that reference types have at least word alignment, and have exactly word alignment in the case of references to sized types, &[T], &str or &dyn Trait.

@asajeffrey
Copy link
Author

OK, try again... are we okay with merging now?

```rust
#[repr(C)]
struct Slice<T> {
ptr: &T,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: if the length is zero, then what is ptr? Perhaps *const T is a better choice -- certainly it will fit the "validity invariants" better (eg, I don't think there is any requirement that it be "dereferenceable" in the event that length is zero). However, from_raw_parts does state:

data must be non-null and aligned, even for zero-length slices. One reason for this is that enum layout optimizations may rely on references (including slices of any length) being aligned and non-null to distinguish them from other data. You can obtain a pointer that is usable as data for zero-length slices using NonNull::dangling().

It seems worth linking to slice::from_raw_parts and reproducing some of that text (although arguably those are things best left for the next discussion?).

Copy link
Contributor

@gnzlbg gnzlbg Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't like the use of references here much [0] the guarantee here is that "the representation of &[T] is the same as of Slice<T>", - that is, the Slice<T> struct only applies to the representation / layout of &[T] and not to its validity.

Ideally, we would use the same examples to show validity and representation, but we haven't settled the validity of these yet (it might well be that we need different "examples" for the &[T] and the *[T] cases). I think it is worth it to wait until the validity of these is settled before trying to "unify" all the examples.


[0] Personally, I don't like the use of &T here much, it has too many connotations, e.g., w.r.t. validity and safety, and we don't really care about them when just talking about the representation. It might be worth it to just say:

The representation of &[T] is the same as that of:

struct Slice {
    ptr: *const (),
    len: usize,
}

AFAICT there is no need to make Slice<T> generic since T: Sized, but maybe there is something else going on here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said above, we could also use

struct Slice<T> {
  ptr: &[T;N],
  len: usize,
}

and in practice, N and len are the same.

All these defns are equivalent, it's just a case of which is the more human-readable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment that *T means *const T or *mut T when the mutability is unimportant, and used *T rather than &T. On zulip, @gnzlbg was okay with this, have we finally closed the last issue?

@asajeffrey
Copy link
Author

OK, asking yet again, can we squash and merge?

@RalfJung
Copy link
Member

No objections from my side.

@asajeffrey
Copy link
Author

@avadacatavra I think you can squash and merge.

@avadacatavra avadacatavra merged commit b37e8f6 into rust-lang:master Dec 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants