Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite docs for std::ptr #49767

Merged
merged 9 commits into from
May 15, 2018
175 changes: 144 additions & 31 deletions src/libcore/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -959,59 +959,131 @@ extern "rust-intrinsic" {
/// value is not necessarily valid to be used to actually access memory.
pub fn arith_offset<T>(dst: *const T, offset: isize) -> *const T;

/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source
/// and destination may *not* overlap.
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source
/// and destination must *not* overlap.
///
/// `copy_nonoverlapping` is semantically equivalent to C's `memcpy`.
/// For regions of memory which might overlap, use [`copy`] instead.
///
/// `copy_nonoverlapping` is semantically equivalent to C's [`memcpy`].
///
/// [`copy`]: ./fn.copy.html
/// [`memcpy`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memcpy
///
/// # Safety
///
/// Beyond requiring that the program must be allowed to access both regions
/// of memory, it is Undefined Behavior for source and destination to
/// overlap. Care must also be taken with the ownership of `src` and
/// `dst`. This method semantically moves the values of `src` into `dst`.
/// However it does not drop the contents of `dst`, or prevent the contents
/// of `src` from being dropped or used.
/// `copy_nonoverlapping` is unsafe because it dereferences a raw pointer.
/// The caller must ensure that `src` points to a valid sequence of type
/// `T`.
///
/// # Undefined Behavior
Copy link
Contributor

@ExpHP ExpHP Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the role of having both Safety and Undefined Behavior sections. It seems to me that the convention is to have a single section named Safety.

[ lampam @ 21:52:55 ] (master •58 -7 +7) ~/asd/clone/rust
$ rg '#* Undefined [bB]ehavio(u)?r'
src/libcore/mem.rs
532:/// # Undefined behavior

src/liballoc/raw_vec.rs
154:    /// # Undefined Behavior
171:    /// # Undefined Behavior

$ rg '# Safety'
(far more results)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(granted, I often wish the # Safety sections tended to more closely resemble your # Undefined Behavior sections, but that is just my opinion)

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveklabnik mentioned that new docs should make use of Undefined Behavior (I think upper case is correct). All these functions are trivially unsafe because they dereference raw pointers, and the Safety section as I have it doesn't add any new information. Perhaps it could be moved to the module docs and not repeated everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would favor a change like you just mentioned, but will also wait to see what Steve says. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I felt like these two cases were pretty distinct, but now, re-thinking about it, I'm not so sure, honestly. In some senses, UB is a subset of safety, and it feels like it's worth calling out separately, but maybe not. dunno.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused; I thought UB was the consequence of not following Safety. Is there a kind of unsafety that doesn't lead to UB?

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some background on my thought process:

While writing this, I took the point of view that Safety referred to the language feature (i.e. the subset of rust you must opt into with the unsafe keyword). Some functions (e.g. Vec::set_len) are not obviously unsafe from looking at their signature. The Safety section is probably a good place to discuss this.

In discussing why a function is unsafe, one must communicate what invariants must be maintained to avoid UB. Most of the standard library uses the Safety heading to communicate this. I am now of the opinion that there's never really a reason to have both of these top-level headings (as I do now) since the information is so closely related.

However, for functions whose unsafety is obvious because they take a raw pointer and have names like read, write, and copy, it seemed clearer to name the section discussing its invariants Undefined Behavior. RawVec::from_raw_parts also uses this convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottmcm "safety" is "memory safety", there are kinds of UB that are not memory safety related.

///
/// Behavior is undefined if any of the following conditions are violated:
///
/// * The region of memory which begins at `src` and has a length of
/// `count * size_of::<T>()` bytes must be *both* valid and initialized.
Copy link
Contributor

@Gankra Gankra May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me what "valid" means (a single live allocation..?).

Also I don't understand why src must be initialized. Where did you get this from? This would make it impossible to implement realloc in a pure rust allocator (which must copy uninitialized memory, since it doesn't know what part of the buffer is initialized).

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"valid" was an attempt to communicate "allocated memory whose access doesn't violate LLVM's pointer aliasing rules" without overwhelming the reader. We could take this stuff out and define valid in the top-level docs (see next comment).

My only insight was that, according to the reference, undefined behavior includes "Reads of undef (uninitialized) memory".

I guess that as long as one doesn't assume that the value of *dest is defined after the call to copy, you wouldn't be propagating undef everywhere. I'm not sure how to properly communicate the constraints though.

///
/// * The region of memory which begins at `dst` and has a length of
/// `count * size_of::<T>()` bytes must be valid (but may or may not be
/// initialized).
///
/// * `src` must be properly aligned.
///
/// * `dst` must be properly aligned.
///
/// * The two regions of memory must *not* overlap.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be at the top, since all the other conditions are just "things that are true of any pointer accesses" (which suggests to me that they can potentially be omitted or centralized in the top-level module documentation and just linked?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to document the requirements for stores and loads at the top-level. My concern was that people coming to the docs from a search engine may not see the top-level documentation. There was also talk in #36450 of specifying alignment requirements for every function in std::ptr.

I'd like some clarification on what the precise requirements for loads and stores are regarding initializing memory, and what the precise definition of "valid" memory is.

///
/// Additionally, if `T` is not [`Copy`](../marker/trait.Copy.html), only
/// the region at `src` *or* the region at `dst` can be used or dropped
/// after calling `copy_nonoverlapping`. `copy_nonoverlapping` creates
/// bitwise copies of `T`, regardless of whether `T: Copy`, which can result
/// in undefined behavior if both copies are used.
///
/// # Examples
///
/// A safe swap function:
/// Manually implement [`Vec::append`]:
///
/// ```
/// use std::mem;
/// use std::ptr;
///
/// # #[allow(dead_code)]
/// fn swap<T>(x: &mut T, y: &mut T) {
/// /// Moves all the elements of `src` into `dst`, leaving `src` empty.
/// fn append<T>(dst: &mut Vec<T>, src: &mut Vec<T>) {
/// let src_len = src.len();
/// let dst_len = dst.len();
///
/// // Ensure that `dst` has enough capacity to hold all of `src`.
/// dst.reserve(src_len);
///
/// unsafe {
/// // Give ourselves some scratch space to work with
/// let mut t: T = mem::uninitialized();
/// // The call to offset is always safe because `Vec` will never
/// // allocate more than `isize::MAX` bytes.
/// let dst = dst.as_mut_ptr().offset(dst_len as isize);
/// let src = src.as_ptr();
///
/// // The two regions cannot overlap becuase mutable references do
/// // not alias, and two different vectors cannot own the same
/// // memory.
/// ptr::copy_nonoverlapping(src, dst, src_len);
/// }
///
/// // Perform the swap, `&mut` pointers never alias
/// ptr::copy_nonoverlapping(x, &mut t, 1);
/// ptr::copy_nonoverlapping(y, x, 1);
/// ptr::copy_nonoverlapping(&t, y, 1);
/// unsafe {
/// // Truncate `src` without dropping its contents.
/// src.set_len(0);
///
/// // y and t now point to the same thing, but we need to completely forget `t`
/// // because it's no longer relevant.
/// mem::forget(t);
/// // Notify `dst` that it now holds the contents of `src`.
/// dst.set_len(dst_len + src_len);
/// }
/// }
///
/// let mut a = vec!['r'];
/// let mut b = vec!['u', 's', 't'];
///
/// append(&mut a, &mut b);
///
/// assert_eq!(a, &['r', 'u', 's', 't']);
/// assert!(b.is_empty());
/// ```
///
/// [`Vec::append`]: ../../std/vec/struct.Vec.html#method.append
#[stable(feature = "rust1", since = "1.0.0")]
pub fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);

/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source
/// and destination may overlap.
///
/// `copy` is semantically equivalent to C's `memmove`.
/// If the source and destination will *never* overlap,
/// [`copy_nonoverlapping`] can be used instead.
///
/// `copy` is semantically equivalent to C's [`memmove`].
///
/// [`copy_nonoverlapping`]: ./fn.copy_nonoverlapping.html
/// [`memmove`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memmove
///
/// # Safety
///
/// Care must be taken with the ownership of `src` and `dst`.
/// This method semantically moves the values of `src` into `dst`.
/// However it does not drop the contents of `dst`, or prevent the contents of `src`
/// from being dropped or used.
/// `copy` is unsafe because it dereferences a raw pointer. The caller must
/// ensure that `src` points to a valid sequence of type `T`.
///
/// # Undefined Behavior
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// * The region of memory which begins at `src` and has a length of
/// `count * size_of::<T>()` bytes must be *both* valid and initialized.
///
/// * The region of memory which begins at `dst` and has a length of
/// `count * size_of::<T>()` bytes must be valid (but may or may not be
/// initialized).
///
/// * `src` must be properly aligned.
///
/// * `dst` must be properly aligned.
///
/// Additionally, if `T` is not [`Copy`], only the region at `src` *or* the
/// region at `dst` can be used or dropped after calling `copy`. `copy`
/// creates bitwise copies of `T`, regardless of whether `T: Copy`, which
/// can result in undefined behavior if both copies are used.
///
/// [`Copy`]: ../marker/trait.Copy.html
///
/// # Examples
///
Expand All @@ -1028,15 +1100,39 @@ extern "rust-intrinsic" {
/// dst
/// }
/// ```
///
#[stable(feature = "rust1", since = "1.0.0")]
pub fn copy<T>(src: *const T, dst: *mut T, count: usize);

/// Invokes memset on the specified pointer, setting `count * size_of::<T>()`
/// bytes of memory starting at `dst` to `val`.
/// Sets `count * size_of::<T>()` bytes of memory starting at `dst` to
/// `val`.
///
/// `write_bytes` is semantically equivalent to C's [`memset`].
///
/// [`memset`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memset
///
/// # Safety
///
/// `write_bytes` is unsafe because it dereferences a raw pointer. The
/// caller must ensure that the poiinter points to a valid value of type `T`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "poiinter"

///
/// # Undefined Behavior
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// * The region of memory which begins at `dst` and has a length of
/// `count` bytes must be valid.
///
/// * `dst` must be properly aligned.
///
/// Additionally, the caller must ensure that writing `count` bytes to the
/// given region of memory results in a valid value of `T`. Creating an
/// invalid value of `T` can result in undefined behavior. An example is
/// provided below.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use std::ptr;
///
Expand All @@ -1047,6 +1143,23 @@ extern "rust-intrinsic" {
/// }
/// assert_eq!(vec, [b'a', b'a', 0, 0]);
/// ```
///
/// Creating an invalid value:
///
/// ```no_run
/// use std::{mem, ptr};
///
/// let mut v = Box::new(0i32);
///
/// unsafe {
/// // Leaks the previously held value by overwriting the `Box<T>` with
/// // a null pointer.
/// ptr::write_bytes(&mut v, 0, mem::size_of::<Box<i32>>());
/// }
///
/// // At this point, using or dropping `v` results in undefined behavior.
/// // v = Box::new(0i32); // ERROR
/// ```
#[stable(feature = "rust1", since = "1.0.0")]
pub fn write_bytes<T>(dst: *mut T, val: u8, count: usize);

Expand Down
Loading