Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add a standard trait for getting many &mut to places #3100

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Nokel81
Copy link
Contributor

@Nokel81 Nokel81 commented Mar 31, 2021

@scottmcm
Copy link
Member

I think it would be helpful for the RFC to include an example of implementing this for a type which is indexed by user-defined types, not just comparatively-simple case of arrays, since we trust usize. Would be a good way to talk about soundness implications of the unsafe code in the presence of things like BTreeMap when the user-provided Ord impl is safe-but-misbehaving.

@scottmcm scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Mar 31, 2021
@adlerd
Copy link

adlerd commented Mar 31, 2021

Is there precedent for GetMutRefs to have an associated type just to support a default implementation? It seems like a different approach should be used here: in particular, I would suggest an unsafe "raw" trait, a safe "user" trait, and an impl<T> GetMutRefs for T where T: GetMutRefsRaw. Then either trait could be chosen to be implemented, and the safe trait would always be used as a bound. (The reason for using this approach as opposed to turning the safe trait into a sealed extension trait is to continue to permit the "safe" default implementation to be replaced in cases where it makes sense to do so.)

Also, is it realistic that there would be an impl GetMutRefs for [T; LENGTH] as opposed to impl GetMutRefs for [T], as the latter is strictly more general? If the array impl is indeed unrealistic then it should not be used as the motivating example in the RFC. (Note that this is separate from the question of the type passed to get_mut_refs, which should indeed be an array.)

Finally, is there sufficient motivation to put this in the standard library instead of a crate? I believe the rust features which started in a crate nursery generally end up with a better design and implementation than otherwise, although this is not a universally held opinion.

@burdges
Copy link

burdges commented Apr 1, 2021

If used without unsafe on slices, then isn't this less flexible than using split_at_mut in a loop or tree-like recursion? I suppose multi_split_at_mut! macro sounds handy. If used with unsafe on slices, then this seems redundant, no?

I agree we should explore adding arrayref-like functionality to core, meaning convenient conversion from slices or arrays into multiple arrays and slices, but with an interface resembling arrayref or slice indexing probably.

Index/IndexMut<ConstRange> where rustc builds pub struct ConstRange<const LOWER: ops::Bound, const UPPER: ops::Bound>; from .. literals maybe?

text/0000-get-mut-refs.md Outdated Show resolved Hide resolved
text/0000-get-mut-refs.md Outdated Show resolved Hide resolved
@programmerjake
Copy link
Member

seems to me that Value should be an associated type since it's uniquely determined by Key and Self -- like Index::Output

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

@burdges I don't quite understand what you mean by "If used without unsafe on slices, then isn't this less flexible than using split_at_mut in a loop or tree-like recursion?".

This isn't a way to get multiple &mut [T] out of a single &mut [T] but to get multiple &mut T out of a single &mut [T] (or any other collection type).

I guess an alternative could be to have this be an operator as well, ie IndexMutMany, where it de-sugars from let [x1, x2] = &mut source[index_1, index_2]; to let [x1, x2] = source.get_mut_many([index_1, index_2]).expect("non-overlapping"); or something, but I see that as an extension in the future. Though GetMutMany might be a better name.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

@adlerd Can you give an example as to how those traits might look? I am having trouble seeing how you would be able to use an unsafe "raw" trait. Even with the current proposal you could just make RawEntry = () if you wanted to implement the safe trait by itself for some reason.

I can change what the example implementation is, though I don't see how using [T; N] is any less motivating than [T].

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

Finally, is there sufficient motivation to put this in the standard library instead of a crate? I believe the rust features which started in a crate nursery generally end up with a better design and implementation than otherwise, although this is not a universally held opinion.

I think that there is. For instance, this sort of construct needs specific knowledge of the inner workings of types. So the trait impl's really need to be in the same "crate" as the types. Meaning that having a 3rd party crate wouldn't really work very well since the for types (like BTreeMap) their inner workings are completely private.

@burdges
Copy link

burdges commented Apr 3, 2021

There is a handy interface for this type of thing in https://github.com/burdges/lake/blob/master/Xolotl/src/sphinx/slice.rs that avoids annoying the borrow checkers:

impl<'heap,T> &'heap [T] {
    /// Returns an initial segment of a `mut &[T]` replacing the inner
    /// `&[T]` with the remainder.  In effect, this executes the command
    /// `(ret,heap) = heap.split_at(len)` without annoying the borrow
    /// checker.
    pub fn reserve(&mut self, len: usize) -> &'heap [T] {
        let tmp: &'heap [T] = ::core::mem::replace(&mut *self, &[]);
        let (reserved, tmp) = tmp.split_at(len);
        *self = tmp;
        reserved
    }
}

/// A version of `reserve` for fixed length arrays.
macro_rules! reserve_fixed { ($heap:expr, $len:expr) => {
    array_ref![$heap.reserve($len),0,$len]
} }

impl<'heap,T> &'heap mut [T] {
    /// Returns an initial segment of a `mut &mut [T]` replacing the inner
    /// `&mut [T]` with the remainder.  In effect, this executes the command
    /// `(ret,heap) = heap.split_at_mut(len)` without annoying the borrow
    /// checker.
    pub fn reserve_mut(&mut self, len: usize) -> &'heap mut [T] {
        let tmp: &'heap mut [T] = ::core::mem::replace(&mut *self, &mut []);
        let (reserved, tmp) = tmp.split_at_mut(len);
        *self = tmp;
        reserved
    }
}

/// A version of `reserve_mut` for fixed length arrays.
macro_rules! reserve_fixed_mut { ($heap:expr, $len:expr) => {
    array_mut_ref![$heap.reserve_mut($len),0,$len]
} }

We then implement an iterator for your interface using using Interator::scan and reserve_mut like

impl<T> [T] {
    /// Iterator yielding `&mut T` for prescribed elements 
    fn get_muts<'a>(&'a mut self, offsets: impl IntoIterator<Item=usize>)
     -> impl Iterator<Item = &'a mut T>+FusedIterator
    {
        let mut offset = None;
        let mut newoff = |o| {
            if let Some(offset) = offset {
                assert!(o > offset);
                let n = o-offset;
                offset = Some(o);
                n
            } else {
                offset = Some(o);
                o
            }
        };
        offsets.into_iter().scan(
            &mut self, 
            |selfy,o| selfy.reserve_mut(newoff(o)).first_mut()
        ).fuse()
    }
}

As with iter_mut, ourget_muts borrows the slice only once and returns references with that lifetime, but it iterates over the slice using efficient skips. Remove newoff if you want relative offsets.

We'll gain impl<T,const N: usize> FromIterator<T> for [T; N] from min_const_generics so then this works:

let [a,b,c] = slice.get_muts([7,13,17]).collect::<[&mut T; 3]>();

And some macro could save you specifying the [&mut T; 3] type here.

I'd expect newoff, etc. compiles down into nothing in release builds here, but if not then that's kinda a bug one should address. I donno..

There is some case for all these tools to live in core, including reserve_mut, get_muts, some cleaner macro, and even array_ref or whatever replaces it with min_const_generics, and then its reserve_fixed_mut. They're not as fundamental as split_at_mut or mem::replace, but all seem pretty competitive with say mem::take.

tl;dr We do not require new traits to handle slice lifetimes properly, but yes nicer inherent methods would make this more convenient. It's specifically reserve_mut's dance between replace and split_at_mut that you require here.

@adlerd
Copy link

adlerd commented Apr 3, 2021

@adlerd Can you give an example as to how those traits might look? I am having trouble seeing how you would be able to use an unsafe "raw" trait. Even with the current proposal you could just make RawEntry = () if you wanted to implement the safe trait by itself for some reason.

Here's what I was thinking to remove RawEntry from GetMutRefs; it also gives a little more flexibility in RawEntry impl I think by providing a reference or a pointer back to the container:

pub unsafe trait GetMutRefsRaw<Key, Value> {
    type RawEntry;

    fn would_overlap(&self, e1: &Self::RawEntry, e2: &Self::RawEntry) -> bool;

    // NB. we provide *mut Self instead of &'_ self because we can't soundly
    // hold a shared reference to self and a unique reference to some values at
    // the same time, even though borrowck won't notice the relationship.
    unsafe fn entry_to_ref<'a>(this: *mut Self, entry: Self::RawEntry) -> &'a mut Value;

    fn get_entry<'a>(&mut self, key: Key) -> Option<Self::RawEntry>;
}

pub trait GetMutRefs<Key, Value> {
    fn get_mut_refs<'a, const N: usize>(&'a mut self, keys: [Key; N])
        -> Option<[&'a mut Value; N]>;
}

impl<T, Key, Value> GetMutRefs<Key, Value> for T
where
    T: GetMutRefsRaw<Key, Value>,
{ /* ... */ }

This is extracted from a proposed implementation at this gist. NB this isn't meant to be suitable for RFC reference implementation; it's more like a proposed crate implementation.

@burdges
Copy link

burdges commented Apr 3, 2021

reserve and reserve_mut were designed for parsing binary data without allocations, btw.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

@burdges Thanks for the explanation. Though I don't see how it generalizes to non-slice types (like BTreeMap for instance). I also don't think that the requirement that all the indexes are monotonic in the input either makes sense from an API perspective or from a usability perspective.

Finally, I don't see how it solves the general question for user defined types since you cannot rely on Eq or PartialEq being correct for a general unsafe impl.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

@adlerd Thank you.

I am not convinced that one should need a reference back to the container for overlap checking. Furthermore, in the general case I am pretty sure that a RawEntry should be able to compute the reference itself without the need of a pointer to the collection being passed in. Take BTreeMap for instance, (which I am in the process of adding to the RFC doc) the "natural" RawEntry type, imo, would be Handle<NodeRef<marker::Owned, K, V, marker::LeafOrInternal>> which already has an into_val_mut() function. As containers get more complex the pointer to the root of the collection would probably become less and less useful. And implementers might be lead astray by that argument being there.

Finally, to the root of your suggestion. Now that I see what you meant I think that I like it better. Though I don't see how the argument about trait bounds applies since there would only really be one trait that you would want to make a bound on in either case.

edit: working through updating the RFC I know see why those functions need the receivers (because they are trait functions).

@burdges
Copy link

burdges commented Apr 3, 2021

I'd expect any safe interface you provide for slices reduces to being monotonic. And split_at_mut with perhaps a replace trick works for access patterns in which sequencing happens differently, like a recursion maintaining the same lifetime.

There is no split_at_mut analog for BTreeMap currently, but again inherent methods exposing this functionality would prove more universal than the interface proposed here.

impl<K, V> BTreeMap<K, V> {
    fn as_mut_view(&mut self) -> BTreeMapMutView<'_,K,V>
}
pub struct BTreeMapMutView<'a,K,V> { .. }
impl<'a,K, V> BTreeMapMutView<'a,K, V> {
    fn expand_mut_view(self) -> &'a [BTreeMapMutView<'a,K,V>]
}

Finally, I don't see how it solves the general question for user defined types

Rust's std never provided traits for user defined types previously. I'd think new traits set a higher bar than inherent methods.

I do not rule out that perhaps these traits reach this bar, but I've serious doubts given the long struggle over even the entry interface for hashmap, in #1769 and https://internals.rust-lang.org/t/pre-rfc-abandonning-morals-in-the-name-of-performance-the-raw-entry-api/7043 I suppose the entries interface has a harder goal of modifying keys, while you only permit modifying values here, which sounds fair if limited.

Interestingly, cuckoo tables admit an entry interface with local indexing similar to or better than slices, meaning it admits modifying keys while indexes remain tied to particular cuckoo tables, either via a runtime equality test of randomness, or else via a compile-time GhostCell-like invariant lifetime trick. In short, this interface is looks pointless for cuckoo table.

In this vein, we might explore GhostCell based or inspired entry-like tooling too, likely some fn frozen(&mut self) -> FrozenHashMap<'a,K,V> which permits modifying values but not keys.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 3, 2021

I'd expect any safe interface you provide for slices reduces to being monotonic.

Why though, my first example within the RFC doc is a clear example as to why you wouldn't want something like that.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 4, 2021

@scottmcm and @adlerd care to take another look?

@burdges
Copy link

burdges commented Apr 4, 2021

I meant safety reduces to the way slices store data. Yes, the newoff closure I wrote turns out rather simple. Yes, it's plausible optimizer might encounter trouble with more branches that swap sides, not sure.

There is a larger issue however, split_at_mut enables safe accesses to arbitrarily places in programmatic ways that avoids allocation, perhaps passing multiple subslices in recursive ways, or storing them in temporaries, while get_mut_many requires a one-off array construction, which is far more limiting

You prohibit slices in get_mut_many presumably because allocation kills any efficiency gain, but [usize]::sort_unstable only requires O(n log n) time and your n stays very small without VLAs.

In short, it's unclear whether this interfaces provides any speedup for slices, but it's clear this interfaces avoids teaching users about the much more powerful split_at_mut interface.


As written, these traits imply an entry-like interface, but if I understand they can only deliver a mutable value interface for slices, BTreeMap, HashMap, etc., meaning no conditional insertions or deletions. I'd suggest reserving the term entry for interfaces with entry-like conditional insertion and deletion, ala #1769 and https://internals.rust-lang.org/t/pre-rfc-abandonning-morals-in-the-name-of-performance-the-raw-entry-api/7043

I think BTreeMapMutView could provide a restricted entry-like interface, making it strictly more powerful than the interface provided here. I think some non-growable tree-like structures could provide a restricted entry-like behavior with the interface provided here, maybe even simple heaps.

I'm no stickler for traits having a rigorously consistent meaning, but traits do benefit from some consistency in their meaning, especially traits in std. I'm saying it's really unclear how much consistency of meaning this trait interface provides.

@Kimundi
Copy link
Member

Kimundi commented Apr 4, 2021

It seems like I'm currently proposing something similar: rust-lang/rust#83608

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 4, 2021

@Kimundi Yes it does seem that you have. I have been working on this for over a month now (on IRLO and Zulip) so any feedback would be appreciated.

I think that a more general solution than just for slices would be best.

Finally, you don't need non-mut versions of these since you can already get multiple &T out of a single &[T]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a1ad5ccc54a1634492a7b389c9c69799

@jan-hudec
Copy link

Shouldn't the GetMutMany trait be also unsafe? The GetMutMany.get_mut_many method is responsible for checking the places don't overlap, otherwise safety invariants are violated, so implementing the trait (because someone might still implement it in other way than using the GetMutManyRaw) should be unsafe, no?

@Nokel81
Copy link
Contributor Author

Nokel81 commented Apr 9, 2021

@jan-hudec That I am unsure about, my reaction after reading the book on unsafe traits would imply that no it doesn't need to be. For the same reason that a function that uses unsafe doesn't necessarily need to be declared as unsafe.

@adlerd
Copy link

adlerd commented Apr 9, 2021

@jan-hudec The rule for unsafe trait is "could an implementation of this trait which used no unsafe blocks cause UB? If so, then the trait must be unsafe". The way the GetMutMany trait is written currently, this doesn't hold, because a safe impl of get_mut_many cannot cause UB: borrowck prevents it already. In other words, for a safe impl, the impl itself is not responsible for checking the pieces don't overlap, borrowck is (or other existing unsafe like split_at_mut - which is outside the scope of this trait).


@Nokel81 I don't think the functions actually need receivers per se; I put them in just as a little something extra I thought I would improve the trait. I agree that in most cases they won't be needed, but it seems counterproductive to leave them out when doing so provides no benefit other than being potentially a tiny bit misleading, and it's possible they may be useful. In a situation where they are actually useful, the only alternative would be to embed a *mut Container in each handle. Probably the receiver in would_overlap is much more useful than the other one, since the only thing entry_to_ref should conceivably need is an element pointer. After all, entry_to_ref is just there to attach the borrowck imprimatur to that pointer and nothing more, so if there's a question of being misleading, the *mut self could be removed.

That said, I think the doc comment on entry_to_ref could use a bit of work. In order for this API to be used soundly, it should be phrased in terms of, "Here are the exact conditions under which the get_mut_many implementation will call this function*; the trait implementation must not cause UB under those conditions. A likely source of UB is &mut overlaps from different calls to this function. Another possible source of UB is if RawEntry contains & to the container." *And these conditions are: Whenever entry_to_ref<'output> is called with a RawEntry returned from a call to get_raw_entry<'input>, then there exists a &'origin mut Self from which both 'input and 'output are derived; and moreover, no other references deriving from 'origin (until the expiration of 'output) will be created or used except via calls to get_raw_entry for nonoverlapping RawEntry.

@alice-i-cecile
Copy link

Bevy (an ECS game engine) would very much like this sort of functionality. We implemented something similar by hand for one of the most common use cases: grabbing combinations of items in a list in bevyengine/bevy#1763.

I'm not sure what the solution would look like, but hopefully seeing more concrete use cases might be helpful.

@Amanieu
Copy link
Member

Amanieu commented Jun 2, 2021

This was discussed in the libs meeting today. We feel that this shouldn't be implemented as a trait: we already have inherent get methods on our collection types, so it doesn't really make sense to special case get_mut_many.

Apart than that, we are interested in a simpler version of this proposal which doesn't use a trait.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Jun 2, 2021

@Amanieu would the libs team be more in favour of a proposal for the types in the std-lib that should implement such methods?

@Amanieu
Copy link
Member

Amanieu commented Jun 2, 2021

Yes. As a general rule, I feel that all types that have a get_mut(index/key) method should probably have an equivalent get_mut_many method. So slices, VecDeque, HashMap and BTreeMap.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Jun 3, 2021

Okay thanks. Final question, should that be a direct PR then?

@scottmcm
Copy link
Member

scottmcm commented Jun 3, 2021

I think it should still be an RFC, because even if it's not a trait, it's very clearly setting a precedent.

(As in, it'll be an API that people will expect lots of types in std, and even types in crates, to start providing, so it's worth the extra oversight of an RFC. It's not just a one-off thing that'd be on one std type but nowhere else.)

@Nokel81
Copy link
Contributor Author

Nokel81 commented Jun 3, 2021

Sounds good. Shall this one be closed and I open a new one then?

@scottmcm
Copy link
Member

scottmcm commented Jun 3, 2021

@Nokel81 I think that's mostly up to you. If you think that the discussion in this thread is valuable for people reading an updated RFC, then updating this one is probably fine. If you think that the discussion here has too much about the trait details and would thus be more of a distraction from the new one, then feel free to close this and open a new one.

@SOF3
Copy link

SOF3 commented Jun 3, 2021

Without an unsafe trait SafeOrd : Ord {} or alike, it is not possible to implement this RFC for BTreeMap.

That said, it is not strictly necessary to compare equality of keys to check equality of resolved pointers, is it? Since the safe interface of this RFC performs an O(n^2) check either way, why not check these using the references returned by the unsafe implementation? In other words, the unsafe implementation only returns an array of *muts, and the safe trait is responsible for checking that these *muts do not overlap, and converting them into &mut afterwards. In this case, the implementation is robust even if we are dealing with a nonsense key type like this

struct Nonsense;
impl Ord for Nonsense {
    fn cmp(&self, other: &Self) -> Ordering {
        if rand::random::<bool>() { Ordering::Less } else { Ordering::Greater }
    }
}

In case the value is a ZST, the check could simply be skipped since there is no overlapping either way.

A bonus effect is that it improves performance by comparing pointers of usize size instead of arbitrary types (e.g. long strings), and the O(n^2) could probably be reduced to O(n log n) by numerically sorting the pointers (or maybe there are faster algorithms that I did not notice).

@Nokel81
Copy link
Contributor Author

Nokel81 commented Jun 7, 2021

Without an unsafe trait SafeOrd : Ord {} or alike, it is not possible to implement this RFC for BTreeMap.

If you look at my proposal, then you should be able to tell that such a trait is not necessary even for types that have keys which are user defined types.

@yaahc yaahc removed the I-nominated label Jun 8, 2021
@skmendez
Copy link

skmendez commented Jun 12, 2021

Would it make sense to implement this as a more exotic implementation of the SliceIndex trait, i.e. something like impl<T, const N: usize> SliceIndex<[T]> for [usize; N], with `type Output = [mut T; N]?

@scottmcm
Copy link
Member

@skmendez That can't work, because SliceIndex returns &mut Self::Output from index_mut, whereas this wants to be able to return [&mut _; N] -- there's no array anywhere in memory with the necessary things in it to be able to point at.

@skmendez
Copy link

You're right; I think this could work with GATs so that Output could be generic on the lifetime.

@joshtriplett
Copy link
Member

The would_overlap approach here seems to inherently require a quadratic implementation, checking every pair of indexes. I think most practical data structures will hopefully be able to do better than that. What would a trait design look like that avoids the inherent quadratic complexity?

@programmerjake
Copy link
Member

The would_overlap approach here seems to inherently require a quadratic implementation,

maybe use this instead, which allows using a temporary array (no allocator required if the array length is const since you can just use an array on the stack) to sort RawEntrys and guarantees that checking adjacent RawEntrys in sorted order is sufficient to check for overlap. this makes the worst case O(N log N):

unsafe trait GetMutManyRaw<Key> {
    type Value;
    type RawEntry;

    /// must be usable to sort `RawEntry`s such that for any list of `RawEntry`s
    /// there must be at least one pair of overlapping `RawEntry`s that are next
    /// to each other in the sorted list if any `RawEntry`s overlap.
    fn raw_entry_sort_cmp(&self, a: &Self::RawEntry, b: &Self::RawEntry) -> Ordering;
    fn would_overlap(&self, e1: &Self::RawEntry, e2: &Self::RawEntry) -> bool;
    unsafe fn entry_to_ref<'a>(*mut self, entry: Self::RawEntry) -> &'a mut Self::Value;
    fn get_raw_entry<'a>(&'a mut self, key: Key) -> Option<Self::RawEntry>;
}

@programmerjake
Copy link
Member

programmerjake commented Jun 19, 2024

    /// must be usable to sort `RawEntry`s such that for any list of `RawEntry`s
    /// there must be at least one pair of overlapping `RawEntry`s that are next
    /// to each other in the sorted list if any `RawEntry`s overlap.
    fn raw_entry_sort_cmp(&self, a: &Self::RawEntry, b: &Self::RawEntry) -> Ordering;

for BTreeMap, you can just compare the pointer to the node and index in the node, forHashMap and [T] you can just compare the index. for Range on slice indexes, you can compare lexicographicly, i think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.