
Extending some number of Vecs #721

Open

dragostis opened this issue Feb 4, 2020 · 9 comments


@dragostis

In order to gain a performance boost from auto-vectorization, I'm storing data in multiple Vecs: think struct of arrays instead of array of structs. In the array-of-structs case, it's very easy to simply extend the vector with whatever the parallel iterator produces. In the struct-of-arrays case, however, I don't have this luxury.
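For concreteness, a minimal illustration of the two layouts (the Particle type and field names are hypothetical, not from this issue):

struct Particle {
    x: f32,
    y: f32,
}

// Array of structs: a single Vec, so collect/par_extend work directly.
type ParticlesAos = Vec<Particle>;

// Struct of arrays: one Vec per field; extending all of them in a single
// parallel pass is the problem posed here.
struct ParticlesSoa {
    xs: Vec<f32>,
    ys: Vec<f32>,
}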

I came up with two solutions:

vec1.resize(new_len, Default::default());
...
vecN.resize(new_len, Default::default());

// initialize the Vecs with Default data; but this is visibly expensive in my benchmarks

vec1.par_iter_mut().zip(...).for_each(|(val, ...)| *val = ...)
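A minimal runnable sketch of this first solution, assuming two hypothetical f32 columns of equal length and n new elements (the fill values are arbitrary):

use rayon::prelude::*;

fn extend_columns(xs: &mut Vec<f32>, ys: &mut Vec<f32>, n: usize) {
    let start = xs.len();
    // Pay the cost of default-initializing elements that are about to be
    // overwritten; this is the overhead measured in the benchmarks above.
    xs.resize(start + n, f32::default());
    ys.resize(start + n, f32::default());

    xs[start..]
        .par_iter_mut()
        .zip(&mut ys[start..])
        .enumerate()
        .for_each(|(i, (x, y))| {
            *x = i as f32;
            *y = i as f32 * 2.0;
        });
}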

Or use uninitialized memory:

let mut vec1: Vec<MaybeUninit<...>> = Vec::with_capacity(n);
// MaybeUninit contents are allowed to be uninitialized
unsafe { vec1.set_len(n) };
...
let mut vecN: Vec<MaybeUninit<...>> = Vec::with_capacity(n);
unsafe { vecN.set_len(n) };

vec1.par_iter_mut().zip(...).for_each(|(val, ...)| unsafe { ... })
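For illustration, a minimal sketch of this approach using today's spare_capacity_mut (stabilized well after this issue was filed), which keeps the MaybeUninit handling in one place; the two f32 columns and fill values are hypothetical:

use rayon::prelude::*;

fn extend_columns_uninit(xs: &mut Vec<f32>, ys: &mut Vec<f32>, n: usize) {
    xs.reserve(n);
    ys.reserve(n);

    // The spare capacity is exposed as &mut [MaybeUninit<f32>].
    xs.spare_capacity_mut()[..n]
        .par_iter_mut()
        .zip(&mut ys.spare_capacity_mut()[..n])
        .enumerate()
        .for_each(|(i, (x, y))| {
            // Write-only: never read the uninitialized memory.
            x.write(i as f32);
            y.write(i as f32 * 2.0);
        });

    // SAFETY: the first n spare elements of both vectors were just written.
    unsafe {
        xs.set_len(xs.len() + n);
        ys.set_len(ys.len() + n);
    }
}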

The problem I have with the second solution is that it's very painful to use and litters all my code with unsafe. However, I don't see any way to remove it, since initializing the vectors with default values is visibly costly. Not zipping the iterators together doesn't work either, because it would have to iterate through the data multiple times.

Any ideas on how to improve this?

@dineshadepu

Recent versions of Rayon provide parallel iteration over tuples:

(a, b, c).into_par_iter()
    .for_each(|(a_i, b_i, c_i)| {
        // write your code here
    });

I think this makes the code look better, even though it doesn't answer the current question.
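As a sketch of how this applies to the struct-of-arrays case above (the three columns and values are hypothetical): &mut Vec implements IntoParallelIterator, so a tuple of them can be zipped in a single parallel pass.

use rayon::prelude::*;

fn main() {
    let mut xs = vec![0.0f32; 1000];
    let mut ys = vec![0.0f32; 1000];
    let mut zs = vec![0.0f32; 1000];

    // MultiZip: iterate all three columns in lockstep, in parallel.
    (&mut xs, &mut ys, &mut zs)
        .into_par_iter()
        .for_each(|(x, y, z)| {
            *x = 1.0;
            *y = 2.0;
            *z = *x + *y;
        });
}

Note that this only rewrites existing elements; it doesn't extend the Vecs, which is the question at hand.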

@cuviper
Member

cuviper commented Feb 5, 2020

Rayon also has two implementations of ParallelExtend that work like unzip and partition_map:

impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (FromA, FromB)
where
    A: Send,
    B: Send,
    FromA: Send + ParallelExtend<A>,
    FromB: Send + ParallelExtend<B>,
{...}

impl<L, R, A, B> ParallelExtend<Either<L, R>> for (A, B)
where
    L: Send,
    R: Send,
    A: Send + ParallelExtend<L>,
    B: Send + ParallelExtend<R>,
{...}

These were added in #604, and as noted there, you can also nest this in pairs.

let (vec1, (vec2, (vec3, vec4))) = par_iter.map(|(a, b, c, d)| (a, (b, (c, d)))).unzip();

That should also work for calling par_extend directly. Perhaps we could add flattened expansions of this, similar to the MultiZip that @dineshadepu mentioned. I'd probably only want that in the unzip-like fashion, as trying to mix Either in there for partition_map would get really complex.
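For illustration, a runnable version of that nested unzip, assuming four u32 columns derived from one input range:

use rayon::prelude::*;

fn main() {
    let (vec1, (vec2, (vec3, vec4))): (Vec<u32>, (Vec<u32>, (Vec<u32>, Vec<u32>))) =
        (0u32..1000)
            .into_par_iter()
            .map(|n| (n, (n + 1, (n + 2, n + 3))))
            .unzip();

    assert_eq!(vec1.len(), 1000);
    assert_eq!((vec2[0], vec3[0], vec4[0]), (1, 2, 3));
}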

I kind of wish we had impl<C: ParallelExtend> ParallelExtend for &mut C too, as it would make this automatically able to extend a tuple of mutable references, instead of just by value. Unfortunately, adding such a blanket impl now would be a breaking change. Even testing just for (&mut A, &mut B) complains:

error[E0119]: conflicting implementations of trait `iter::ParallelExtend<(_, _)>` for type `(&mut _, &mut _)`:
   --> src/iter/unzip.rs:428:1
    |
413 | / impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (FromA, FromB)
414 | | where
415 | |     A: Send,
416 | |     B: Send,
...   |
425 | |     }
426 | | }
    | |_- first implementation here
427 |
428 | / impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (&'_ mut FromA, &'_ mut FromB)
429 | | where
430 | |     A: Send,
431 | |     B: Send,
...   |
440 | |     }
441 | | }
    | |_^ conflicting implementation for `(&mut _, &mut _)`
    |
    = note: downstream crates may implement trait `iter::ParallelExtend<_>` for type `&mut _`
    = note: downstream crates may implement trait `iter::ParallelExtend<_>` for type `&mut _`

@dragostis
Author

dragostis commented Feb 7, 2020

That should also work for calling par_extend directly.

This doesn't seem to work because of the lack of impl<C: ParallelExtend> ParallelExtend for &mut C.

error[E0599]: no method named `par_extend` found for type `(&mut std::vec::Vec<_>, &mut std::vec::Vec<_>)` in the current scope
  --> src/main.rs:13:28
   |
13 |     (&mut vec0, &mut vec1).par_extend((1u32..10).into_par_iter().map(|n| (n, n)));
   |                            ^^^^^^^^^^ method not found in `(&mut std::vec::Vec<_>, &mut std::vec::Vec<_>)`

error: aborting due to previous error

I'm using this as a temporary fix:

use rayon::prelude::*;

struct ExtendVec<'a, T> {
    vec: &'a mut Vec<T>,
}

impl<'a, T> ExtendVec<'a, T> {
    pub fn new(vec: &'a mut Vec<T>) -> Self {
        Self { vec }
    }
}

impl<T: Send> ParallelExtend<T> for ExtendVec<'_, T> {
    fn par_extend<I>(&mut self, par_iter: I)
    where
        I: IntoParallelIterator<Item = T>,
    {
        self.vec.par_extend(par_iter);
    }
}
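With that wrapper, the tuple ParallelExtend impl applies, so the call that failed above compiles once each &mut Vec is wrapped. A small usage sketch, using the ExtendVec defined above:

use rayon::prelude::*;

fn main() {
    let mut vec0: Vec<u32> = Vec::new();
    let mut vec1: Vec<u32> = Vec::new();

    (ExtendVec::new(&mut vec0), ExtendVec::new(&mut vec1))
        .par_extend((1u32..10).into_par_iter().map(|n| (n, n)));

    assert_eq!(vec0, vec1);
}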

This seems to cause rustc to compile very slowly.

@cuviper
Member

cuviper commented Feb 7, 2020

I'm using this as a temporary fix:

Ah, a newtype wrapper does the trick -- we could even make something generic like that.

This seems to cause rustc to compile very slowly.

I'm not surprised. While the generic unzip.rs is not a lot of code, it's a fairly complex composition, and it will compound quickly when those are nested together. It might fare a little better if you nest that like a binary tree, (((vec0, vec1), (vec2, vec3)), ((vec4, vec5), (vec6, vec7))), but it will still be a challenge.
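For illustration, the binary-tree nesting for eight u32 columns would look something like this (the values are hypothetical):

use rayon::prelude::*;

fn main() {
    let (((v0, v1), (v2, v3)), ((v4, v5), (v6, v7))): (
        ((Vec<u32>, Vec<u32>), (Vec<u32>, Vec<u32>)),
        ((Vec<u32>, Vec<u32>), (Vec<u32>, Vec<u32>)),
    ) = (0u32..100)
        .into_par_iter()
        .map(|n| (((n, n + 1), (n + 2, n + 3)), ((n + 4, n + 5), (n + 6, n + 7))))
        .unzip();

    assert_eq!(v0.len(), 100);
    assert_eq!(v7[0], 7);
}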

In a presentation I gave last year, I demonstrated a giant symbol -- 132967 characters! -- from just a 3-way collect. See also #671, which was alleviated somewhat by #673 and rust-lang/rust#62429, but the overall structure still remains the same.

If we had something more directly designed for just unzipping vectors, it wouldn't need to be nearly so complicated.

@dragostis
Author

I guess the slowness probably comes from the giant symbols and the monomorphization. The tree arrangement doesn't help much, unfortunately.

I was wondering whether there would be a way to implement par_extend separately for each tuple type I'm using, or perhaps to have an 8-way unzip implementation.

Even if the solution is dirty, it would still be much better than my current unsafe-riddled approach. Do you think any of these ideas could reduce the compile-time cost?

@dragostis
Author

dragostis commented Feb 12, 2020

I took a look at the unzip implementation, and it seems like even a manual 8-way tuple would probably produce the same result.

@cuviper, it seems like forcing the Consumer to be the same across all the generic types would be one way to solve this, but that would mean an UnzipN type could not implement the ParallelIterator trait. Do you think there's any way around this? Maybe the consumers could be made into trait objects and stored in a Vec, but I'm not sure they are convertible to trait objects, or whether that would hurt performance.

@cuviper
Member

cuviper commented Feb 12, 2020

I'm at a loss for designing a better generic unzip. The real difficulty is that we can only get at each ParallelExtend consumer through a callback into drive_unindexed.

Maybe we need a new trait with a method that directly creates a consumer -- although that also means each collection's consumer type would become part of the public API.
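Purely as a hypothetical sketch (this is not rayon API), such a trait might look something like the following, with the consumer exposed directly instead of through a drive_unindexed callback:

use rayon::iter::plumbing::UnindexedConsumer;

// Hypothetical: a collection that can hand out its consumer directly, so an
// N-way unzip could drive all consumers from a single parallel pass.
trait ParallelExtendWithConsumer<T: Send> {
    // This associated type is what would leak into the public API.
    type Consumer: UnindexedConsumer<T>;

    fn consumer(&mut self) -> Self::Consumer;
}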

@dragostis
Author

So, my current approach is basically extending CollectConsumer into a CollectTupleConsumer. I've got a working MVP, but I don't know if this is the best approach. It does compile fast enough, however.

@dragostis
Author

@cuviper, would it be desirable to have the ParallelExtend trait defined for larger tuples, e.g. up to 16? While my workaround works well, I'd rather not have to use unsafe for this.
