
Extending some number of Vecs #721

Open

dragostis opened this issue Feb 4, 2020 · 9 comments


@dragostis

In order to gain a performance boost from auto-vectorization, I'm storing data in multiple Vecs: think struct of arrays instead of array of structs. In the array-of-structs case, it's very easy to simply extend the vector with whatever the parallel iterator produces. In the struct-of-arrays case, however, I don't have this luxury.
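For concreteness, a minimal illustration of the two layouts (the Particle type and field names are hypothetical, not from this issue):

struct Particle {
    x: f32,
    y: f32,
}

// Array of structs: a single Vec, so collect/par_extend work directly.
type ParticlesAos = Vec<Particle>;

// Struct of arrays: one Vec per field; extending all of them in a single
// parallel pass is the problem posed here.
struct ParticlesSoa {
    xs: Vec<f32>,
    ys: Vec<f32>,
}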

I came up with two solutions:

vec1.resize(new_len, Default::default());
...
vecN.resize(new_len, Default::default());

// initialize the Vecs with Default data; but this is visibly expensive in my benchmarks

vec1.par_iter_mut().zip(...).for_each(|(val, ...)| *val = ...)
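A minimal runnable sketch of this first solution, assuming two hypothetical f32 columns of equal length and n new elements (the fill values are arbitrary):

use rayon::prelude::*;

fn extend_columns(xs: &mut Vec<f32>, ys: &mut Vec<f32>, n: usize) {
    let start = xs.len();
    // Pay the cost of default-initializing elements that are about to be
    // overwritten; this is the overhead measured in the benchmarks above.
    xs.resize(start + n, f32::default());
    ys.resize(start + n, f32::default());

    xs[start..]
        .par_iter_mut()
        .zip(&mut ys[start..])
        .enumerate()
        .for_each(|(i, (x, y))| {
            *x = i as f32;
            *y = i as f32 * 2.0;
        });
}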

Or use uninitialized memory:

let mut vec1: Vec<MaybeUninit<...>> = Vec::with_capacity(n);
// MaybeUninit contents are allowed to be uninitialized
unsafe { vec1.set_len(n) };
...
let mut vecN: Vec<MaybeUninit<...>> = Vec::with_capacity(n);
unsafe { vecN.set_len(n) };

vec1.par_iter_mut().zip(...).for_each(|(val, ...)| unsafe { ... })
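For illustration, a minimal sketch of this approach using today's spare_capacity_mut (stabilized well after this issue was filed), which keeps the MaybeUninit handling in one place; the two f32 columns and fill values are hypothetical:

use rayon::prelude::*;

fn extend_columns_uninit(xs: &mut Vec<f32>, ys: &mut Vec<f32>, n: usize) {
    xs.reserve(n);
    ys.reserve(n);

    // The spare capacity is exposed as &mut [MaybeUninit<f32>].
    xs.spare_capacity_mut()[..n]
        .par_iter_mut()
        .zip(&mut ys.spare_capacity_mut()[..n])
        .enumerate()
        .for_each(|(i, (x, y))| {
            // Write-only: never read the uninitialized memory.
            x.write(i as f32);
            y.write(i as f32 * 2.0);
        });

    // SAFETY: the first n spare elements of both vectors were just written.
    unsafe {
        xs.set_len(xs.len() + n);
        ys.set_len(ys.len() + n);
    }
}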

The problem I have with the second solution is that it's very painful to use and litters all my code with unsafe. However, I don't see any way to remove it, since initializing the vectors with default values is visibly costly. Not zipping the iterators together doesn't work either, because it would have to iterate through the data multiple times.

Any ideas on how to improve this?

@dineshadepu

Recent versions of Rayon provide parallel iteration over tuples:

(a, b, c).into_par_iter()
    .for_each(|(a_i, b_i, c_i)| {
        // write your code here
    });

I think this makes the code look better, even though it doesn't answer the current question.
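As a sketch of how this applies to the struct-of-arrays case above (the three columns and values are hypothetical): &mut Vec implements IntoParallelIterator, so a tuple of them can be zipped in a single parallel pass.

use rayon::prelude::*;

fn main() {
    let mut xs = vec![0.0f32; 1000];
    let mut ys = vec![0.0f32; 1000];
    let mut zs = vec![0.0f32; 1000];

    // MultiZip: iterate all three columns in lockstep, in parallel.
    (&mut xs, &mut ys, &mut zs)
        .into_par_iter()
        .for_each(|(x, y, z)| {
            *x = 1.0;
            *y = 2.0;
            *z = *x + *y;
        });
}

Note that this only rewrites existing elements; it doesn't extend the Vecs, which is the question at hand.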

@cuviper
Member

cuviper commented Feb 5, 2020

Rayon also has two implementations of ParallelExtend that work like unzip and partition_map:

impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (FromA, FromB)
where
    A: Send,
    B: Send,
    FromA: Send + ParallelExtend<A>,
    FromB: Send + ParallelExtend<B>,
{...}

impl<L, R, A, B> ParallelExtend<Either<L, R>> for (A, B)
where
    L: Send,
    R: Send,
    A: Send + ParallelExtend<L>,
    B: Send + ParallelExtend<R>,
{...}

These were added in #604, and as noted there, you can also nest this in pairs.

let (vec1, (vec2, (vec3, vec4))) = par_iter.map(|(a, b, c, d)| (a, (b, (c, d)))).unzip();

That should also work for calling par_extend directly. Perhaps we could add flattened expansions of this, similar to the MultiZip that @dineshadepu mentioned. I'd probably only want that in the unzip-like fashion, as trying to mix Either in there for partition_map would get really complex.
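For illustration, a runnable version of that nested unzip, assuming four u32 columns derived from one input range:

use rayon::prelude::*;

fn main() {
    let (vec1, (vec2, (vec3, vec4))): (Vec<u32>, (Vec<u32>, (Vec<u32>, Vec<u32>))) =
        (0u32..1000)
            .into_par_iter()
            .map(|n| (n, (n + 1, (n + 2, n + 3))))
            .unzip();

    assert_eq!(vec1.len(), 1000);
    assert_eq!((vec2[0], vec3[0], vec4[0]), (1, 2, 3));
}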

I kind of wish we had impl<C: ParallelExtend> ParallelExtend for &mut C too, as it would make this automatically able to extend a tuple of mutable references, instead of just by value. Unfortunately, adding such a blanket impl now would be a breaking change. Even testing just for (&mut A, &mut B) complains:

error[E0119]: conflicting implementations of trait `iter::ParallelExtend<(_, _)>` for type `(&mut _, &mut _)`:
   --> src/iter/unzip.rs:428:1
    |
413 | / impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (FromA, FromB)
414 | | where
415 | |     A: Send,
416 | |     B: Send,
...   |
425 | |     }
426 | | }
    | |_- first implementation here
427 |
428 | / impl<A, B, FromA, FromB> ParallelExtend<(A, B)> for (&'_ mut FromA, &'_ mut FromB)
429 | | where
430 | |     A: Send,
431 | |     B: Send,
...   |
440 | |     }
441 | | }
    | |_^ conflicting implementation for `(&mut _, &mut _)`
    |
    = note: downstream crates may implement trait `iter::ParallelExtend<_>` for type `&mut _`
    = note: downstream crates may implement trait `iter::ParallelExtend<_>` for type `&mut _`

@dragostis
Author

dragostis commented Feb 7, 2020

That should also work for calling par_extend directly.

This doesn't seem to work because of the lack of impl<C: ParallelExtend> ParallelExtend for &mut C.

error[E0599]: no method named `par_extend` found for type `(&mut std::vec::Vec<_>, &mut std::vec::Vec<_>)` in the current scope
  --> src/main.rs:13:28
   |
13 |     (&mut vec0, &mut vec1).par_extend((1u32..10).into_par_iter().map(|n| (n, n)));
   |                            ^^^^^^^^^^ method not found in `(&mut std::vec::Vec<_>, &mut std::vec::Vec<_>)`

error: aborting due to previous error

I'm using this as a temporary fix:

use rayon::prelude::*;

struct ExtendVec<'a, T> {
    vec: &'a mut Vec<T>,
}

impl<'a, T> ExtendVec<'a, T> {
    pub fn new(vec: &'a mut Vec<T>) -> Self {
        Self { vec }
    }
}

impl<T: Send> ParallelExtend<T> for ExtendVec<'_, T> {
    fn par_extend<I>(&mut self, par_iter: I)
    where
        I: IntoParallelIterator<Item = T>,
    {
        self.vec.par_extend(par_iter);
    }
}
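With that wrapper, the tuple ParallelExtend impl applies, so the call that failed above compiles once each &mut Vec is wrapped. A small usage sketch, using the ExtendVec defined above:

use rayon::prelude::*;

fn main() {
    let mut vec0: Vec<u32> = Vec::new();
    let mut vec1: Vec<u32> = Vec::new();

    (ExtendVec::new(&mut vec0), ExtendVec::new(&mut vec1))
        .par_extend((1u32..10).into_par_iter().map(|n| (n, n)));

    assert_eq!(vec0, vec1);
}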

This seems to cause rustc to compile very slowly.

@cuviper
Member

cuviper commented Feb 7, 2020

I'm using this as a temporary fix:

Ah, a newtype wrapper does the trick -- we could even make something generic like that.

This seems to cause rustc to compile very slowly.

I'm not surprised. While the generic unzip.rs is not a lot of code, it's a fairly complex composition, and it will compound quickly when those are nested together. It might fare a little better if you nest that like a binary tree, (((vec0, vec1), (vec2, vec3)), ((vec4, vec5), (vec6, vec7))), but it will still be a challenge.
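For illustration, the binary-tree nesting for eight u32 columns would look something like this (the values are hypothetical):

use rayon::prelude::*;

fn main() {
    let (((v0, v1), (v2, v3)), ((v4, v5), (v6, v7))): (
        ((Vec<u32>, Vec<u32>), (Vec<u32>, Vec<u32>)),
        ((Vec<u32>, Vec<u32>), (Vec<u32>, Vec<u32>)),
    ) = (0u32..100)
        .into_par_iter()
        .map(|n| (((n, n + 1), (n + 2, n + 3)), ((n + 4, n + 5), (n + 6, n + 7))))
        .unzip();

    assert_eq!(v0.len(), 100);
    assert_eq!(v7[0], 7);
}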

In a presentation I gave last year, I demonstrated a giant symbol -- 132967 characters! -- from just a 3-way collect. See also #671, which was alleviated somewhat by #673 and rust-lang/rust#62429, but the overall structure still remains the same.

If we had something more directly designed for just unzipping vectors, it wouldn't need to be nearly so complicated.

@dragostis
Author

I guess the slowness probably comes from the giant symbols and the monomorphization. The tree arrangement doesn't help much, unfortunately.

I was wondering whether there would be a way to implement par_extend separately for each tuple type I'm using, or perhaps to have an 8-way unzip implementation.

Even if the solution is dirty, it would still be much better than my current unsafe-riddled approach. Do you think any of these ideas could reduce the compile-time cost?

@dragostis
Author

dragostis commented Feb 12, 2020

I took a look at the unzip implementation, and it seems like even a manual 8-way tuple would probably produce the same result.

@cuviper, it seems like forcing the Consumer to be the same across all the generic types would be one way to solve this, but that would mean an UnzipN type could not implement the ParallelIterator trait. Do you think there's any way around this? Maybe the consumers could be made into trait objects and stored in a Vec, but I'm not sure they are convertible to trait objects, or whether that would hurt performance.

@cuviper
Member

cuviper commented Feb 12, 2020

I'm at a loss for designing a better generic unzip. The real difficulty is that we can only get at each ParallelExtend consumer through a callback into drive_unindexed.

Maybe we need a new trait with a method that directly creates a consumer -- although that also means each collection's consumer type would become part of the public API.
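Purely as a hypothetical sketch (this is not rayon API), such a trait might look something like the following, with the consumer exposed directly instead of through a drive_unindexed callback:

use rayon::iter::plumbing::UnindexedConsumer;

// Hypothetical: a collection that can hand out its consumer directly, so an
// N-way unzip could drive all consumers from a single parallel pass.
trait ParallelExtendWithConsumer<T: Send> {
    // This associated type is what would leak into the public API.
    type Consumer: UnindexedConsumer<T>;

    fn consumer(&mut self) -> Self::Consumer;
}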

@dragostis
Author

So, my current approach is basically extending CollectConsumer into a CollectTupleConsumer. I've got a working MVP, but I don't know if this is the best approach. It does compile fast enough, however.

@dragostis
Author

@cuviper, would it be desirable to have the ParallelExtend trait defined for larger tuples, e.g. up to 16? While my workaround works well, I'd rather not have to use unsafe for this.
