Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alloc_slice in TypedArena #37220

Merged
merged 2 commits into from
Oct 19, 2016
Merged

Conversation

Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum commented Oct 16, 2016

Added TypedArena::alloc_slice, and moved from using TypedArena<Vec<T>> to TypedArena<T>.

TypedArena::alloc_slice is implemented by copying the slices elements into the typed arena, requiring that T: Copy when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being.

This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents.

Performance:

futures-rs-test  5.048s vs  5.061s --> 0.997x faster (variance: 1.028x, 1.020x)
helloworld       0.284s vs  0.295s --> 0.963x faster (variance: 1.207x, 1.189x)
html5ever-2016-  8.396s vs  8.208s --> 1.023x faster (variance: 1.019x, 1.036x)
hyper.0.5.0      5.768s vs  5.797s --> 0.995x faster (variance: 1.027x, 1.028x)
inflate-0.1.0    5.213s vs  5.069s --> 1.028x faster (variance: 1.008x, 1.022x)
issue-32062-equ  0.428s vs  0.467s --> 0.916x faster (variance: 1.188x, 1.015x)
issue-32278-big  1.949s vs  2.010s --> 0.970x faster (variance: 1.112x, 1.049x)
jld-day15-parse  1.795s vs  1.877s --> 0.956x faster (variance: 1.037x, 1.015x)
piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x)
rust-encoding-0  2.489s vs  2.465s --> 1.010x faster (variance: 1.047x, 1.086x)
syntex-0.42.2   34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x)
syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x)

r? @eddyb

self.mk_ty(TyTuple(self.mk_type_list(ts)))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation looks messed up here

}
}

fn keep_local<'tcx, T: ty::TypeFoldable<'tcx>>(x: &T) -> bool {
x.has_type_flags(ty::TypeFlags::KEEP_IN_LOCAL_TCX)
}

fn keep_local_slice<'tcx, T: ty::TypeFoldable<'tcx>>(xs: &[T]) -> bool {
Copy link
Member

@frewsxcv frewsxcv Oct 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a slice? Could it be:

fn keep_local_iter<'tcx, T, I>(xs: I) -> bool
    where T: ty::TypeFoldable<'tcx>,
          I: IntoIterator<T>
{
    xs.into_iter().any(keep_local)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no point, it's a tiny helper. Besides, what you wrote doesn't work because the iterator is of references and for keep_local the type behind the reference is foldable.

@bors
Copy link
Contributor

bors commented Oct 17, 2016

☔ The latest upstream changes (presumably #37129) made this pull request unmergeable. Please resolve the merge conflicts.

self.grow(slice.len());
}

//while (self.end.get() as usize - self.ptr.get() as usize) < slice.len() * mem::size_of::<T>() { self.grow_old() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented out debug code.

//let arena_slice = slice::from_raw_parts_mut(v.as_mut_ptr(), v.len());
//mem::forget(v);
//self.ptr.set(start_ptr.offset(arena_slice.len() as isize));
//arena_slice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented out debug code.

unsafe {
let start_ptr = self.ptr.get();
let arena_slice = slice::from_raw_parts_mut(start_ptr, slice.len());
assert!(start_ptr as *const _ != slice.as_ptr());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug assert.

assert!(start_ptr as *const _ != slice.as_ptr());
self.ptr.set(start_ptr.offset(arena_slice.len() as isize));
arena_slice.copy_from_slice(slice);
assert!(self.ptr.get() <= self.end.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug assert.

macro_rules! slice_interners {
($($field:ident: $method:ident($ty:ident)),+) => (
$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
|xs: &[$ty]| -> &Slice<$ty> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| is misaligned (moving it two spaces to the right should be fine).

($($field:ident: $method:ident($ty:ident)),+) => (
$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
|xs: &[$ty]| -> &Slice<$ty> {
debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug logging.

$(intern_method!('tcx, $field: $method(&[$ty<'tcx>], alloc_slice, Deref::deref,
|xs: &[$ty]| -> &Slice<$ty> {
debug!("KEK ({:?}, {}) => {:?}", xs.as_ptr(), xs.len(), xs);
unsafe { mem::transmute(xs) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unindent once.

Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, waiting on Travis to succeed. Should write a description and include perf numbers.

where T: Copy {
assert!(mem::size_of::<T>() != 0);
if slice.len() == 0 {
return unsafe { slice::from_raw_parts_mut(heap::EMPTY as *mut T, 0) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this isn't just &mut [] ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&mut [] doesn't have a guaranteed stable address across instantiations and we depend on pointer comparisons working. That said, this would only ever be invoked once so maybe it's fine to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, should we replace this with &mut []?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds far too fragile to do that.

#[inline]
pub fn alloc_slice(&self, slice: &[T]) -> &mut [T]
where T: Copy {
assert!(mem::size_of::<T>() != 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be mentioned in the doc comment I guess, since we don't have a type system way to enforce it.

@eddyb
Copy link
Member

eddyb commented Oct 17, 2016

r=me with @bluss' comments addressed

@Mark-Simulacrum
Copy link
Member Author

Added documentation about panicking on ZSTs.

Up to @eddyb if we want to merge now and I can work on further optimizations in a future PR.

@eddyb
Copy link
Member

eddyb commented Oct 17, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Oct 17, 2016

📌 Commit f9c0c30 has been approved by eddyb

@Mark-Simulacrum Mark-Simulacrum force-pushed the arena-alloc-slice branch 3 times, most recently from b88bbfa to 7a38599 Compare October 18, 2016 12:51
@eddyb
Copy link
Member

eddyb commented Oct 18, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Oct 18, 2016

📌 Commit 7a38599 has been approved by eddyb

eddyb added a commit to eddyb/rust that referenced this pull request Oct 19, 2016
…r=eddyb

alloc_slice in TypedArena

Added `TypedArena::alloc_slice`, and moved from using `TypedArena<Vec<T>>` to `TypedArena<T>`.

`TypedArena::alloc_slice` is implemented by copying the slices elements into the typed arena, requiring that `T: Copy` when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being.

This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents.

Performance:
```
futures-rs-test  5.048s vs  5.061s --> 0.997x faster (variance: 1.028x, 1.020x)
helloworld       0.284s vs  0.295s --> 0.963x faster (variance: 1.207x, 1.189x)
html5ever-2016-  8.396s vs  8.208s --> 1.023x faster (variance: 1.019x, 1.036x)
hyper.0.5.0      5.768s vs  5.797s --> 0.995x faster (variance: 1.027x, 1.028x)
inflate-0.1.0    5.213s vs  5.069s --> 1.028x faster (variance: 1.008x, 1.022x)
issue-32062-equ  0.428s vs  0.467s --> 0.916x faster (variance: 1.188x, 1.015x)
issue-32278-big  1.949s vs  2.010s --> 0.970x faster (variance: 1.112x, 1.049x)
jld-day15-parse  1.795s vs  1.877s --> 0.956x faster (variance: 1.037x, 1.015x)
piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x)
rust-encoding-0  2.489s vs  2.465s --> 1.010x faster (variance: 1.047x, 1.086x)
syntex-0.42.2   34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x)
syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x)
```

r? @eddyb
bors added a commit that referenced this pull request Oct 19, 2016
@bors
Copy link
Contributor

bors commented Oct 19, 2016

🔒 Merge conflict

@bors
Copy link
Contributor

bors commented Oct 19, 2016

☔ The latest upstream changes (presumably #37269) made this pull request unmergeable. Please resolve the merge conflicts.

@Mark-Simulacrum
Copy link
Member Author

@eddyb Rebased.

@eddyb
Copy link
Member

eddyb commented Oct 19, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Oct 19, 2016

📌 Commit 83b1982 has been approved by eddyb

@bors
Copy link
Contributor

bors commented Oct 19, 2016

⌛ Testing commit 83b1982 with merge d337f34...

bors added a commit that referenced this pull request Oct 19, 2016
alloc_slice in TypedArena

Added `TypedArena::alloc_slice`, and moved from using `TypedArena<Vec<T>>` to `TypedArena<T>`.

`TypedArena::alloc_slice` is implemented by copying the slices elements into the typed arena, requiring that `T: Copy` when using it. We allocate a new chunk when there's insufficient space remaining in the previous chunk, and we cannot resize the old chunk in place. This is non-optimal, since we may waste allocated space when allocating (especially longer) slices, but is considered good enough for the time being.

This change also reduces heap fragmentation, since the arena now directly stores objects instead of storing the Vec's length and pointer to its contents.

Performance:
```
futures-rs-test  5.048s vs  5.061s --> 0.997x faster (variance: 1.028x, 1.020x)
helloworld       0.284s vs  0.295s --> 0.963x faster (variance: 1.207x, 1.189x)
html5ever-2016-  8.396s vs  8.208s --> 1.023x faster (variance: 1.019x, 1.036x)
hyper.0.5.0      5.768s vs  5.797s --> 0.995x faster (variance: 1.027x, 1.028x)
inflate-0.1.0    5.213s vs  5.069s --> 1.028x faster (variance: 1.008x, 1.022x)
issue-32062-equ  0.428s vs  0.467s --> 0.916x faster (variance: 1.188x, 1.015x)
issue-32278-big  1.949s vs  2.010s --> 0.970x faster (variance: 1.112x, 1.049x)
jld-day15-parse  1.795s vs  1.877s --> 0.956x faster (variance: 1.037x, 1.015x)
piston-image-0. 13.554s vs 13.522s --> 1.002x faster (variance: 1.019x, 1.020x)
rust-encoding-0  2.489s vs  2.465s --> 1.010x faster (variance: 1.047x, 1.086x)
syntex-0.42.2   34.646s vs 34.593s --> 1.002x faster (variance: 1.007x, 1.005x)
syntex-0.42.2-i 17.181s vs 17.163s --> 1.001x faster (variance: 1.004x, 1.004x)
```

r? @eddyb
@bors bors merged commit 83b1982 into rust-lang:master Oct 19, 2016
@Mark-Simulacrum Mark-Simulacrum deleted the arena-alloc-slice branch October 22, 2016 02:12
bors added a commit that referenced this pull request Oct 26, 2016
…ddyb

Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices

Updates `mk_tup`, `mk_type_list`, and `mk_substs` to allow interning directly from iterators. The previous PR, #37220, changed some of the calls to pass a borrowed slice from `Vec` instead of directly passing the iterator, and these changes further optimize that to avoid the allocation entirely.

This change yields 50% less malloc calls in [some cases](https://pastebin.mozilla.org/8921686). It also yields decent, though not amazing, performance improvements:
```
futures-rs-test  4.091s vs  4.021s --> 1.017x faster (variance: 1.004x, 1.004x)
helloworld       0.219s vs  0.220s --> 0.993x faster (variance: 1.010x, 1.018x)
html5ever-2016-  3.805s vs  3.736s --> 1.018x faster (variance: 1.003x, 1.009x)
hyper.0.5.0      4.609s vs  4.571s --> 1.008x faster (variance: 1.015x, 1.017x)
inflate-0.1.0    3.864s vs  3.883s --> 0.995x faster (variance: 1.232x, 1.005x)
issue-32062-equ  0.309s vs  0.299s --> 1.033x faster (variance: 1.014x, 1.003x)
issue-32278-big  1.614s vs  1.594s --> 1.013x faster (variance: 1.007x, 1.004x)
jld-day15-parse  1.390s vs  1.326s --> 1.049x faster (variance: 1.006x, 1.009x)
piston-image-0. 10.930s vs 10.675s --> 1.024x faster (variance: 1.006x, 1.010x)
reddit-stress    2.302s vs  2.261s --> 1.019x faster (variance: 1.010x, 1.026x)
regex.0.1.30     2.250s vs  2.240s --> 1.005x faster (variance: 1.087x, 1.011x)
rust-encoding-0  1.895s vs  1.887s --> 1.005x faster (variance: 1.005x, 1.018x)
syntex-0.42.2   29.045s vs 28.663s --> 1.013x faster (variance: 1.004x, 1.006x)
syntex-0.42.2-i 13.925s vs 13.868s --> 1.004x faster (variance: 1.022x, 1.007x)
```

We implement a small-size optimized vector, intended to be used primarily for collection of presumed to be short iterators. This vector cannot be "upsized/reallocated" into a heap-allocated vector, since that would require (slow) branching logic, but during the initial collection from an iterator heap-allocation is possible.

We make the new `AccumulateVec` and `ArrayVec` generic over implementors of the `Array` trait, of which there is currently one, `[T; 8]`. In the future, this is likely to expand to other values of N.

Huge thanks to @nnethercote for collecting the performance and other statistics mentioned above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants