
switch to a more efficient vector/string representation #8981

Closed
thestinger opened this issue Sep 4, 2013 · 11 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

@thestinger
Contributor

Vectors and strings are currently represented as *{ length, capacity, data[] }. This means we have to handle trickier uint overflow cases, which increases code size a fair bit. It also means even an empty vector always requires a heap allocation.

C++ represents a vector as { length, capacity, *data }, so a fresh empty vector is simply a null data pointer. Unlike the OptVec optimization, this does not require an extra branch in every method: calling realloc on a null pointer is well-defined, so the growth path can ignore the special case.
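C's guarantee that realloc(NULL, n) behaves exactly like malloc(n) is what makes the null data pointer free. A quick check in modern Rust (not the 2013 dialect below), binding the platform C library directly, which std already links on Unix targets:

```rust
use std::ptr;

// Bind the platform C allocator; std links libc on Unix targets.
extern "C" {
    fn realloc(ptr: *mut u8, size: usize) -> *mut u8;
    fn free(ptr: *mut u8);
}

fn main() {
    unsafe {
        // realloc on a null pointer is defined to act like malloc,
        // so the grow path needs no empty-vector special case.
        let p = realloc(ptr::null_mut(), 16);
        assert!(!p.is_null());
        free(p);
    }
}
```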

This has the drawback of making a move a 3-word copy rather than a 1-word copy. However, passing unique vectors around by-move rather than as slices is not common enough for this to be a relevant performance issue.

Benchmarks on x86_64 Linux (linux 3.11.5, glibc 2.18):

| Comparison | ~[T] | Proposal |
| --- | --- | --- |
| Empty | 29 ns/iter (+/- 1) | 2 ns/iter (+/- 0) |
| Push 10k, starting at 16 capacity | 45528 ns/iter (+/- 2584) | 7611 ns/iter (+/- 407) |
| Push 10k, starting at 10k capacity | 43783 ns/iter (+/- 1429) | 6876 ns/iter (+/- 320) |
| Size | size_of::<uint> | 3 * size_of::<uint> |
| Empty heap allocation size | minimum 2 * size_of::<uint> | n/a |
| 4 element heap allocation size | 2 * size_of::<uint> + 4 * size_of::<T> | 4 * size_of::<T> |

Note that since jemalloc is very clever, at some sizes preallocating has next to no advantage for the proposed vector either.
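(For what it's worth, the { length, capacity, pointer } layout is exactly what Rust's Vec eventually shipped with; a quick size check in modern Rust, not the 2013 dialect used elsewhere in this issue:)

```rust
use std::mem::size_of;

fn main() {
    // { length, capacity, pointer } is three words on the stack,
    // versus one word for the old *{ length, capacity, data[] } box.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
}
```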

Implementation:

#[allow(ctypes)];

#[cfg(test)]
extern mod extra;

use std::mem::size_of;
use std::cast::transmute;
use std::unstable::raw::Slice;
use std::unstable::intrinsics::move_val_init;
use std::ptr::read_ptr;

// redefine these to use `*mut` ...

extern {
    fn malloc(size: uint) -> *mut u8;
    fn realloc(ptr: *mut u8, size: uint) -> *mut u8;
    fn free(ptr: *mut u8);
    fn abort() -> !;
}

#[fixed_stack_segment]
unsafe fn malloc_raw(size: uint) -> *mut u8 {
    let ptr = malloc(size);
    if ptr.is_null() {
        abort()
    }
    ptr
}

#[fixed_stack_segment]
unsafe fn realloc_raw(ptr: *mut u8, size: uint) -> *mut u8 {
    let ptr = realloc(ptr, size);
    if ptr.is_null() {
        abort()
    }
    ptr
}

#[fixed_stack_segment]
unsafe fn free_raw(ptr: *mut u8) {
    free(ptr)
}

struct Vec<T> {
    priv len: uint,
    priv cap: uint,
    priv ptr: *mut T
}

impl<T> Vec<T> {
    #[inline]
    fn new() -> Vec<T> {
        Vec { len: 0, cap: 0, ptr: 0 as *mut T }
    }

    #[inline]
    fn with_capacity(capacity: uint) -> Vec<T> {
        let size = capacity.checked_mul(&size_of::<T>()).unwrap();
        unsafe {
            Vec { len: 0, cap: capacity, ptr: malloc_raw(size) as *mut T }
        }
    }

    #[inline]
    fn shrink_to_fit(&mut self) {
        unsafe {
            self.cap = self.len;
            if self.len == 0 {
                free_raw(self.ptr as *mut u8);
                self.ptr = 0 as *mut T;
            } else {
                self.ptr = realloc_raw(self.ptr as *mut u8, self.cap * size_of::<T>()) as *mut T;
            }
        }
    }

    #[inline]
    fn push(&mut self, value: T) {
        unsafe {
            if self.len == self.cap {
                if self.cap == 0 { self.cap += 2 }
                let old_size = self.cap * size_of::<T>();
                self.cap = self.cap * 2;
                let size = old_size.checked_mul(&2).unwrap();
                self.ptr = realloc_raw(self.ptr as *mut u8, size) as *mut T;
            }

            let end = self.ptr.offset(self.len as int);
            move_val_init(&mut *end, value);
            self.len += 1;
        }
    }

    #[inline]
    fn as_slice<'r>(&'r self) -> &'r [T] {
        let slice = Slice { data: self.ptr as *T, len: self.len };
        unsafe { transmute(slice) }
    }

    #[inline]
    fn as_mut_slice<'r>(&'r mut self) -> &'r mut [T] {
        let slice = Slice { data: self.ptr as *T, len: self.len };
        unsafe { transmute(slice) }
    }
}

#[unsafe_destructor]
impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        unsafe {
            for x in self.as_mut_slice().mut_iter() {
                read_ptr(x);
            }
            free_raw(self.ptr as *mut u8)
        }
    }
}

#[cfg(test)]
mod bench {
    use std;
    use super::Vec;
    use extra::test::BenchHarness;

    #[bench]
    fn empty(bh: &mut BenchHarness) {
        do bh.iter() {
            let _xs: Vec<int> = Vec::new();
        }
    }

    #[bench]
    fn empty_std(bh: &mut BenchHarness) {
        do bh.iter() {
            let _xs: ~[int] = ~[];
        }
    }

    static size: uint = 10000;

    #[bench]
    fn push(bh: &mut BenchHarness) {
        do bh.iter() {
            let mut xs = Vec::with_capacity(16);
            for i in range(0, size) {
                xs.push(i);
            }
        }
    }

    #[bench]
    fn push_std(bh: &mut BenchHarness) {
        do bh.iter() {
            let mut xs = std::vec::with_capacity(16);
            for i in range(0, size) {
                xs.push(i);
            }
        }
    }

    #[bench]
    fn push_preallocated(bh: &mut BenchHarness) {
        do bh.iter() {
            let mut xs = Vec::with_capacity(size);
            for i in range(0, size) {
                xs.push(i);
            }
        }
    }

    #[bench]
    fn push_preallocated_std(bh: &mut BenchHarness) {
        do bh.iter() {
            let mut xs = std::vec::with_capacity(size);
            for i in range(0, size) {
                xs.push(i);
            }
        }
    }
}

#[test]
fn push() {
    let mut xs = Vec::new();
    for x in std::iter::range_step(0, 100, 5) {
        xs.push(x)
    }
    let expected = std::iter::range_step(0, 100, 5).to_owned_vec();
    assert_eq!(xs.as_slice(), expected.as_slice());
}
@pnkfelix
Member

pnkfelix commented Sep 5, 2013

> It also means even an empty vector requires a heap allocation.

I do not see the fact that we are heap-allocating some storage when making empty vectors as a drawback, at least not a serious one (assuming that empty vectors always have non-zero capacity). When I see an expression constructing an empty vector, I expect some elements to be pushed into it in the (near?) future, so we are just eagerly doing some work that we would do eventually anyway.

(One exception to this that I could imagine would be a sparse collection backed by a vector of vectors, where most of its entries are empty.)

@thestinger
Contributor Author

The issue is that you don't have the option of creating an empty vector without a heap allocation at all, unless you add an extra branch to every operation (indexing, checking the size, etc.) like you end up doing with OptVec. It makes sense to pay that price for the benefits of a small-vector optimization, but not just for empty ones.
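For context, an OptVec-style representation has to branch on every operation because of the enum discriminant. A hypothetical sketch in modern Rust (OptVecSketch and its method are illustrative, not the actual OptVec code):

```rust
// Hypothetical small-vector-style wrapper: the enum forces a branch
// on every operation, which is the cost being discussed here.
enum OptVecSketch<T> {
    Empty,
    Full(Vec<T>),
}

impl<T> OptVecSketch<T> {
    fn len(&self) -> usize {
        match self {
            OptVecSketch::Empty => 0,         // extra branch, even just to read the length
            OptVecSketch::Full(v) => v.len(),
        }
    }
}

fn main() {
    let empty: OptVecSketch<i32> = OptVecSketch::Empty;
    let full = OptVecSketch::Full(vec![1, 2, 3]);
    assert_eq!(empty.len(), 0);
    assert_eq!(full.len(), 3);
}
```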

Before:

let mut xs = ~[]; // allocates space for 4 elements
xs.push(4);

After:

let mut xs = ~[]; // data pointer is null
xs.push(4); // allocates space for 4 elements

The allocation would just happen on the first push instead of at the declaration, so there wouldn't be a performance hit compared to the current situation. You always have the option of using with_capacity or reserve too.
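(This allocate-on-first-push behavior is exactly what Rust's Vec does today; a minimal check in modern Rust:)

```rust
fn main() {
    // Vec::new does not touch the heap: capacity starts at zero.
    let mut xs: Vec<i32> = Vec::new();
    assert_eq!(xs.capacity(), 0);

    // The allocation happens on the first push instead.
    xs.push(4);
    assert!(xs.capacity() >= 1);
}
```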

@pnkfelix
Member

pnkfelix commented Sep 5, 2013

@thestinger When you say "empty vector", you here mean "zero-capacity vector", right?

(update: okay, a later revision to thestinger's comment makes it clear that is what he means.)

@pnkfelix
Member

pnkfelix commented Sep 5, 2013

@thestinger I understand your example, and it matches my understanding from when I wrote my original comment. When I wrote "empty vector", I mean zero-length vectors. I was trying to say "I do not see forcing non-zero-capacity on zero-length vectors as a drawback, because I expect zero-length vectors to be non-zero-length in the near future." (And then I noted one exception where I could see a case motivating support for zero-capacity vectors.)

I guess all I was saying was that I wanted us to apply appropriate weights when doing a cost/benefit analysis of the change you are proposing.

@nikomatsakis
Contributor

This is somewhat orthogonal to DST since we have already decided that there would be special representations for &[] and &str as compared to &T for some other T. This is ok because of monomorphization, though it does mean that one cannot cast a DST like [] to an object (that is &[] cannot be cast to an object &Foo even if [] implements Foo). The problem there is that objects must be manipulable without knowing the precise receiver type, so they must always be a single pointer.

However, thinking on this a bit harder, if we could preserve the invariant that ~T is always 1 word except for objects (that is, ~Foo is two words), we might be able to lift the object restriction, though it'd require some cleverness. The idea is that if you have a type like &[] that you cast to &Foo, we could actually create a temporary and store a pointer to the temporary. This is ok if the lifetime of the &Foo is short enough. Actually that's pretty yucky, because it would mean that you couldn't cast an &'a [] to a &'a Foo for all 'a, since some of those lifetimes might exceed the current stack frame. Moreover it'd require some tricky footwork in trans to deal with the extra deref. So scratch that idea. Rather I'm inclined to say that casting pointer-T to an object requires that T be sized.

Given that restriction, I believe we are free to represent pointer-DST in whatever ways we prefer.

@lilyball
Contributor

@pnkfelix I have written many functions that only sometimes need a vector, and sometimes don't. It's annoying to take the perf hit of heap allocation for the cases where I end up not needing the vector. A common case here is if I need to normalize data, but I want to avoid doing work in the case where the data is already normalized.

@thestinger
Contributor Author

There's no harm in doing the allocation at the first push instead of at initialization. I've verified that the only drop in efficiency is a call to realloc instead of malloc, with jemalloc branching on null immediately. LLVM could be taught to replace realloc calls on null pointers with malloc calls, but the difference isn't measurable.

@thestinger
Contributor Author

I updated the numbers based on the removal of jemalloc. They are likely much more platform-specific now.

@alexcrichton
Member

The consensus from today's meeting is that we should change the representation of owned vectors. We may possibly want to change the representation of slices as well, but that'll be an issue to tackle after this one.

@pnkfelix
Member

pnkfelix commented Nov 7, 2013

accepted for P-backcompat-lang for 1.0

@thestinger
Contributor Author

Superseded by #11875.

flip1995 pushed a commit to flip1995/rust that referenced this issue Jun 30, 2022
…nt_drop_lint, r=flip1995

Add details about how significant drop in match scrutinees can cause deadlocks

Adds more details about how a significant drop in a match scrutinee can cause a deadlock and include link to documentation.

changelog: Add more details to significant drop lint to explicitly show how temporaries in match scrutinees can cause deadlocks.