Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize PartialOrd<A> for [A] where A: Ord #39642

Merged
merged 3 commits into from Feb 11, 2017
Merged

Specialize PartialOrd<A> for [A] where A: Ord #39642

merged 3 commits into from Feb 11, 2017

Conversation

ghost
Copy link

@ghost ghost commented Feb 8, 2017

This way we can call cmp instead of partial_cmp in the loop, removing some burden of optimizing Options away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became slower, since slice1.lt(slice2) was much slower than slice1.cmp(slice2) == Less. This problem is now fixed.

To verify, I benchmarked this simple program:

fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the is_less lambda to use cmp and partial_cmp. Now all three versions (lt, cmp, partial_cmp) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get default impl, it might be a good idea to implement a blanket default impl for lt, gt, le, ge in terms of cmp whenever possible. Today, those four functions by default are only implemented in terms of partial_cmp.

r? @alexcrichton

This way we can call `cmp` instead of `partial_cmp` in the loop,
removing some burden of optimizing `Option`s away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became
slower, since `slice1.lt(slice2)` was much slower than
`slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and
`partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are
equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.
self.len().partial_cmp(&other.len())
}
}

impl SlicePartialOrd<u8> for [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this impl not be extended to all A: Ord so it would simply use Some(SliceOrd::compare(self, other))?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion, thanks! I've updated the code.

Some(SliceOrd::compare(self, other))
}
}

impl SlicePartialOrd<u8> for [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this specialization could be removed now, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, removed.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 8, 2017
@alexcrichton
Copy link
Member

Looks good to me, thanks @stjepang!

cc @rust-lang/libs, @bluss, I'm sure this has the likelihood of being super subtle, so would be good to get some more eyes on this as well

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Feb 10, 2017

📌 Commit a344c12 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Feb 11, 2017

⌛ Testing commit a344c12 with merge f140a6c...

bors added a commit that referenced this pull request Feb 11, 2017
…ichton

Specialize `PartialOrd<A> for [A] where A: Ord`

This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`.

r? @alexcrichton
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 11, 2017
…d, r=alexcrichton

Specialize `PartialOrd<A> for [A] where A: Ord`

This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler.

PR rust-lang#39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`.

r? @alexcrichton
@bors
Copy link
Contributor

bors commented Feb 11, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing f140a6c to master...

@bors bors merged commit a344c12 into rust-lang:master Feb 11, 2017
@ghost ghost deleted the specialize-slice-partialord branch February 11, 2017 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants