Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More ObligationForest improvements #64545

Merged
merged 5 commits into from
Sep 19, 2019
Merged

Conversation

nnethercote
Copy link
Contributor

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis

`Node` has an optional parent and a list of other descendents. Most of
the time the parent is treated the same as the other descendents --
error-handling is the exception -- and chaining them together for
iteration has a non-trivial cost.

This commit changes the representation. There is now a single list of
descendants, and a boolean flag that indicates if there is a parent (in
which case it is first descendent). This representation encodes the same
information, in a way that is less idiomatic but cheaper to iterate over
for the common case where the parent doesn't need special treatment.

As a result, some benchmark workloads are up to 2% faster.
The size of the indices doesn't matter much here, and having a
`newtype_index!` index type without also using `IndexVec` requires lots
of conversions. So this commit removes `NodeIndex` in favour of uniform
use of `usize` as the index type. As well as making the code slightly
more concise, it also slightly speeds things up.
Now that all indices have type `usize`, it makes sense to be more
consistent about their naming. This commit removes all uses of `i` in
favour of `index`.
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 17, 2019
@nnethercote
Copy link
Contributor Author

Some local measurements:

inflate-check
        avg: -10.3%     min: -14.1%     max: -0.0%
keccak-check
        avg: -7.6%      min: -11.7%     max: -0.0%
cranelift-codegen-check
        avg: -1.3%      min: -2.2%      max: -0.0%
clap-rs-check
        avg: -0.7%      min: -1.7%      max: 0.0%
serde-check
        avg: -0.7%      min: -1.2%      max: -0.0%
deeply-nested-check
        avg: -0.5%      min: -0.8%      max: 0.0%
unify-linearly-check
        avg: -0.3%      min: -0.6%      max: 0.0%
style-servo-check
        avg: -0.4%      min: -0.6%      max: 0.0%
wg-grammar-check
        avg: -0.3%      min: -0.5%      max: -0.0%

Let's see how a CI perf run compares.

@bors try

@bors
Copy link
Contributor

bors commented Sep 17, 2019

⌛ Trying commit b261edf with merge 2ae267a...

bors added a commit that referenced this pull request Sep 17, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors
Copy link
Contributor

bors commented Sep 17, 2019

☀️ Try build successful - checks-azure
Build commit: 2ae267a

@Mark-Simulacrum
Copy link
Member

@rust-timer build 2ae267a

@rust-timer
Copy link
Collaborator

Queued 2ae267a with parent 5670d04, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 2ae267a, comparison URL.

@nnethercote
Copy link
Contributor Author

Perf results on CI are similar to what I saw locally: big wins on keccak and inflate, smaller wins elsewhere.

I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code".

nnethercote added a commit to nnethercote/rust that referenced this pull request Sep 18, 2019
PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  rust-lang#64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
bors added a commit that referenced this pull request Sep 18, 2019
Simplify some `Iterator` methods.

PR #64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  #64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
@nikomatsakis
Copy link
Contributor

r=me with comment, thanks @nnethercote!

@Centril Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2019
Amazingly enough, this is a 3.5% instruction count win on `keccak`.
The super-hot call site of `inlined_shallow_resolve()` basically does
`r.inlined_shallow_resolve(ty) != ty`. This commit introduces a
version of that function specialized for that particular call pattern,
`shallow_resolve_changed()`. Incredibly, this reduces the instruction
count for `keccak` by 5%.

The commit also renames `inlined_shallow_resolve()` as
`shallow_resolve()` and removes the `inline(always)` annotation, as it's
no longer nearly so hot.
@nnethercote
Copy link
Contributor Author

@bors r=nmatsakis

@bors
Copy link
Contributor

bors commented Sep 18, 2019

📌 Commit 3b85597 has been approved by nmatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2019
@nnethercote
Copy link
Contributor Author

Because this has a perf impact, let's do:
@bors rollup-never

@Mark-Simulacrum
Copy link
Member

Since we've already run perf there's probably no reason for that? command is rollup=never btw

@nnethercote
Copy link
Contributor Author

My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change.

@bors rollup=never

@nnethercote nnethercote reopened this Sep 18, 2019
@bors
Copy link
Contributor

bors commented Sep 19, 2019

⌛ Testing commit 3b85597 with merge 9b9d2af...

bors added a commit that referenced this pull request Sep 19, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors
Copy link
Contributor

bors commented Sep 19, 2019

☀️ Test successful - checks-azure
Approved by: nmatsakis
Pushing 9b9d2af to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 19, 2019
@bors bors merged commit 3b85597 into rust-lang:master Sep 19, 2019
@nnethercote nnethercote deleted the ObligForest-more branch September 20, 2019 02:55
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
scottmcm added a commit to scottmcm/rust that referenced this pull request Sep 22, 2019
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
bors added a commit that referenced this pull request Sep 24, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
bors added a commit that referenced this pull request Sep 25, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 30, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants