
Change HirVec<P<T>> to HirVec<T> in hir::Expr. #37642

Merged
1 commit merged on Nov 22, 2016

Conversation

@nnethercote
Contributor

This PR changes data structures like this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ P | P | P | P | P | P | P | P ]
                    |
                    v
                    [ ExprTup | 2 | P ]
                                    |
                                    v
                                    [ P | P ]
                                      |
                                      v
                                      [ Expr ]
```

to this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ [ ExprTup | 2 | P ] | ... ]
                                    |
                                    v
                                    [ Expr | Expr ]
```
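In code terms the change is roughly the following. This is a hedged sketch with simplified stand-ins (`P<T>` here is just `Box<T>` and `HirVec<T>` is `Box<[T]>`), not the real rustc definitions:

```rust
// Simplified stand-ins for rustc's types, for illustration only.
type P<T> = Box<T>;
type HirVec<T> = Box<[T]>;

#[allow(dead_code)]
struct Expr {
    kind: ExprKind,
}

#[allow(dead_code)]
enum ExprKind {
    // Before: each element sits behind its own pointer.
    ArrayOld(HirVec<P<Expr>>),
    // After: the elements are stored inline in a single slice.
    ArrayNew(HirVec<Expr>),
    Lit(i64),
}

fn main() {
    // Two elements in one heap block instead of three.
    let arr = ExprKind::ArrayNew(
        vec![Expr { kind: ExprKind::Lit(1) }, Expr { kind: ExprKind::Lit(2) }]
            .into_boxed_slice(),
    );
    if let ExprKind::ArrayNew(elems) = arr {
        println!("{} inline elements", elems.len());
    }
}
```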

I thought this would be a win for #36799, and on a cut-down version of that workload this reduces the peak heap size (as measured by Massif) from 885 MiB to 875 MiB. However, the peak RSS as measured by `-Ztime-passes` and by `/usr/bin/time` increases by about 30 MiB.

I'm not sure why. Just look at the picture above -- the second data structure clearly takes up less space than the first. My best idea relates to unused elements in the slices. `HirVec<Expr>` is a typedef for `P<[Expr]>`. If there were any unused excess elements then I could see that memory usage would increase, because those excess elements are larger in `HirVec<Expr>` than in `HirVec<P<Expr>>`. But AIUI there are no such excess elements, and Massif's measurements corroborate that.
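To make the arithmetic concrete, here is a minimal standalone sketch (not rustc code; `FakeExpr` is a made-up stand-in, assumed to be much larger than a pointer) that tallies bytes and heap blocks for the two layouts, including what a single unused excess slot would cost in each:

```rust
use std::mem::size_of;

// A made-up stand-in for hir::Expr; assume it is much larger than a pointer.
struct FakeExpr {
    _payload: [u64; 9],
}

fn main() {
    let n = 8; // element count of the outer array

    // Old layout: one slice of n pointers, plus n separately boxed elements.
    let old_bytes = n * size_of::<Box<FakeExpr>>() + n * size_of::<FakeExpr>();
    let old_blocks = 1 + n;

    // New layout: n elements inline in a single slice allocation.
    let new_bytes = n * size_of::<FakeExpr>();
    let new_blocks = 1;

    println!("old: {} bytes in {} blocks", old_bytes, old_blocks);
    println!("new: {} bytes in {} blocks", new_bytes, new_blocks);

    // The flip side: any excess (unused) slot now wastes a whole FakeExpr
    // rather than just a pointer.
    println!(
        "cost per excess slot: {} bytes (new) vs {} bytes (old)",
        size_of::<FakeExpr>(),
        size_of::<Box<FakeExpr>>()
    );
}
```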

However, the two main creation points for these data structures are these lines from `lower_expr`:

```rust
        ExprKind::Vec(ref exprs) => {
            hir::ExprArray(exprs.iter().map(|x| self.lower_expr(x)).collect())
        }
        ExprKind::Tup(ref elts) => {
            hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
        }
```

I suspect what is happening is that temporary excess elements are created within the `collect` calls. The workload from #36799 has many 2-tuples and 3-tuples and when `Vec` gets doubled it goes from a capacity of 1 to 4, which would lead to excess elements. Though, having said that, `Vec::with_capacity` doesn't create excess AFAICT. So I'm not sure. What goes on inside `collect` is complex.
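For what it's worth, the doubling behaviour is easy to see when a `Vec` grows element by element with no size hint; whether `collect` from a mapped slice iterator ever takes that path is exactly what I'm unsure about. A small hedged sketch (the exact capacities printed depend on the standard library's growth strategy):

```rust
fn main() {
    // Pushing one element at a time gives the allocator no length information,
    // so capacity can run ahead of length, leaving excess slots for small
    // collections like 2- and 3-tuples.
    let mut v: Vec<u64> = Vec::new();
    for x in 0..3u64 {
        v.push(x);
        println!("len = {}, capacity = {}", v.len(), v.capacity());
    }
}
```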

Anyway, in its current form this PR is a memory consumption regression and so not worth landing but I figured I'd post it in case anyone has additional insight.

@rust-highfive
Collaborator

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister
Member

Interesting. Thanks for posting, @nnethercote.

@bluss
Member

bluss commented Nov 8, 2016

`Vec`'s `FromIterator`:

  • Uses only a single allocation up front, if the iterator reports a lower bound in `Iterator::size_hint` that fits at least all the elements in the iterator.
  • Uses a more efficient loop with no capacity check in the body, if the iterator is a `TrustedLen` iterator.

Since both of those lines use the slice iterator and then just `.map()`, they should already be "perfect" (which should never be taken for granted, I guess).
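A quick standalone check of that behaviour (assuming the `FromIterator` implementation described above; this is not rustc code):

```rust
fn main() {
    // A mapped slice iterator reports an exact size hint, so collecting into a
    // Vec can allocate exactly len slots up front, with no doubling involved.
    let elts = [1u32, 2, 3]; // stand-in for a 3-tuple's lowered elements
    let lowered: Vec<String> = elts.iter().map(|x| x.to_string()).collect();
    println!("len = {}, capacity = {}", lowered.len(), lowered.capacity());
    // With the FromIterator behaviour described above, this prints
    // len = 3, capacity = 3.
}
```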

@bors
Contributor

bors commented Nov 10, 2016

☔ The latest upstream changes (presumably #37678) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb
Member

eddyb commented Nov 10, 2016

IMO there isn't a lot we can do that's not a compromise (due to heap pressure dynamics) without moving to arena/SoA models for storing the HIR in the first place (replacing P, at least at that level).

@michaelwoerister
Member

What's "SoA"?

@eddyb
Member

eddyb commented Nov 10, 2016

"Struct of Arrays", where each type has its own vector for all instances of that type, in a central struct, although it's usually used for fields, not nodes.
We do something similar with our side-tables, this would just be flatter. I'm still a bit torn between this and huge arenas, but the latter are somewhat more ergonomic.
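For anyone unfamiliar with the idea, here's a hedged, self-contained sketch of what an SoA-style store for HIR expressions could look like; the names (`HirStorage`, `ExprId`) are made up for illustration and are not actual rustc types:

```rust
// Illustrative only: nodes live in one central vector and refer to each other
// by index instead of by owning pointer.
#[derive(Clone, Copy)]
struct ExprId(u32);

enum ExprKind {
    Lit(i64),
    Tup(Vec<ExprId>), // children are indices into HirStorage::exprs
}

struct Expr {
    kind: ExprKind,
}

#[derive(Default)]
struct HirStorage {
    exprs: Vec<Expr>, // one flat vector holds every Expr
}

impl HirStorage {
    fn alloc(&mut self, kind: ExprKind) -> ExprId {
        self.exprs.push(Expr { kind });
        ExprId((self.exprs.len() - 1) as u32)
    }

    fn get(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0 as usize]
    }
}

fn main() {
    let mut hir = HirStorage::default();
    let a = hir.alloc(ExprKind::Lit(1));
    let b = hir.alloc(ExprKind::Lit(2));
    let tup = hir.alloc(ExprKind::Tup(vec![a, b]));
    if let ExprKind::Tup(ref elems) = hir.get(tup).kind {
        println!("tuple with {} elements", elems.len());
    }
}
```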

@alexcrichton added the T-compiler label (relevant to the compiler team, which will review and decide on the PR/issue) on Nov 10, 2016
@brson added the relnotes label (marks issues that should be documented in the release notes of the next release) on Nov 11, 2016
@nnethercote
Contributor Author

I just rebased and remeasured and this change is now a small win for a cut-down version of the workload from #36799, reducing peak RSS from 866 MiB to 848 MiB. I'm not sure what changed, other than #37764 landing, though there shouldn't really be any interaction between that PR and this one.

r? @michaelwoerister

@nnethercote
Contributor Author

Some heap allocation numbers from DHAT:

old
```
tot_alloc:    5,980,761,512 in 27,087,211 blocks
max_live:     838,850,202 in 3,978,848 blocks
```
new
```
tot_alloc:    5,971,161,300 in 25,887,185 blocks
max_live:     829,249,992 in 2,778,822 blocks
```

This PR also speeds up compilation of that workload by about 4%, and it speeds up a couple of the rustc-benchmarks by 1-2%.

@bors
Contributor

bors commented Nov 21, 2016

☔ The latest upstream changes (presumably #37824) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Member

Looks good to me. Happy to approve it after a rebase.

@eddyb
Member

eddyb commented Nov 21, 2016

If it's measurably better I have no real concerns - I imagine it'd eventually be `&[&Expr]` vs `&[Expr]`, unless we end up with some sort of ID linked list to help with incremental recompilation or something.

@nnethercote
Contributor Author

I rebased. r? @michaelwoerister

@michaelwoerister
Member

@bors r+

@bors
Contributor

bors commented Nov 21, 2016

📌 Commit 382d3b0 has been approved by michaelwoerister

@bors
Contributor

bors commented Nov 21, 2016

⌛ Testing commit 382d3b0 with merge 82d8833...

bors added a commit that referenced this pull request Nov 21, 2016
Change HirVec<P<T>> to HirVec<T> in hir::Expr.

@nnethercote deleted the no-HirVec-of-P branch on November 22, 2016.