Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

Merged
merged 1 commit into from
Jan 22, 2019

Conversation

dotdash
Copy link
Contributor

@dotdash dotdash commented Jan 16, 2019

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

@rust-highfive
Copy link
Collaborator

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 16, 2019
@dotdash
Copy link
Contributor Author

dotdash commented Jan 16, 2019

@bors try

@bors
Copy link
Contributor

bors commented Jan 16, 2019

⌛ Trying commit fafeda6caed392ac71ac7e2b0ccc873ec86e558a with merge 4f2ef2516cd3f1e875bb589402bf1319967ce3a5...

@michaelwoerister
Copy link
Member

Thanks for the PR, @dotdash! Looking forward to the perf results :)

@michaelwoerister
Copy link
Member

@rust-timer build 4f2ef2516cd3f1e875bb589402bf1319967ce3a5
(I think the try build actually finished successfully a while ago)

@rust-timer
Copy link
Collaborator

Success: Queued 4f2ef2516cd3f1e875bb589402bf1319967ce3a5 with parent ceb2512, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 4f2ef2516cd3f1e875bb589402bf1319967ce3a5

@michaelwoerister
Copy link
Member

This looks pretty good. For the synthetic const-eval heavy tests it looks spectacular even :)

I'll give it a proper review tomorrow. Meanwhile, could you check whether there's a problem with compiling the unicode_normalization crate? perf.rlo shows no numbers for it which might indicate a compilation error.

cc @oli-obk @rust-lang/wg-compiler-performance

@dotdash
Copy link
Contributor Author

dotdash commented Jan 17, 2019

The numbers are only missing for the parent commit. I suppose that it just predates the addition of that benchmark, which was yesterday.

let local_ty = self.monomorphize(local_ty, frame.instance.substs);
self.layout_of(local_ty)
) -> TyLayout<'tcx> {
frame.local_layouts[local]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While definitely an improvement for all currently possible const eval code, I think this is suboptimal for code with branching, as a lot of locals will never be touched during evaluation. I think it would be better to make local_layouts be IndexVec<mir::Local, Option<TyLayout<'tcx>>> and only fill it in on demand. To make this really work with reading I presume it would even need to be IndexVec<mir::Local, RefCell<Option<TyLayout<'tcx>>>> at which point we can just move to IndexVec<mir::Local, Once<TyLayout<'tcx>>> or some non-Sync variant of that.

@@ -463,12 +463,10 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
&self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by fix: access_local doesn't need to be pub anymore

@michaelwoerister
Copy link
Member

@oli-obk's remark regarding branching code makes sense. Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

@oli-obk
Copy link
Contributor

oli-obk commented Jan 18, 2019

Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

The problem is that we separated reading and writing into &mut self and &self methods. So any method trying to read from the local would need to be &mut self and that would infect everything. The immutability of the read paths has been very helpful in the development of miri. I think it would be fine to have some interior mutability just for memoization. Since TyLayout is Copy, we can even use Cell<Option<TyLayout<'tcx>>> making the entire operation zero cost both in memory and performance.

@michaelwoerister
Copy link
Member

OK, I'm fine with the Cell approach.

I'd be interested though in how using &self has made things easier.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 18, 2019

I'd be interested though in how using &self has made things easier.

Mirroring the mutability of the virtual miri memory in the mutability of the memory datastructure itself allows various things like holding references to different parts of the virtual memory at the same time. If we made everything take mutable references, we could only ever access one miri allocation, even if we just wanted to write to one. While we could have worked around this with unsafe code (and we still have to do it in one situation

// This checks relocation edges on the src.
let src_bytes = self.get(src.alloc_id)?
.get_bytes_with_undef_and_ptr(&tcx, src, size)?
.as_ptr();
let dest_bytes = self.get_mut(dest.alloc_id)?
.get_bytes_mut(&tcx, dest, size * length)?
.as_mut_ptr();
// SAFE: The above indexing would have panicked if there weren't at least `size` bytes
// behind `src` and `dest`. Also, we use the overlapping-safe `ptr::copy` if `src` and
// `dest` could possibly overlap.
// The pointers above remain valid even if the `HashMap` table is moved around because they
// point into the `Vec` storing the bytes.
unsafe {
assert_eq!(size.bytes() as usize as u64, size.bytes());
if src.alloc_id == dest.alloc_id {
if nonoverlapping {
if (src.offset <= dest.offset && src.offset + size > dest.offset) ||
(dest.offset <= src.offset && dest.offset + size > src.offset)
{
return err!(Intrinsic(
"copy_nonoverlapping called on overlapping ranges".to_string(),
));
}
}
for i in 0..length {
ptr::copy(src_bytes,
dest_bytes.offset((size.bytes() * i) as isize),
size.bytes() as usize);
}
} else {
for i in 0..length {
ptr::copy_nonoverlapping(src_bytes,
dest_bytes.offset((size.bytes() * i) as isize),
size.bytes() as usize);
}
}
}
), and while two phase borrows could have made many situations safe, we would still have to fight the borrow checker at ever turn.

Also it just "felt right" to make sure we don't accidentally modify our virtual memory in functions that should just read from it.

@michaelwoerister
Copy link
Member

Yeah, I can see how this can become tricky. Thanks for the explanation, @oli-obk!

@dotdash
Copy link
Contributor Author

dotdash commented Jan 19, 2019

using Cell gets me lifetime errors, which I don't understand right away.

  --> src/librustc_mir/interpret/snapshot.rs:75:34
   |
46 | impl<'a, 'mir, 'tcx> InfiniteLoopDetector<'a, 'mir, 'tcx>
   |      --  ---- lifetime `'mir` defined here
   |      |
   |      lifetime `'a` defined here
...
75 |         if self.snapshots.insert(EvalSnapshot::new(memory, stack)) {
   |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'mir` must outlive `'a`

error: unsatisfied lifetime constraints
   --> src/librustc_mir/interpret/snapshot.rs:410:1
    |
410 |   impl_stable_hash_for!(impl<'tcx, 'b, 'mir> for struct EvalSnapshot<'b, 'mir, 'tcx> {
    |   ^                          ----      ---- lifetime `'mir` defined here
    |   |                          |
    |  _|                          lifetime `'tcx` defined here
    | |
411 | |     // Not hashing memory: Avoid hashing memory all the time during execution
412 | |     memory -> _,
413 | |     stack,
414 | | });
    | |___^ type annotation requires that `'mir` must outlive `'tcx`
    |
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

Is there some magic in the stable hash macro that might affect this?

…k frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.
@dotdash
Copy link
Contributor Author

dotdash commented Jan 20, 2019

Seems that the constraint on the stable hash impl was just swapped around, asking for 'mir to outlive 'tcx instead of the other way around.

@dotdash
Copy link
Contributor Author

dotdash commented Jan 20, 2019

@bors try

We should measure this again, my local setup seems a bit funky right now, and showed a massive slowdown for this version which seems unlikely to be right.

@bors
Copy link
Contributor

bors commented Jan 20, 2019

⌛ Trying commit 98d4f33 with merge a0b24f2d5d611007296d56748caf31cfa8e5af32...

@bors
Copy link
Contributor

bors commented Jan 20, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@oli-obk
Copy link
Contributor

oli-obk commented Jan 21, 2019

@rust-timer build a0b24f2d5d611007296d56748caf31cfa8e5af32

@rust-timer
Copy link
Collaborator

Success: Queued a0b24f2d5d611007296d56748caf31cfa8e5af32 with parent 4db2394, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit a0b24f2d5d611007296d56748caf31cfa8e5af32

@oli-obk
Copy link
Contributor

oli-obk commented Jan 21, 2019

Perf is great (though no effect on unicode-normalization). Impl looks good.

The commit message is still slightly outdated.

@dotdash
Copy link
Contributor Author

dotdash commented Jan 21, 2019

Ah, I had updated the commit message, but not the message for the PR itself. Or did I still miss something in the commit message as well?

@michaelwoerister
Copy link
Member

@bors r+

Thanks, @dotdash!

@bors
Copy link
Contributor

bors commented Jan 21, 2019

📌 Commit 98d4f33 has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 21, 2019
Centril added a commit to Centril/rust that referenced this pull request Jan 21, 2019
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.
Centril added a commit to Centril/rust that referenced this pull request Jan 22, 2019
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.
bors added a commit that referenced this pull request Jan 22, 2019
Rollup of 9 pull requests

Successful merges:

 - #57537 (Small perf improvement for fmt)
 - #57552 (Default images)
 - #57604 (Make `str` indexing generic on `SliceIndex`.)
 - #57667 (Fix memory leak in P::filter_map)
 - #57677 (const_eval: Predetermine the layout of all locals when pushing a stack frame)
 - #57791 (Add regression test for #54582)
 - #57798 (Corrected spelling inconsistency)
 - #57809 (Add powerpc64-unknown-freebsd)
 - #57813 (fix validation range printing when encountering undef)

Failed merges:

r? @ghost
@bors bors merged commit 98d4f33 into rust-lang:master Jan 22, 2019
@@ -510,7 +504,7 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
// FIXME: do some more logic on `move` to invalidate the old location
Copy(ref place) |
Move(ref place) =>
self.eval_place_to_op(place, layout)?,
self.eval_place_to_op(place)?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but this means we might have to compute the layout even if we already know it, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To change this we'd need to pass a precomputed layout to layout_of_local, which probably doesn't do anything, because any initialized local will already have had called layout_of_local on it before. We can still benchmark it and see if it changes anything

@@ -72,6 +72,7 @@ fn mk_eval_cx_inner<'a, 'mir, 'tcx>(
ecx.stack.push(interpret::Frame {
block: mir::START_BLOCK,
locals: IndexVec::new(),
local_layouts: IndexVec::new(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't the information be included in locals itself?

@dotdash dotdash deleted the locals branch January 20, 2020 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants