Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary memcpy when returning a struct #57077

Closed
MSxDOS opened this issue Dec 23, 2018 · 14 comments
Closed

Unnecessary memcpy when returning a struct #57077

MSxDOS opened this issue Dec 23, 2018 · 14 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining A-mir-opt-nrvo Fixed by NRVO I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@MSxDOS
Copy link

MSxDOS commented Dec 23, 2018

Returning a newly created struct from a function causes an unnecessary memcpy even if the function is inlined:

use std::mem::uninitialized;

#[repr(C)]
pub struct SomeStruct(u64, u32, u16, u8, usize, u64, u64, u32, usize, u16, u32, usize, isize);

extern {
    pub fn SomeExternFun(val: *mut SomeStruct);
}

#[inline(always)]
unsafe fn ret_val() -> SomeStruct {
    let mut val: SomeStruct = uninitialized();
    SomeExternFun(&mut val);
    val
}

#[inline(always)]
unsafe fn by_ref(val: &mut SomeStruct) {
    SomeExternFun(val);
}

pub unsafe fn call_ret_val() {
     let val = ret_val();
     println!("{}", val.0);
}

pub unsafe fn call_by_ref() {
    let mut val: SomeStruct = uninitialized();
    by_ref(&mut val);
    println!("{}", val.0);
}
playground::call_ret_val: # @playground::call_ret_val
# %bb.0:
	sub	rsp, 184
	lea	rdi, [rsp + 16]
	call	qword ptr [rip + SomeExternFun@GOTPCREL]
	movups	xmm0, xmmword ptr [rsp + 80]
	movaps	xmmword ptr [rsp + 160], xmm0
	movups	xmm0, xmmword ptr [rsp + 16]
	movups	xmm1, xmmword ptr [rsp + 32]
	movups	xmm2, xmmword ptr [rsp + 48]
	movups	xmm3, xmmword ptr [rsp + 64]
	movaps	xmmword ptr [rsp + 144], xmm3
	movaps	xmmword ptr [rsp + 128], xmm2
	movaps	xmmword ptr [rsp + 112], xmm1
	movaps	xmmword ptr [rsp + 96], xmm0
	lea	rax, [rsp + 96]
	mov	qword ptr [rsp], rax
	mov	rax, qword ptr [rip + core::fmt::num::<impl core::fmt::Display for u64>::fmt@GOTPCREL]
	mov	qword ptr [rsp + 8], rax
	lea	rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.2]
	mov	qword ptr [rsp + 16], rax
	mov	qword ptr [rsp + 24], 2
	lea	rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.3]
	mov	qword ptr [rsp + 32], rax
	mov	qword ptr [rsp + 40], 1
	mov	rax, rsp
	mov	qword ptr [rsp + 48], rax
	mov	qword ptr [rsp + 56], 1
	lea	rdi, [rsp + 16]
	call	qword ptr [rip + std::io::stdio::_print@GOTPCREL]
	add	rsp, 184
	ret
                                        # -- End function

playground::call_by_ref: # @playground::call_by_ref
# %bb.0:
	push	rbx
	sub	rsp, 144
	lea	rbx, [rsp + 64]
	mov	rdi, rbx
	call	qword ptr [rip + SomeExternFun@GOTPCREL]
	mov	qword ptr [rsp], rbx
	mov	rax, qword ptr [rip + core::fmt::num::<impl core::fmt::Display for u64>::fmt@GOTPCREL]
	mov	qword ptr [rsp + 8], rax
	lea	rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.2]
	mov	qword ptr [rsp + 16], rax
	mov	qword ptr [rsp + 24], 2
	lea	rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.3]
	mov	qword ptr [rsp + 32], rax
	mov	qword ptr [rsp + 40], 1
	mov	rax, rsp
	mov	qword ptr [rsp + 48], rax
	mov	qword ptr [rsp + 56], 1
	lea	rdi, [rsp + 16]
	call	qword ptr [rip + std::io::stdio::_print@GOTPCREL]
	add	rsp, 144
	pop	rbx
	ret
                                        # -- End function

Playground

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Dec 23, 2018
@nikic
Copy link
Contributor

nikic commented Dec 23, 2018

Removing this kind of memcpy is generally the job of call slot optimization. I believe in this case it will fail because SomeExternFun() may capture the pointer.

Not totally sure whether that restriction needs to apply if the destination is an alloca, as we're just changing one alloca for another.

@nikic
Copy link
Contributor

nikic commented Dec 23, 2018

Yeah, I think that as-is LLVM is not permitted to optimize away that memcpy. SomeExternFun() func may capture the pointer and println captures the pointer and together they could observe whether or not that memcpy has been optimized away.

UItimately the really issue here is the println, which captures the pointer and prevents SROA from working. See #50519 (comment). If you write println!("{}", {val.0}); instead it optimizes properly.

@MSxDOS
Copy link
Author

MSxDOS commented Dec 23, 2018

If you write println!("{}", {val.0}); instead it optimizes properly.

Unfortunately, this is not a solution when we need to pass a pointer again - for example, replacing println with another call to SomeExternFun brings memcpy back.

@nikic
Copy link
Contributor

nikic commented Dec 23, 2018

@MSxDOS I'm not totally certain, but I think it's not legal to optimize the memcpy away in this case. Assuming you have code like this:

unsafe fn ret_val() -> SomeStruct {
    let mut val: SomeStruct = uninitialized();
    SomeExternFun(&mut val);
    val
}

pub unsafe fn call_ret_val() {
     let val = ret_val();
     SomeExternFun2(&val);
}

Then SomeExternFun() could store the pointer passed to it and SomeExternFun2() could store the pointer passed to it and then compare the two pointers. With the memcpy the pointers will differ, without it they will be the same, changing observable behavior.

Maybe someone (@rkruppe or @RalfJung?) could comment on whether the result of that pointer comparison might be unspecified behavior? I'm pretty sure it's well defined under LLVM semantics (making elision of this memcpy illegal for LLVM), but possibly in rustc we could have relaxed rules (something akin to RVO)?

@RalfJung
Copy link
Member

The pointer passed to SomeExternFun is dead and dangling by the time it gets compared, so it doesn't seem unreasonable to say that a new, live pointer could re-use the same address.

Cc @eddyb

@WiSaGaN
Copy link
Contributor

WiSaGaN commented Dec 24, 2018

Is unsafe operation a necessary condition to reproduce this?

@MSxDOS
Copy link
Author

MSxDOS commented May 1, 2019

I think this is directly related to the unnecessary memcpy when using Box::new() (#49733 (comment))

You can do

unsafe {
    let mut fast_box: Box<T> = Box::new(std::mem::uninitialized());
    (fast_box.as_mut() as *mut T).write(/* stuff */);
}

to write directly to the allocation as when using box syntax, but if you try to wrap it in a fuction and return the box, the memcpy is back 😠 -> Playground

@eddyb
Copy link
Member

eddyb commented May 1, 2019

This will probably be solved by #47954 (which will hopefully get revived in the next few months).
That's assuming there's a copy/move in the MIR, that we can get rid of.

@jonas-schievink jonas-schievink added A-mir-opt Area: MIR optimizations A-mir-opt-nrvo Fixed by NRVO T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 14, 2020
@MSxDOS
Copy link
Author

MSxDOS commented May 22, 2020

The first example was fixed by #72205 🎉

#57077 (comment) - this still copies stuff.

https://rust.godbolt.org/z/PGCmf7 - here call_eat_guid_val initializes the struct in the stack while call_eat_guid_ref just references it from a read-only section.

@camelid camelid added the A-mir-opt-inlining Area: MIR inlining label Oct 30, 2020
@MSxDOS
Copy link
Author

MSxDOS commented Mar 12, 2021

#57077 (comment) - this still copies stuff.

Fixed on the latest nightly, presumably by #82806

@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

Any cases left that still don't optimize, or can we close this?

@MSxDOS
Copy link
Author

MSxDOS commented Mar 12, 2021

https://rust.godbolt.org/z/PGCmf7 - here call_eat_guid_val initializes the struct in the stack while call_eat_guid_ref just references it from a read-only section.

This one is still there, but I'm no longer sure it's even related. If not then we can happily close this.

@nikic
Copy link
Contributor

nikic commented Mar 13, 2021

@MSxDOS Don't think that one is really related. A slightly better example would be https://rust.godbolt.org/z/rxKG4c which uses & instead of *const, as the former guarantees that the argument is readonly. Given that, it would be possible to realize that the locally initialized value could be moved into a constant global and that global passed (though I'm not entirely sue this is legal without nocapture).

@MSxDOS
Copy link
Author

MSxDOS commented Mar 13, 2021

The example was based on code that's used to feed GUIDs to extern functions in winapi crate, so & wouldn't work there.

Closing this as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining A-mir-opt-nrvo Fixed by NRVO I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants