Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect handling of lateout pairs in inline asm #101346

Open
newpavlov opened this issue Sep 2, 2022 · 21 comments
Open

Incorrect handling of lateout pairs in inline asm #101346

newpavlov opened this issue Sep 2, 2022 · 21 comments
Assignees
Labels
A-inline-assembly Area: inline asm!(..) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-none Indicates that no working group is assigned to an issue, but it *does* have an active owner

Comments

@newpavlov
Copy link
Contributor

newpavlov commented Sep 2, 2022

(issue is loosely owned, in terms of P-high tracking, by @wesleywiser and @pnkfelix keeping their eyes on llvm/llvm-project#57550)

Updated bug description

The following Rust function:

pub fn foo() -> u32 {
    let t1: u32;
    let t2: u32;
    unsafe {
        asm!(
            "mov {0:e}, 1",
            "mov eax, 42",
            lateout(reg) t1,
            lateout("eax") t2,
            options(nostack),
        );
    }
    t1
}

Gets compiled into this obviously incorrect assembly:

example::foo:
        mov     eax, 1
        mov     eax, 42
        ret

Godbolt link: https://rust.godbolt.org/z/Yb9v7WobM

LLVM incorrectly reuses register for a pair of lateouts if it can see that one of those does not get used later.

Original description below

We get spurious segfaults in the chacha20 crate when we run tests compiled for i686 target (i686-unknown-linux-gnu to be exact), see RustCrypto/stream-ciphers#304 for more information. Interestingly enough, Rust 1.56 does not have this issue, only 1.57 and later. Changes in the cpufeatures crate which revealed the issue look quite innocent. The issue also disappears if zeroize feature is disabled, which is quite tangential to cpufeatures (it only adds Drop impls). Even weirder, @aumetra's findings show that segfault happens at the following line:

let mut buf = [0u8; MAX_SEEK];

Granted, the chacha20 crate contains a fair bit of unsafe code, as well as its dependencies, so the issue may be caused by unsoundness somewhere in our crates. But the circumstantial evidence makes a potential compiler bug quite probable.

@talchas
Copy link

talchas commented Sep 3, 2022

Yeah, the 32-bit cpuid implementation at https://doc.rust-lang.org/src/core/up/up/stdarch/crates/core_arch/src/x86/cpuid.rs.html#62 is compiling to something wrong. This is compiling to

   0x565ba631 <+193>:   mov    %ebx,%eax
   0x565ba633 <+195>:   cpuid  
   0x565ba635 <+197>:   xchg   %eax,%ebx

on rustc 1.63.0 (4b91a6ea7 2022-08-08)
which of course does actually clobber %ebx, because apparently a lateout register can overlap an explicitly-chosen inlateout register. I don't know if that's an intentional LLVM thing, an LLVM bug, or what. According to godbolt this asm!() results in

%0 = tail call { i32, i32, i32, i32 } asm sideeffect "movl %ebx, ${0:k}\0Acpuid\0Axchgl %ebx, ${0:k}", "=r,={ax},={cx},={dx},1,2,~{memory}"(i32 1, i32 1) #1, !dbg !10, !srcloc !16

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

Yeah, after your post in the RustCrypto issue I also tried it in godbolt and got the same weird assembly: https://rust.godbolt.org/z/4z9nrY1Eh

Rust:

use std::arch::x86::__cpuid;

pub unsafe fn foo() -> u32 {
    // get manufacture ID
    let res = __cpuid(0);
    res.ebx
}

Generated assembly which is obviously wrong:

example::foo:
        xor     eax, eax ; set EAX to zero
        xor     ecx, ecx ; set ECX to zero
        mov     eax, ebx ; "cache" calee-saved EBX to EAX
        cpuid ; WHOOPS not only we now use EAX which contains EBX's value,
              ; but also CPUID overwrites EAX, making the caching useless
        xchg    eax, ebx ; move EBX into EAX as return result
        ret

Bad news is that on x68-64 we get the same wrong assembly:

example::foo:
        xor     eax, eax
        xor     ecx, ecx
        mov     rax, rbx
        cpuid
        xchg    rax, rbx
        ret

I think correct codegen should cache EBX to stack. Honestly, I am amazed that this issue has not surfaced earlier.

If foo returns res.eax (or any other register) instead of res.ebx codegen works correctly:

example::foo:
        push    esi
        xor     eax, eax
        xor     ecx, ecx
        mov     esi, ebx
        cpuid
        xchg    esi, ebx
        pop     esi
        ret

@talchas
Copy link

talchas commented Sep 3, 2022

It's supposed to pick a different register and save it there, and that both should work (with proper clobbers) and should be better than pushing to the stack. Changing it to out(reg) ebx, works in godbolt, though I can't swear that it is guaranteed as opposed to just avoiding this example. out(reg) ebx, lateout("eax") eax, does still work, so it presumably still won't be able to conflict with the non-input edx. I'm still not sure if this is an LLVM bug or intentionally part of the rules of inline asm (constraint rules have never been very clear).

@talchas
Copy link

talchas commented Sep 3, 2022

https://rust.godbolt.org/z/Gr1ve77a6 suggests that it might be that LLVM considers unused lateout arguments to be fair game to delete along with the clobber.

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

I think the issue is with incorrect use of lateout(reg) ebx, together with inlateout("eax") leaf => eax,. Because of the late specifiers LLVM assumes that it's legal to use the same register for both of them (i.e. it assigns {0} to eax).

This code produces incorrect result: https://rust.godbolt.org/z/j9v6v69Pz

But by changing lateout(reg) ebx, to out(reg) ebx, we get correct assembly: https://rust.godbolt.org/z/nWEEa9YsE

@talchas
Copy link

talchas commented Sep 3, 2022

Yeah, the thing is that that doesn't really make sense in general - lateout(reg) ebx, lateout("eax") eax (which still fails) really shouldn't be able to assign them to be the same register any more than it could for lateout(reg) x, lateout(reg) y.

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

Indeed, LLVM does something weird with lateout registers when one of them does not get used.

https://rust.godbolt.org/z/3e6MzzWfa

@talchas
Copy link

talchas commented Sep 3, 2022

That one is probably just pop arbitrary_register - it needs to have push/poped something around the asm for stack alignment reasons since it isn't marked nostack.

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

You are right, options(nostack) removes the weird pop/push.

A better example, which probably can be used as a minified demonstration of the issue: https://rust.godbolt.org/z/3xMenGjMe

pub fn foo() -> u32 {
    let t1: u32;
    let t2: u32;
    unsafe {
        asm!(
            "mov {0}, 1",
            "mov eax, 42",
            lateout(reg) t1,
            lateout("eax") t2, // `t2` can be replaced by `_`
            options(nostack),
        );
    }
    t1
}
example::foo:
        mov     rax, 1
        mov     eax, 42
        ret

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

Either way, the late specifiers are useless in the case of CPUID intrinsics and I think they should be simply removed. We only need one register to cache EBX and return result stored in it, which is not EAX, EBX, ECX, or EDX. There are no opportunities for register reuse.

@comex
Copy link
Contributor

comex commented Sep 3, 2022

For what it's worth, the original asm! block seems to me like it would misbehave if {0} happened to be assigned as ebx itself. Why does it do this backup and restore rather than just specifying ebx as a clobber a constraint for the output?

But that's not what's happening here, so it does seem like LLVM is miscompiling the code.

@talchas
Copy link

talchas commented Sep 3, 2022

You can't specify ebx as a constraint because LLVM uses it internally for something (maybe exactly this plt cache? I forget) and doesn't handle clobbers of it like a sane compiler (iirc you get compile errors sometimes).

@talchas
Copy link

talchas commented Sep 3, 2022

That said, I'm not sure that that's true anymore - while I vaguely recall some sort of error along those lines, I can't trigger one from the PLT usage of ebx. #90083 doesn't give details and I can't find them quickly re LLVM/clang.

Oh it's 64-bit that uses rbx and 32-bit uses esi, so the 32-bit cpuid could just do that.

@newpavlov
Copy link
Contributor Author

newpavlov commented Sep 3, 2022

Trying to clobber rbx in the usual way currently results in the following compilation error:

error: cannot use register `bx`: rbx is used internally by LLVM and cannot be used as an operand for inline asm

Honestly, it's quite annoying and I wish we did not have such exception.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-inline-assembly Area: inline asm!(..) I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Sep 3, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Sep 3, 2022
@apiraino
Copy link
Contributor

apiraino commented Sep 8, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical T-compiler

@rustbot rustbot added P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 8, 2022
@pnkfelix
Copy link
Member

pnkfelix commented Sep 8, 2022

Does the change that landed in rust-lang/stdarch#1329 resolve this? (In other words, do we just need to pull in those changes in some manner to resolve this issue?)

Or is there something else that will need to happen?

@pnkfelix
Copy link
Member

pnkfelix commented Sep 8, 2022

also, given that this is a P-critical issue that affects 1.57 and later: Should we be talking about beta backports of the changes associated with fixing this?

@talchas
Copy link

talchas commented Sep 8, 2022

That change should resolve this, yes.

@Amanieu
Copy link
Member

Amanieu commented Sep 9, 2022

The issue affects all uses of asm! where lateout operands are used. As far as I can tell cpuid is the only one that is affected in the standard library, but uses of asm! in other crates could also be affected by this bug.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 17, 2022
Update stdarch

This pulls in the following changes:

- [Use simd_bitmask intrinsic in a couple of places](rust-lang/stdarch@9f09287)
- [Remove simd_shuffle<n> usage in favor of simd_shuffle](rust-lang/stdarch@3fd17e4)
- [Remove late specifiers in __cpuid_count](rust-lang/stdarch@f1db941)
  - Helps with rust-lang#101346
- [Use mov and xchg instead of movl(q) and xchgl(q)](rust-lang/stdarch@3049a31)
- [Bump cfg-if dependency to 1.0](rust-lang/stdarch@f305cc8)
- [Fix documentation of __m256bh and __m512bh structs](rust-lang/stdarch@699c093)

r? `@Amanieu`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 17, 2022
Update stdarch

This pulls in the following changes:

- [Use simd_bitmask intrinsic in a couple of places](rust-lang/stdarch@9f09287)
- [Remove simd_shuffle<n> usage in favor of simd_shuffle](rust-lang/stdarch@3fd17e4)
- [Remove late specifiers in __cpuid_count](rust-lang/stdarch@f1db941)
  - Helps with rust-lang#101346
- [Use mov and xchg instead of movl(q) and xchgl(q)](rust-lang/stdarch@3049a31)
- [Bump cfg-if dependency to 1.0](rust-lang/stdarch@f305cc8)
- [Fix documentation of __m256bh and __m512bh structs](rust-lang/stdarch@699c093)

r? ``@Amanieu``
@apiraino
Copy link
Contributor

Issue was visited during T-compiler meeting on Zulip (notes). Demoting priority to P-high since #101861 reduced the impact and the LLVM upstream fix seems to be formalized

@rustbot label -P-critical P-high

@rustbot rustbot added P-high High priority and removed P-critical Critical priority labels Sep 22, 2022
@pnkfelix pnkfelix added the WG-none Indicates that no working group is assigned to an issue, but it *does* have an active owner label Sep 30, 2022
@pnkfelix
Copy link
Member

T-compiler P-high review. Nikita volunteered to be default assignee for LLVM miscompilations.

@rustbot assign @nikic

@newpavlov newpavlov changed the title Potential miscompilation on i686 of chacha20 Incorrect handling of lateout pairs in inline asm Nov 24, 2022
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 5, 2022
deps: update cpufeatures, swap difference to dissimilar

Updating cpufeatures v0.2.1 -> v0.2.5: https://github.com/RustCrypto/utils/blob/master/cpufeatures/CHANGELOG.md#025-2022-09-04, was yanked bc of miscompile (RustCrypto/utils#800, rust-lang#101346)

Removing difference v2.0.0
     Adding dissimilar v1.0.4
   Updating expect-test v1.0.1 -> v1.4.0

difference unmaintened https://rustsec.org/advisories/RUSTSEC-2020-0095.html, so replaced with https://github.com/dtolnay/dissimilar (as dependency of `expect-test`)
RalfJung pushed a commit to RalfJung/miri that referenced this issue Dec 5, 2022
deps: update cpufeatures, swap difference to dissimilar

Updating cpufeatures v0.2.1 -> v0.2.5: https://github.com/RustCrypto/utils/blob/master/cpufeatures/CHANGELOG.md#025-2022-09-04, was yanked bc of miscompile (RustCrypto/utils#800, rust-lang/rust#101346)

Removing difference v2.0.0
     Adding dissimilar v1.0.4
   Updating expect-test v1.0.1 -> v1.4.0

difference unmaintened https://rustsec.org/advisories/RUSTSEC-2020-0095.html, so replaced with https://github.com/dtolnay/dissimilar (as dependency of `expect-test`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-inline-assembly Area: inline asm!(..) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-none Indicates that no working group is assigned to an issue, but it *does* have an active owner
Projects
None yet
Development

No branches or pull requests

8 participants