Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The extern "C" ABI of vector types depends on target features #116558

Open
Tracked by #69098
RalfJung opened this issue Oct 9, 2023 · 35 comments · May be fixed by #127731
Open
Tracked by #69098

The extern "C" ABI of vector types depends on target features #116558

RalfJung opened this issue Oct 9, 2023 · 35 comments · May be fixed by #127731
Labels
A-abi Area: Concerning the application binary interface (ABI). A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Oct 9, 2023

The following program has UB, for very surprising reasons:

use std::mem::transmute;
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

extern "C" fn no_target_feature(_dummy: f32, x: __m256) {
    let val = unsafe { transmute::<_, [u32; 8]>(x) };
    dbg!(val);
}

#[target_feature(enable = "avx")]
unsafe fn with_target_feature(x: __m256) {
  // Here we're seemingly just calling a safe function... but actually this call is UB!
  // Caller and callee do not agree on the ABI; specifically they disagree on how the
  // `__m256` argument gets passed.
  no_target_feature(0.0, x);
}

fn main() {
    assert!(is_x86_feature_detected!("avx"));
    // SAFETY: we checked that the `avx` feature is present.
    unsafe {
        with_target_feature(transmute([1; 8]));
    }
}

The reason is that the ABI of the __m256 type depends on the set of target features, so the caller (with_target_feature) and callee (no_target_feature) do not agree on how the argument should be passed. The result is a vector half-filled with junk. (The same issue also arises in the other direction, where the caller has fewer features than the callee: example here.)

I'm not sure if this is currently even properly documented? We are mentioning it in #115476 but the program above doesn't involve any function pointers so we really cannot expect people to be aware of that part of the docs. We show a general warning about the type not being FFI-compatible, but that warning shows up a lot and anyway in this case both caller and callee are Rust functions!

I think we need to do better here, but backwards compatibility might make that hard. @chorman0773 suggested we should just reject functions like no_target_feature that take an AVX type by-val without having declared the AVX feature. That seems reasonable; a crater run would be needed to assess whether it breaks too much code. An alternative might be to have a deny-by-default lint that very clearly explains what is happening.

There are also details to work out wrt what exactly the lint should check. Newtypes around __m256 will have the same problem. What about other larger types that contain __m256? Behind a ptr indirection it's obviously fine, but what about (__m256, __m256)? If we apply the ScalarPair optimization this will be passed in registers even on x86.

A possible place to put the check could be somewhere around here.

In terms of process, I am not sure if an RFC is required; a t-compiler MCP might be sufficient. Currently we accept code that clearly doesn't do what it looks like it should do.

And finally -- are there any other targets (besides x86 and x86-64) that have target-features that affect the ABI? They should get the same treatment.

Note that this is different from #116344 in two ways:

  • This issue only affects non-"Rust" ABIs; the other issue affects even the "Rust" ABI. (The Rust ABI passes __m256 by-pointer exactly to work-around this issue.)
  • This issue comes about when the user enables extra features (which is possible also locally via #[target_feature]), the other issue comes about when the user disables features (which is only possible on a per-crate level via -C, and if you're mixing crates with different -C flags then you're already already on very shaky grounds -- we do that with std but nobody else really gets to do that, I think).

Cc @workingjubilee more ABI fun ;)

@RalfJung RalfJung added the A-abi Area: Concerning the application binary interface (ABI). label Oct 9, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 9, 2023
@chorman0773
Copy link
Contributor

For a hard error, I think it probably should go through RFC. I can start drafting that now. As for what it would check, I'd expect a hard error on any time a simd type is passed by value and the appropriate feature is unavailable, even if the specific ABI wouldn't necessarily pass it in registers, including inside a structure, and the lint should probably check the same thing.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

That can have potentially surprising effects, e.g. when a closure captures something of type __m256 and the closure gets passed by-value to a generic function.

It also means adding adding a private field that contains by-val repr(simd) data can be a semver-breaking change. Cc @ehuss

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 9, 2023

It also means adding adding a private field that contains by-val repr(simd) data can be a semver-breaking change

This would already be the case with replacing several fields with a SIMD type. For example, changing struct Foo(u64,u64,u64,u64); to struct Foo(__m256i);

Although thinking about it, because features can be per-crate, this might be a problem on crates that specialize based on target features.

Perhaps #[repr(Rust)] types should be exempted and always passed indirectly when they contain SIMD types (but not #[repr(C)] types).

@coastalwhite
Copy link
Contributor

With RISC-V, you have Zfinx and Zdinx which have hard-floats but have moved the floats to the normal general purpose registers. In this case, forbidding the passing of floats by-value, if the f (RISC-V's version of single-precision float) target is not present, is very restrictive and probably not what you would want. Otherwise, allowing the passing-by-value of floats between f and Zfinx is also UB.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

But the ABI only changes when f is present or not present, I assume? When f is present, Zfinx just means more instructions can be used but the ABI still says f32 is passed on float registers? And when f is not present, softfloat and Zfinx use the same ABI?

So that sounds like whether or not f is present should be a different target triple. Code with vs without f simply cannot be linked together. But having or not having Zfinx doesn't make an ABI difference. Or am I misunderstanding?

@coastalwhite
Copy link
Contributor

No, that is exactly it. Do I understand correctly that you essentially want to do something similar to thumb with the eabi and eabihf?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

I'm not an expert on target details. 🙈 Usually my purview ends when I have given precise abstract semantics to nasty pieces of unsafe Rust and MIR.^^ But every now and then terrible low-level CPU nightmares creep onto my territory and then I have to learn how to deal with them...

So, I don't know what ARM does with eabi vs eabihf. But it sounds like they are considering "hard float" and "soft float" two different ABIs that presumably use different registers for argument passing (so they are indeed different ABIs), and then have corresponding target triples to tell apart the ABIs. So yeah that sounds like what I am suggesting we should do for other targets as well. (We could also consider different ABI strings, like "C" vs "C-softfloat" or so. No idea if that makes any sense.)

Obviously we don't actually care whether the functions behind those ABIs use x87/d/f hardware instructions or do some softfloat stuff, we just care how arguments are passed. But sadly it seems that in LLVM, there's no way to say "you have the d,f target features available but please use the softfloat ABI anyway"? If that is something people need (enable d,f or x87,sse,sse2 on softfloat targets without affecting ABI) we should start talking with LLVM people about whether and how that could be achieved.

This may all be completely naive since I'm just coming at this with my principled approach of making everything safe or at least sound and giving it semantics at a high level; I rely on people like @chorman0773 to tell me when my ideas make no sense "in the field". ;)

@Skgland
Copy link
Contributor

Skgland commented Oct 9, 2023

If with_target_feature calling no_target_feature is UB due do different target features, shouldn't main calling with_target_feature have the same problem? just the other way around?
Or is this specific to "C" being a non Rust-ABI?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

Yeah for the Rust ABI we do apply a work-around (it always passes such types in memory), so that call is fine. But the combination of extern fn and target_feature is a big minefield currently.

(I updated the example to actually use the Rust ABI for with_target_feature.)

@Skgland
Copy link
Contributor

Skgland commented Oct 9, 2023

Would adding a inline(never) Rust ABI intermediate function be a valid workaround to prevent UB here?
Unconditional code generation #[target_feature]: Note 1 making #[inline(never)] required as otherwise the compiler may inline the function and then we would be back to the incorrect feature set.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

Inlining should be aware of target features, so inline(never) should not be needed. If it is needed that's a critical bug in the inliner that we should track and fix.

Yes such an intermediate Rust ABI function works around the problem successfully. But the fundamental problem remains with function pointers: we have to decide whether an extern "C" fn(__m256) has the AVX ABI or some fallback ABI, without having any clue where and how the function it points to was declared. We probably want it to have the AVX ABI since people that use AVX and interface between Rust and C code likely will have their C code built in a way that it uses the AVX ABI. This means all extern "C" fn declared in Rust that take __m256 as arguments should do this the AVX style, no matter the set of enabled target features -- or they should fail to build.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

Ah, the inline actually does make a difference here... well that's bad, I opened #116573 to track that.

@briansmith
Copy link
Contributor

The compiler does report improper_ctypes_definitions warnings for the given code so I guess it isn't unexpected? It would e better to have an example that doesn't trigger improper_ctypes_definitions.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 9, 2023

The compiler does report improper_ctypes_definitions warnings for the given code so I guess it isn't unexpected? It would e better to have an example that doesn't trigger improper_ctypes_definitions.

And that's supposed to be just a warning. We should hard error if we know everything is fucked and there's no salvaging the compilation (whether hard erroring is appropriate here, I don't know yet).

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2023

There's no C code involved here so I would argue it is reasonable to silence the lint for the example. The lint talks about FFI, but the example does no FFI, this is all Rust-to-Rust calls.

@briansmith
Copy link
Contributor

Is there an example that can trigger this with only FFI-safe types though?

@chorman0773
Copy link
Contributor

I've actually suppressed that lint for __m128 on sse3 only code (so xmm registers 100% exist) that was experiencing a perf regression when the function used default ABI.
But also I wouldn't trust improper_ctypes{,_definitions} to not get blanket suppressed in FFI heavy code, so at the very least a lint for this should be it's own thing. improper_ctypes is a clippy lint in rustc.

@tmandry
Copy link
Member

tmandry commented Oct 9, 2023

I don't think #[target_feature] should ever affect the ABI of function pointers called from within them, especially not extern "C", ones where the C ABI is defined statically at the TU level.

If we did anything with ABI based on #[target_feature], it would have to be encoded in the type of the function pointers somehow. So for example, an extern "C" function with an avx calling convention would really be extern "C+avx" or similar.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 10, 2023

@tmandry yeah agreed. Sadly target_feature 1.0 was stabilized in a form where #[target_feature] does affect ABI, and core::arch SIMD types were stabilized in a form where -C target-feature affects ABI. That's causing this issue here and also #116344.

@RalfJung
Copy link
Member Author

@briansmith our ctypes lints seem to have enough gaps to be easily circumvented; here's a reproducer that the lint doesn't catch. It's using a closure to pass the value -- closures that capture a single field are newtypes around that field so they get the field's ABI, but the improper-ctypes lint ignores closures.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 10, 2023

More to the point, there's no formal definition of "FFI-safe" and "FFI-unsafe" in the Rust language. As far as I can tell, the actual thing that bears the obligation to be ABI-correct is when you do

extern "C" {
    fn c_func(arg: SomeType) -> ReturnType;
}

fn rust_func() -> ReturnType {
    let arg = SomeType::random();
    // here, the Rust caller actually discharges the obligation...
    // which is honestly slightly unfortunate:
    // in real scenarios, we might not be the author of the `extern "C"` block!
    unsafe { c_func(arg) }
}

@RalfJung
Copy link
Member Author

Do we know if clang is doing anything clever here? I assume C has exactly the same problem, since their function types don't track target features either.

@chorman0773
Copy link
Contributor

clang does not fix this issue. See llvm-project#64706

@RalfJung
Copy link
Member Author

That's a different issue (mismatch with GCC). This here is about mismatch within code compiled by the same compiler, but with different locally set target features.

@chorman0773
Copy link
Contributor

Ah right. It does, in fact, mismatch with itself without the attribute.

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 11, 2023

(It also mismatches between -mavx and -mno-avx/__attribute__((target("avx"))))

@RalfJung
Copy link
Member Author

Do you happen to have a godbolt link demonstrating that? :)

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 12, 2023

https://godbolt.org/z/obhezje4n

Interpreting the assembly: In the global (-mavx) case, it passes __m256 in ymm registers, and returns them in ymm0. For the inactive case, it passes on the stack and returns in xmm0:xmm1. For the locally-enabled only case, it passes on the stack, and returns in ymm0 (according to Sys-V, there shouldn't be a difference with the globally enabled case).

Edit: I've added gcc's output for context. GCC has the correct output according to the ABI.
Edit: For other x86 assembler readers... yes gcc is using an integer simd load and an floating-point simd store. Yes it is correct. Yes it is mildly infuriating.

@workingjubilee workingjubilee added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 17, 2023
@RalfJung
Copy link
Member Author

RalfJung commented Oct 29, 2023

Turns out that clang is able to detect and warn at least some of these cases. Clang also allows putting __attribute__((target("avx"))) annotations at function declarations and uses them to decide whether or not to warn/error. In contrast, we allow #[target_feature] only on definitions, not on imports (extern "C" {} blocks).

One way to resolve this problem would be a new post-monomorphization check: traverse the argument (and return) types of the function being built (if it's a non-Rust ABI), as well as the argument and return types of every non-Rust-ABI function signature we call. If we find any repr(simd) types in there that's not behind a pointer indirection, ensure that vectors of that size are supported by the available target features. I don't know if there's any good cross-target way to do this; if not, we'll need a big match on the target architecture and hard-code which features need to be enabled for which vector size.

This is stricter than what clang does, but it's also safer, so if we can get away with it I think this would be my preferred approach. We don't have a good central place for post-mono checks, so we'll need to figure out what is the best way to share this between the codegen backends and Miri. Also, such a check would likely be optimization-dependent, since if an ABI-incompatible function call gets optimized away we wouldn't see it post-mono. We have to do the check post-mono though since we support generic extern "C" fn and there is no trait for "does not contain a SIMD type by-value" (and it would be a breaking change to start requiring such a trait).

(I think this is what @chorman0773 said they are planning to do in their compiler.)

@chorman0773
Copy link
Contributor

Yep (in my case, right down at the codegen level - codegen-x86 would reject the function when finding parameter locations). I actually have an RFC-being-written for prescribing this behaviour at the rust level.

@RalfJung
Copy link
Member Author

I actually have an RFC-being-written for prescribing this behaviour at the rust level.

Great!

Given that we currently do rather nonsensical things at the ABI level, I was going to suggest we should just implement a band-aid without waiting for an RFC. But that doesn't preclude designing something more proper. :)

@chorman0773
Copy link
Contributor

One note is this text that I put

When a function is called via an fn-item type (not a function pointer type), the target features of the callee are considered. [...]
Implementations of Rust would be required to ensure this call is performed properly (with parameters passed/returned in the appropriate registers), which may entail the use of a shim function or alternative code generation techniques.

This may require some extra work with llvm, or on the rustc side.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 29, 2023 via email

@chorman0773
Copy link
Contributor

(Actually, I could probably open it in a less formal capacity - I think there is some pre-discussion to be had before filing a full RFC)

@Noratrieb Noratrieb added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Oct 31, 2023
@RalfJung
Copy link
Member Author

RalfJung commented Nov 5, 2023

There's now a pre-RFC by @chorman0773 that proposes a way to resolve this problem.

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 7, 2023
…=compiler-errors

warn when using an unstable feature with -Ctarget-feature

Setting or unsetting the wrong target features can cause ABI incompatibility (rust-lang#116344, rust-lang#116558). We need to carefully audit features for their ABI impact before stabilization. I just learned that we currently accept arbitrary unstable features on stable and if they are in the list of Rust target features, even unstable, then we don't even warn about that!1 That doesn't seem great, so I propose we introduce a warning here.

This has an obvious loophole via `-Ctarget-cpu`. I'm not sure how to best deal with that, but it seems better to fix what we can and think about the other cases later, maybe once we have a better idea for how to resolve the general mess that are ABI-affecting target features.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 7, 2023
… r=compiler-errors

warn when using an unstable feature with -Ctarget-feature

Setting or unsetting the wrong target features can cause ABI incompatibility (rust-lang#116344, rust-lang#116558). We need to carefully audit features for their ABI impact before stabilization. I just learned that we currently accept arbitrary unstable features on stable and if they are in the list of Rust target features, even unstable, then we don't even warn about that!1 That doesn't seem great, so I propose we introduce a warning here.

This has an obvious loophole via `-Ctarget-cpu`. I'm not sure how to best deal with that, but it seems better to fix what we can and think about the other cases later, maybe once we have a better idea for how to resolve the general mess that are ABI-affecting target features.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 7, 2023
Rollup merge of rust-lang#117616 - RalfJung:unstable-target-features, r=compiler-errors

warn when using an unstable feature with -Ctarget-feature

Setting or unsetting the wrong target features can cause ABI incompatibility (rust-lang#116344, rust-lang#116558). We need to carefully audit features for their ABI impact before stabilization. I just learned that we currently accept arbitrary unstable features on stable and if they are in the list of Rust target features, even unstable, then we don't even warn about that!1 That doesn't seem great, so I propose we introduce a warning here.

This has an obvious loophole via `-Ctarget-cpu`. I'm not sure how to best deal with that, but it seems better to fix what we can and think about the other cases later, maybe once we have a better idea for how to resolve the general mess that are ABI-affecting target features.
@RalfJung RalfJung added the A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. label Apr 25, 2024
@RalfJung RalfJung changed the title The extern "C" ABI of x86 vector types depends on target features The extern "C" ABI of vector types depends on target features Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-abi Area: Concerning the application binary interface (ABI). A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
9 participants