Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SystemV ABI Mismatch on x86 with a repr(C) enum for extern "C"/FFI functions. #68190

Closed
sw17ch opened this issue Jan 13, 2020 · 15 comments · Fixed by #68443
Closed

SystemV ABI Mismatch on x86 with a repr(C) enum for extern "C"/FFI functions. #68190

sw17ch opened this issue Jan 13, 2020 · 15 comments · Fixed by #68443
Assignees
Labels
A-ffi Area: Foreign Function Interface (FFI) C-bug Category: This is a bug. O-linux Operating system: Linux O-macos Operating system: macOS O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@sw17ch
Copy link
Contributor

sw17ch commented Jan 13, 2020

The Problem

Using the FFI, without unsafe, it's possible to get a segfault or incorrect results from defined behaviors (see rust-lang/rfcs#2195 and #46123) around non-C-like enumerations. I have only tested this on Linux with an x86_64 processor. I have reproduced these problems with both of gcc9 and clang8. The problems are exhibited both on a recent rustc master and on rustc stable 1.40.0.

I've written two tests against the rust-lang/rust repository that demonstrate the problems described later in this issue. They are here:

  1. This is the test for returning enumerations by value that segfaults
  2. This is the test for passing enumerations by value as arguments that fails assertions

Side note: the language reference states that non-C-like enumerations have unspecified layout, but I believe that this is no longer true after merging #46123 when specifying a repr for the enumeration.

Reproducing a Segmentation Fault When Returning by Value

Given an enumeration like this:

#[repr(C,u8)]
pub enum OptionLikeType {
	OptionLikeSome(u64),
	OptionLikeNone,
}

And an FFI function like this:

#[no_mangle]
pub extern "C" fn option_like_type_new(value: u64) -> OptionLikeType {
    OptionLikeType::OptionLikeSome(value)
}

And an invocation from C like this:

// Types generated from OptionLikeType by cbindgen version 0.12.1
enum OptionLikeType_Tag {
  OptionLikeSome,
  OptionLikeNone,
};
typedef uint8_t OptionLikeType_Tag;

typedef struct {
  uint64_t _0;
} OptionLikeSome_Body;

typedef struct {
  OptionLikeType_Tag tag;
  union {
    OptionLikeSome_Body option_like_some;
  };
} OptionLikeType;

int main(int argc, char *argv[]) {
  (void)argc; (void)argv;

  printf("Create OptionLikeType by return value\n");
  OptionLikeType olt = option_like_type_new(10);
  assert(olt.tag == OptionLikeSome);
  assert(olt.option_like_some._0 == 10);

  return 0;
}

The compiled C file linked against the Rust static library cause a segmentation fault:

$ gcc9 -ggdb3 -Wall -o test_return_option_by_value.bin test_return_option_by_value.c -Ltarget/debug -lrepro -ldl -lpthread
$ ./test_return_option_by_value.bin
Create OptionLikeType by return value
Segmentation fault (core dumped)

Reproducing an Assertion Failure When Passing by Value

Given the same Rust OptionLikeSome type from above, and the same C type representation, define a Rust function that adds two options like this:

#[no_mangle]
pub extern "C" fn option_like_type_add(a: OptionLikeType, b: OptionLikeType) -> u64 {
    use OptionLikeType::{OptionLikeSome, OptionLikeNone};
    match (a,b) {
        (OptionLikeSome(a), OptionLikeSome(b)) => a + b,
        (OptionLikeSome(a), OptionLikeNone) => a,
        (OptionLikeNone, OptionLikeSome(b)) => b,
        _ => 0,
    }
}

Then define a C function that exercises it like this:

int main(int argc, char *argv[]) {
  (void)argc; (void)argv;
  printf("Add two OptionLikeType instances by value\n");

  OptionLikeType a = {.tag = OptionLikeSome, .option_like_some = { ._0 = 10 } };
  OptionLikeType b = {.tag = OptionLikeSome, .option_like_some = { ._0 = 20 } };
  
  uint64_t r = option_like_type_add(a, b);
  printf("a + b is %" PRIu64 ", and is expected to be 30\n", r);
  assert(r == 30);

  return 0;
}

When running this C code, we get an unexpected result:

Add two OptionLikeType instances by value
a + b is 4748609293, and is expected to be 30
test_add_option_by_value.bin: test_add_option_by_value.c:18: main: Assertion `r == 30' failed.
Aborted (core dumped)

Other Notes

Reproduction is not limited to the exact shapes above. For example, the primitive type used in the repr does not seem to affect outcomes. #[repr(C,u32)] and #[repr(C,u64)] both exhibit the bugs.

Some Analysis

I believe that Rust is internally consistent about how it passes these enumerations, but it seems to be in violation of the SystemV guidance on how to pass parameters. For example, calling the above extern functions from Rust does not exhibit the invalid behavior.

Furthermore, for enumerations with larger representations, the bugs are also not present. For example, using two u64 values in the OptionLikeSome definition prevents the crash or assertion failure from surfacing.

SystemV Requirements

I received an enormous amount of help from @iximeow producing the following explanation.

What appears to be occurring is that rustc expects the caller to allocate space on the caller's stack for the return value, and then expects the caller to pass a pointer to that location in a register. gcc and clang both expect to pass smaller structures as registers. What's also interesting is that rustc does the "right thing" for structs that should have an identical layout.

Here's a comparison of the assembly generated for structs and enumerations that should have extremely similar layout: https://godbolt.org/z/Mo7cJ6. Notice how the initialization of an enumeration is being done on the caller's stack while the initialization of the struct is done entirely in registers.

Here's very similar code, but in C: https://godbolt.org/z/CCxigj. Notice that neither of the C functions use the stack for initialization.

@iximeow and I believe that the proper handling for the enumeration according to SystemV can be described as follows:

  • the enum is an aggregate of { u8, u64 }
  • from psABI-x86_64 section 3.2.3, The classification of aggregate (structures and arrays) and union types works as follows
    • Each field of an object is classified recursively so that always two fields are considered. The resulting class is calculated according to the classes of the fields in the eightbyte: ... "(d) If one of the classes is INTEGER, the result is the INTEGER."
  • so the elements of this aggregate are both INTEGER, barring other constraints.

When passing this type as an argument:

  • If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
  • this is contrary to rustc's usage, passing a pointer to the enum, rather than its items directly.

When returning this type:

  • If the class is INTEGER, the next available register of the sequence %rax, %rdx is used.
  • this is contrary to rustc's usage, passing a pointer to the enum as a hidden first parameter, then returning that pointer in rax.
  • this may explain why a larger aggregate does not express this bug - with three or more INTEGER elements, the aggregate no longer fits in return registers, and becomes MEMORY with the hidden-pointer-parameter semantics rustc uses for the two-item aggregate

Summary

Something seems to treat enumerations differently from similarly laid out structs, and treats the enumerations incorrectly when passing them across SystemV ABI boundaries.

@jonas-schievink jonas-schievink added A-ffi Area: Foreign Function Interface (FFI) C-bug Category: This is a bug. I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 13, 2020
@Mark-Simulacrum
Copy link
Member

cc @eddyb

@sw17ch
Copy link
Contributor Author

sw17ch commented Jan 14, 2020

I've reproduced this as far back as 1.22 (I have not tried earlier versions), but I believe it was only supposed to be valid since 1.24.

In 1.22, the #[repr(C, u8)] syntax throws a warning because of conflicting representation hints.

@bitwalker
Copy link

I think there may be an issue in general with passing non-C-like enums by value.

I found this issue when digging into my own variation on the problem. On macOS (w/1.42.0-nightly, and clang 11 as shipped by Apple), I have code which either segfaults, or fails some assertions I've set up when passing a non-C-like enum by value to C++.

I've built a minimal repro here, it has the details necessary to reproduce. In short though, the example contains code that exercises two variations on a simple non-C-like enum. One variation uses #[repr(u32)], and one uses #[repr(C, u32)] to see what, if any, difference it makes.

What I'm seeing is that when passed by value (to C++, haven't tried the other way yet), both variations of the enum end up as garbage values, which I'm assuming is just uninitialized stack memory, since it doesn't segfault. When passed by reference, everything works fine. So the memory layout matches up, but something goes pear shaped when passing by value. That seems to line up with what is seen in this issue as well, so I think it is likely the same underlying problem.

Hope having another repro case helps!

@pnkfelix
Copy link
Member

Depending on what solution we adopt here, T-lang may need to be involved. Adding tag.

@pnkfelix pnkfelix added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jan 16, 2020
@pnkfelix
Copy link
Member

pnkfelix commented Jan 16, 2020

discussed at T-compiler meeting. P-high. (And decided its not relevant to T-lang.)

@pnkfelix pnkfelix added O-linux Operating system: Linux P-high High priority O-macos Operating system: macOS and removed T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jan 16, 2020
@pnkfelix
Copy link
Member

assigning to @eddyb to resolve or to mentor resolution

@eddyb
Copy link
Member

eddyb commented Jan 16, 2020

In an embarrassing turn of events, we don't handle enums at all:

abi::Variants::Multiple { .. } => return Err(Memory),

This was originally correct, but didn't get adjusted when we spec'd non-C-like enums for FFI.

@eddyb
Copy link
Member

eddyb commented Jan 22, 2020

Just found this again, so I opened #68443, and I think it fixes the issue, but I don't have an easy way to test it right now (and I can't spend much time on it).

If anyone wants to help, a testsuite addition that only passes after #68443, would be welcome.

@iximeow
Copy link
Contributor

iximeow commented Jan 22, 2020

The commits on @sw17ch's branch, sw17ch:broken-codegen-with-non-c-like-enum, add some tests around this and currently segfault or assert with incorrect results. Cherry picking these commits over there ought to do the trick.

@sw17ch
Copy link
Contributor Author

sw17ch commented Jan 22, 2020

@eddyb I'll work on a test suite that fails without #68443 based on the tests I've already written, while also trying to introduce tests for the case brought up by @bitwalker.

@sw17ch
Copy link
Contributor Author

sw17ch commented Jan 22, 2020

I have confirmed the patches from #68443 allows the two tests I have to pass (after confirming they still fail on master). I'll add the other test cases and provide a branch.

sw17ch added a commit to sw17ch/rust that referenced this issue Jan 22, 2020
One calls into C functions passing non-c-like enumerations by
value. The other calls into C expecting non-C-like enumerations as
returns.

These test cases are based on the tests provided by @bitwalker on
issue rust-lang#68190. The original tests were provided at:
https://github.com/bitwalker/rust_non_c_like_enums_issue/tree/2688d5c672bd4e289085fcdf1c6110e99e7e8ab1
@sw17ch
Copy link
Contributor Author

sw17ch commented Jan 22, 2020

I have added tests based on the tests provided by @bitwalker above to my branch. I confirmed that they (and my original tests) fail. After that, I cherry-picked the commits from #68443 and confirmed that it fixes all the tests cases. My branch is at: https://github.com/sw17ch/rust/tree/broken-codegen-with-non-c-like-enum

The relevant commits are: 7c061c2..cc2f018

If someone checks out 650e5f8 from my branch, and runs the 4 tests, they can expect 4 failures.

After moving forward to cc2f018, all 4 tests should pass. Here are the test commands I used:

./x.py test src/test/run-make-fulldeps/pass-non-c-like-enum-to-c/
./x.py test src/test/run-make-fulldeps/return-non-c-like-enum-from-c/
./x.py test src/test/run-make-fulldeps/arguments-non-c-like-enum/
./x.py test src/test/run-make-fulldeps/return-non-c-like-enum/

@eddyb thank you very much for the PR, it appears to fix the issue.

@joshtriplett
Copy link
Member

Just to confirm something, passing and returning a struct containing a u8 and u64 works, and it's just the generated struct of a repr(C,u8) enum that we don't handle correctly?

@sw17ch
Copy link
Contributor Author

sw17ch commented Jan 23, 2020

Correct. Structs work as expected. An enumeration with the same layout fails to conform to the SysV ABI.

However, the primitive type used in #[repr(C,prim)] does not appear to affect the outcome. My examples used u8, but the same symptom is present with u64.

@eddyb
Copy link
Member

eddyb commented Jan 23, 2020

@joshtriplett The way we handle layout today makes enum variants special, so the union of variants that you have on the C side is implicit (and it's unclear if we can do much there, but maybe we could have an extra abstraction layer specifically for FFI with ABIs that care about recursing through structs and unions).

eddyb pushed a commit to eddyb/rust that referenced this issue Feb 8, 2020
One calls into C functions passing non-c-like enumerations by
value. The other calls into C expecting non-C-like enumerations as
returns.

These test cases are based on the tests provided by @bitwalker on
issue rust-lang#68190. The original tests were provided at:
https://github.com/bitwalker/rust_non_c_like_enums_issue/tree/2688d5c672bd4e289085fcdf1c6110e99e7e8ab1
@bors bors closed this as completed in 85ffd44 Feb 8, 2020
@Noratrieb Noratrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Area: Foreign Function Interface (FFI) C-bug Category: This is a bug. O-linux Operating system: Linux O-macos Operating system: macOS O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants