Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in File::read_to_end on 1.57.0 Nightly #90263

Closed
noproto opened this issue Oct 25, 2021 · 15 comments
Closed

Regression in File::read_to_end on 1.57.0 Nightly #90263

noproto opened this issue Oct 25, 2021 · 15 comments
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Milestone

Comments

@noproto
Copy link
Contributor

noproto commented Oct 25, 2021

PR #89582 (commit 910692d) created a regression in File::read_to_end on the Rust Nightly channel (1.57.0 Nightly). This is before main(), it impacts cdylib projects.

Code

lib.rs

use std::io::prelude::*;

// init_rtld_audit_interface
#[used]
#[allow(non_upper_case_globals)]
#[link_section = ".init_array"]
static init_rtld_audit_interface: unsafe extern "C" fn(i32, *const *const i8, *const *const i8) = {
    #[link_section = ".text.startup"]
    unsafe extern "C" fn init_rtld_audit_interface(_argc: i32, _argv: *const *const i8, _envp: *const *const i8) { read_proc_environ(); }
    init_rtld_audit_interface
};

// la_version
#[no_mangle]
unsafe extern "C" fn la_version(version: u32) -> u32 { version }

fn read_proc_environ() {
    let mut environ = std::fs::File::open("/proc/self/environ").expect("Lost track of environment");
    let mut environ_contents: Vec<u8> = vec![];
    environ.read_to_end(&mut environ_contents).expect("Unexpected null reference");
}

Cargo.toml

# General info
[package]
name = "readenv"
version = "0.1.0"
edition = "2021"

# Build targets
[lib]
name = "readenv"
path = "lib.rs"
crate-type = ["cdylib"]

Makefile

test:
	@echo "Rust nightly-10-09"
	cargo +nightly-2021-10-09 build --lib --release;LD_AUDIT=${PWD}/target/release/libreadenv.so /usr/bin/test || true
	@echo "Rust nightly-10-10"
	cargo +nightly-2021-10-10 build --lib --release;LD_AUDIT=${PWD}/target/release/libreadenv.so /usr/bin/test || true

I expected to see this happen: (no output)

Rust nightly-10-09
cargo +nightly-2021-10-09 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 0.55s
Rust nightly-10-10
cargo +nightly-2021-10-10 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 0.44s

Instead, this happened: Segmentation fault

Rust nightly-10-09
cargo +nightly-2021-10-09 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 0.55s
Rust nightly-10-10
cargo +nightly-2021-10-10 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 0.44s
Segmentation fault

Version it worked on

It most recently worked on: Rust 1.57.0-nightly, nightly-2021-10-09-aarch64-unknown-linux-gnu

Version with regression

Regression occurred in: Rust 1.57.0-nightly, nightly-2021-10-10-aarch64-unknown-linux-gnu

rustc --version --verbose:

rustc 1.57.0-nightly (a8f2463c6 2021-10-09)
binary: rustc
commit-hash: a8f2463c68a6532d74a13ec402ec5b513e4e2726
commit-date: 2021-10-09
host: aarch64-unknown-linux-gnu
release: 1.57.0-nightly
LLVM version: 13.0.0

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged

@noproto noproto added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Oct 25, 2021
@rustbot rustbot added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-untriaged Untriaged performance or correctness regression. and removed regression-untriaged Untriaged performance or correctness regression. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Oct 25, 2021
@jkugelman
Copy link
Contributor

It does not crash on my PC, which is host: x86_64-unknown-linux-gnu.

What kind of ARM device are you using? I will try it on my Raspberry Pi 4 tonight.

@Mark-Simulacrum Mark-Simulacrum added this to the 1.57.0 milestone Oct 25, 2021
@noproto
Copy link
Contributor Author

noproto commented Oct 25, 2021

It does not crash on my PC, which is host: x86_64-unknown-linux-gnu.

What kind of ARM device are you using? I will try it on my Raspberry Pi 4 tonight.

The crashes are occuring on two aarch64 virtual machines of Ubuntu 20.04 ARM LTS (Parallels running on Apple M1). It's interesting that the behavior isn't on x86_64, I'll check on my x86_64 device now.

@noproto
Copy link
Contributor Author

noproto commented Oct 25, 2021

Reproduced on Ubuntu 20.04 x86_64 LTS.

Rust nightly-10-09
cargo +nightly-2021-10-09 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 1.09s
Rust nightly-10-10
cargo +nightly-2021-10-10 build --lib --release;LD_AUDIT=/home/user/readenv/target/release/libreadenv.so /usr/bin/test || true
   Compiling readenv v0.1.0 (/home/user/readenv)
    Finished release [optimized] target(s) in 0.88s
Segmentation fault

@jkugelman
Copy link
Contributor

I can reproduce it on my WSL2 machine running Linux 5.10.16 x86_64.

@jkugelman
Copy link
Contributor

jkugelman commented Oct 26, 2021

The segfault is coming from the code I added to File::read_to_end. It queries the file size so it can reserve that much space in the output Vec<u8>.

Tracing through to the crash location, it is segfaulting in a function called try_statx on this statement:

// It is a trick to call `statx` with null pointers to check if the syscall
// is available. According to the manual, it is expected to fail with EFAULT.
// We do this mainly for performance, since it is nearly hundreds times
// faster than a normal successful call.
let err = cvt(statx(0, ptr::null(), 0, libc::STATX_ALL, ptr::null_mut()))
.err()
.and_then(|e| e.raw_os_error());

For context, this code tries to get file metadata on Linux by calling the statx() syscall. This system call was added in Linux 4.11 and isn't available in earlier kernel versions, so this line tries to detect whether the syscall is available by calling it with dummy null pointers and seeing if it returns EFAULT.

This code works perfectly fine in normal circumstances, however in the LD_AUDIT preload context it is crashing for some reason. I don't yet know why, though I think this is as deep as I can trace the Rust std code.

Other notes:

  • I checked with strace and the statx syscall does not show up in the output. I take it to mean that the process is crashing before the syscall is able to be invoked.

  • I was unable to reproduce the crash earlier on a CentOS 7 machine running Linux 3.10, where the statx syscall is not supported. I haven't tested it to confirm, but I expect that on that machine Rust will still end up at this line of code. I wonder why it doesn't segfault there when statx is missing?

@jkugelman
Copy link
Contributor

jkugelman commented Oct 26, 2021

This can be reproduced by calling std::fs::metadata from init_rtld_audit_interface. The file name doesn't matter; it can be anything at all, even a nonexistent file.

// init_rtld_audit_interface
#[used]
#[allow(non_upper_case_globals)]
#[link_section = ".init_array"]
static init_rtld_audit_interface: unsafe extern "C" fn(i32, *const *const i8, *const *const i8) = {
    #[link_section = ".text.startup"]
    unsafe extern "C" fn init_rtld_audit_interface(
        _argc: i32,
        _argv: *const *const i8,
        _envp: *const *const i8,
    ) {
        let _ = std::fs::metadata("");
    }
    init_rtld_audit_interface
};

// la_version
#[no_mangle]
unsafe extern "C" fn la_version(version: u32) -> u32 {
    version
}

@JohnTitor JohnTitor added regression-from-stable-to-beta Performance or correctness regression from stable to beta. A-io Area: std::io, std::fs, std::net and std::path T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed regression-untriaged Untriaged performance or correctness regression. labels Oct 26, 2021
@jkugelman
Copy link
Contributor

Source of the statx null trick:

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. P-medium Medium priority and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 27, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Oct 27, 2021

We discussed this in the @rust-lang/libs meeting just now. We decided to prioritize this as P-medium, since it's not clear if this is triggerable in any supported environment. We didn't have the chance to investigate in detail, but one guess is that using std in LD_AUDIT might mean that weak symbols (e.g. looking up statx before faling back to syscall(SYS_statx)) don't behave as expected, since the dynamic linker is not ready yet.

Depending on any new discoveries on when this issue appears, we can raise or lower the priority.

@jkugelman
Copy link
Contributor

Indeed, the call to dlsym is segfaulting.

unsafe fn fetch(name: &str) -> usize {
let name = match CStr::from_bytes_with_nul(name.as_bytes()) {
Ok(cstr) => cstr,
Err(..) => return 0,
};
libc::dlsym(libc::RTLD_DEFAULT, name.as_ptr()) as usize
}

@noproto
Copy link
Contributor Author

noproto commented Oct 27, 2021

@jkugelman That was some truly impressive tracing through the source. If the dlsym call is segfaulting, I believe the problem will lie in ld (in glibc). To confirm, I'll port the initial test case to C and see if it segfaults in the same location by doing a dlsym of statx. If so, I'd like to close this issue. I'll keep you all posted.

@jkugelman
Copy link
Contributor

Thanks. I spent hours digging into the guts of ld-linux-x86_64.so.2, glibc, and the kernel before I figured out that the weak symbol process Mara mentioned isn't system code, it's Rust code. The answer was within arm's reach the whole time, I just had to look a little deeper into the syscall! macro. 🤦

@joshtriplett
Copy link
Member

I can fairly easily imagine ways we could fix this specific incarnation of this issue. For instance, this optimization doesn't really need statx.

However, we don't in general make guarantees about the exact syscalls or library functions we call. I personally think the LD_AUDIT interface may be sufficiently specialized that we can't necessarily guarantee Rust functions will remain within the constraints of what's necessary to be called in that context.

@joshtriplett
Copy link
Member

We discussed this again in today's @rust-lang/libs meeting.

We agreed that, in general, we're not prepared to keep all of the standard library within the limitations required to implement an LD_AUDIT library.

However, we'd absolutely be amenable in general to alternatives to dlsym. We used to use weak symbols, but that caused problems with tools that use symbol references to set library package dependencies (e.g. Debian's dh_shlibdeps tool); if those tools could learn to ignore weak symbols, or there was some way we could declare them such that those tools ignore them, we could go back to using them. Or, alternatively, we could make syscalls directly, in cases where the C library's wrapper is not essential. (There are syscalls that are not necessarily safe to call directly, but for the rest we could.)

@cuviper
Copy link
Member

cuviper commented Nov 10, 2021

We used to use weak symbols, but that caused problems with tools that use symbol references to set library package dependencies (e.g. Debian's dh_shlibdeps tool);

The specific problem there was #23628 because we use __pthread_get_minstack@GLIBC_PRIVATE, which does not have a normal sortable version. That symbol could also go away at any time, being private and all, but glibc authors are aware that we're using it and have said that they will work with us before changing it. With a dlsym-based weak symbol, we would actually be just fine if they remove it altogether.

For everything else we use as a weak symbol, I think it would be fine for those to resolve to a normal name@GLIBC_X.Y.Z symbol, just like any non-weak symbol we use. If that happens on our maximally-compatible dist builds with old glibc, this is a sign that we don't really need to be weak on that one anymore. This would not be the case for statx, but an ELF-weak symbol might still fare better in this LD_AUDIT environment.

I'm basically suggesting that we could keep the dlsym-weak for __pthread_get_minstack, but switch to ELF-weak symbols for stuff that we know is properly stable and versioned in newer glibc. And for stuff like clone3 that doesn't have any libc wrapper yet, we should probably always use syscall for now, without any weak symbols. I can play with this.

@cuviper
Copy link
Member

cuviper commented Nov 12, 2021

See #90846 for an attempt at weak linkage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants