Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve libcall relocations for older CPUs #5567

Merged
merged 4 commits into from
Jan 18, 2023

Conversation

alexcrichton
Copy link
Member

Long ago Wasmtime used to have logic for resolving relocations post-compilation for libcalls which I ended up removing during refactorings last year. As #5563 points out, however, it's possible to get Wasmtime to panic by disabling SSE features which forces Cranelift to use libcalls for some floating-point operations instead. Note that this also requires disabling SIMD because SIMD support has a baseline of SSE 4.2.

This commit pulls back the old implementations of various libcalls and reimplements logic necessary to have them work on CPUs without SSE 4.2

Closes #5563

Long ago Wasmtime used to have logic for resolving relocations
post-compilation for libcalls which I ended up removing during
refactorings last year. As bytecodealliance#5563 points out, however, it's possible to
get Wasmtime to panic by disabling SSE features which forces Cranelift
to use libcalls for some floating-point operations instead. Note that
this also requires disabling SIMD because SIMD support has a baseline of
SSE 4.2.

This commit pulls back the old implementations of various libcalls and
reimplements logic necessary to have them work on CPUs without SSE 4.2

Closes bytecodealliance#5563
@alexcrichton alexcrichton marked this pull request as ready for review January 12, 2023 23:20
@alexcrichton
Copy link
Member Author

I primarily tested this through:

ALLOWED_ENGINES=wasmi ALLOWED_MODULES=single-inst cargo +nightly fuzz run -s none differential --no-default-features

to find all instructions that need libcall lowerings. Cranelift still has more libcalls but it seems that we don't end up relying on them, so I've left various panics in place to notify us if libcalls pop up and they're not handled. Otherwise the fuzzing has run for awhile now and hasn't found more instructions so this is likely at least a good set to start with.

@github-actions github-actions bot added the fuzzing Issues related to our fuzzing infrastructure label Jan 12, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this looks like it has no runtime cost except when it's actually needed. I have a couple questions but overall this looks like the right plan to me!

crates/environ/src/obj.rs Outdated Show resolved Hide resolved
Comment on lines 59 to 72
let sse3 = u.arbitrary::<bool>()?;
let ssse3 = u.arbitrary::<bool>()?;
let sse4_1 = u.arbitrary::<bool>()?;
let sse4_2 = u.arbitrary::<bool>()?;

for (name, val) in flags {
match name.as_str() {
"has_sse3" => *val = sse3.to_string(),
"has_ssse3" => *val = ssse3.to_string(),
"has_sse41" => *val = sse4_1.to_string(),
"has_sse42" => *val = sse4_2.to_string(),
_ => {}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after reading the above comment several times I still had a hard time figuring out that these calls to arbitrary() can't be moved into the loop.

How about something like this to avoid having to keep two lists of feature names in sync, and hopefully make it less tempting to inline the variables into the loop?

Suggested change
let sse3 = u.arbitrary::<bool>()?;
let ssse3 = u.arbitrary::<bool>()?;
let sse4_1 = u.arbitrary::<bool>()?;
let sse4_2 = u.arbitrary::<bool>()?;
for (name, val) in flags {
match name.as_str() {
"has_sse3" => *val = sse3.to_string(),
"has_ssse3" => *val = ssse3.to_string(),
"has_sse41" => *val = sse4_1.to_string(),
"has_sse42" => *val = sse4_2.to_string(),
_ => {}
}
}
let new_flags = ["has_sse3", "has_ssse3", "has_sse41", "has_sse42"]
.into_iter()
.map(|name| (name, u.arbitrary()?))
.collect::<arbitrary::Result<HashMap<&'static str, bool>>()?;
for (name, val) in flags {
if let Some(new_value) = new_flags.get(name) {
*val = new_value.to_string();
}
}

Also, I think this is okay, but just to check: Each of these features implies the previous one (e.g. if you have SSE4.2 you also have SSE4.1). Is it okay to turn some off without turning off later ones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to turn some off without turning off later ones?

While I'm not 100% certain myself I believe that this question is up to Cranelift mostly. I believe Cranelift is mostly structured around "if this feature is active emit this instr" while it doesn't make sense to disable sse3 but leave sse4.1 enabled I don't think it will break anything since it would be a sort of "chaos mode" for cranelift stressing it.

If this becomes a problem for Cranelift, however, we can tweak the fuzz input to respect what the actual CPU feature hierarchy is.

const TOINT_32: f32 = 1.0 / f32::EPSILON;
const TOINT_64: f64 = 1.0 / f64::EPSILON;

pub extern "C" fn nearestf32(x: f32) -> f32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if there was a good way to share these implementations of round_ties_even with the ones in cranelift/codegen/src/ir/immediates.rs. None come to my mind off-hand since the runtime can be built in a configuration without the compiler, but it'd be nice.

The existing version there doesn't have the "Check for NaNs" section that this version does. I haven't thought hard enough about this to understand what that's for, but maybe you could add a comment linking to some reference for why this is the right implementation?

The existing version does have a link to rust-lang/rust#96710 which I think is worth copying here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These libcalls were removed in #4470 and I copied their source from just before that commit.

Apparently nearest has a slightly storied history:

I ran cargo run wast ./tests/spec_testsuite/f32.wast --cranelift-set has_sse41=false --wasm-features=-simd and commented out the "Check for NaNs" block and the tests failed, so the block is definitely load-bearing. That may mean that Cranelift's implementation needs to be updated to account for NaN's perhaps? (I don't really understand the actual underlying code here I'm just copying bytes)

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my questions! Still looks good to me. 👍

@alexcrichton alexcrichton merged commit 9b896d2 into bytecodealliance:main Jan 18, 2023
@alexcrichton alexcrichton deleted the libcalls branch January 18, 2023 15:04
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jan 19, 2023
A fuzz bug came in last night from bytecodealliance#5567 where spectest fuzzing will
first generate a config, possibly with SSE features for SIMD disabled,
only to have SIMD later enabled by `set_spectest_compliant`. This commit
fixes the issue by changing to `is_spectest_compliant` as a query and
throwing out the fuzz case if it isn't. This means that the spectest
fuzzer will throw out more inputs but means we can continue to generate
interesting configs and such for other inputs.
alexcrichton added a commit that referenced this pull request Jan 19, 2023
A fuzz bug came in last night from #5567 where spectest fuzzing will
first generate a config, possibly with SSE features for SIMD disabled,
only to have SIMD later enabled by `set_spectest_compliant`. This commit
fixes the issue by changing to `is_spectest_compliant` as a query and
throwing out the fuzz case if it isn't. This means that the spectest
fuzzer will throw out more inputs but means we can continue to generate
interesting configs and such for other inputs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuzzing Issues related to our fuzzing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cranelift: cannot generate relocation against libcall FloorF64, when has_sse41=false
2 participants