Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: handle carry flags as bools in backends #2860

Closed
Hypersonic opened this issue Apr 27, 2021 · 13 comments
Closed

Cranelift: handle carry flags as bools in backends #2860

Hypersonic opened this issue Apr 27, 2021 · 13 comments
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Comments

@Hypersonic
Copy link

Hi! While using Cranelift as a backend for a project, I ran into a panic compiling iadd_carry instructions on x86_64. Please let me know if there is any other information I can provide that would be helpful.

.clif Test Case

function u0:0() fast {
block0:
    v1465 = iconst.i32 0
    v1467 = bconst.b1 false
    v1468 = iconst.i32 0
    v1469, v1470 = iadd_carry v1465, v1468, v1467
    v1476 = bint.i8 v1470
    v1356 = iconst.i64 0
    v1391 -> v1356
    v1423 -> v1356
    v1447 -> v1356
    v1464 -> v1356
    v1494 -> v1356
    v1516 -> v1356
    v1544 -> v1356
    store notrap aligned v1476, v1356+274
    trap user0
}

Steps to Reproduce

cargo run -p cranelift-tools -- compile test.clif --target x86_64

Expected Results

Code successfully generates without a panic.

Actual Results

Panic: thread 'main' panicked at 'ALU+imm and ALU+carry ops should not appear here!', cranelift/codegen/src/isa/x64/lower.rs:5740:13

Versions and Environment

Cranelift version or commit: main HEAD (4a830b at time of writing)

Operating system: Linux

Architecture: x86_64

@Hypersonic Hypersonic added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Apr 27, 2021
@iximeow
Copy link
Contributor

iximeow commented Apr 27, 2021

using this as an excuse to (finally) familiarize myself with the MachInst backend, i'm not sure why ALU+imm and ALU+carry ops should not appear here should be the case - it looks like x64/lower.rs is the place for lowering to happen in x64, eschewing codegen/meta's lowering?

and if that's so, i'd expect rustc_cranelift_codegen to have hit this, but maybe not, since this is the only carry-shaped instruction i see over there.

@cfallin
Copy link
Member

cfallin commented Apr 27, 2021

Greetings @Hypersonic and thanks for the report!

The plan we made when starting to design the new backend was to phase out add+carry (and sub+borrow) opcodes in the CLIF, instead directly providing the wider adds/subs that one would normally lower into carry-using ops. In the legalization-based old backend, i128.add became an i64.add_ifcout and an i64.add_carry; now we just open-code the two-add sequence. (This was in discussions between myself, @julian-seward1 and @bnjbvr back in Jan 2020 or so.)

The reason for this design choice was that representing carry flags as normal values is fragile performance-wise: to get the correct (fast) machine code, which implicitly passes the bool in the carry flag, one has to pattern-match on the two instructions. The case where the b1 is separate -- as in your example, coming from a constant -- has to be lowered separately with pushf or setc to extract the flag and popf or a conditional diamond with clc/stc to set it. The materialized-carry-flag-value codepaths would almost never be used and so are a correctness risk as well; and these instructions are slow, to the point that there should (hopefully!) almost always be a better way to solve the problem.

Given that we just recently switched the default backend and haven't started the work of removing the old one, we haven't yet cleaned up the opcode space, but we intend to do so eventually.

That does leave the question: what if one actually wants to generate an add-with-carry? I'm definitely open to understanding use-cases better. Are you hoping to take an arbitrary b1 and feed it into the add? Or is this part of a lowering of some other wider arithmetic operation? If the latter, is there any other instruction that would help?

@bjorn3
Copy link
Contributor

bjorn3 commented Apr 27, 2021

One use it to check for overflow to crash. In cg_clif I currently have to manually check if an overflow has happened even though at least on x86 the processor already provides this information in the form of a flag.

@cfallin
Copy link
Member

cfallin commented Apr 27, 2021

@bjorn3 there is the iadd_ifcout instruction which is supported on the new backends (because stack-limit checks use it too IIRC) -- that should be usable to detect overflows? If not, we can add a variant that works for your case. Getting the flag out is easier (SETcc); it's the reloading into carry and use by subsequent operation, or pattern-matching to avoid that, that's slow/fragile.

@Hypersonic
Copy link
Author

My main use case is detecting the carry-out -- I don't really have any particular need for the carry-in, but this was the instruction I found that would give me the output I needed. If there's a different instruction that can give me that, I'm more than happy to use it.

@cfallin
Copy link
Member

cfallin commented Apr 27, 2021

@Hypersonic the iadd_ifcout instruction should give you just that, then, but if your case needs anything other then a trap, we'd need to add pattern-matching for e.g. selectif on its iflags result (right now we just match a trapif on its flags). We can certainly do that! I'm swamped at the moment but can get to this in a while if no one else is able to sooner.

@Hypersonic
Copy link
Author

I think I'm covered by iadd_ifcout, thanks for the info! I'm going to close this issue since I think it's resolved for me.

@chc4
Copy link

chc4 commented Sep 20, 2021

I'm not sure if this issues should be re-opened or I should make a new one, but iadd_carry and iadd_ifcarry are still definitely broken on x86_64.

Example Cranelift IR that doesn't compile:

function u0:0(i64, i64, i64) -> i64 system_v {
block0(v0: i64, v1: i64, v2: i64):
    jump block1

block1:
    v3 = iconst.i64 -5
    v4, v5 = isub_ifbout.i64 v2, v3
    v6 = iconst.i64 2
    v7 = iconst.i64 -1
    v8, v9 = iadd_ifcarry v6, v7, v5
    return v8
}

(Ignore the weird block0)

Changing it to use isub_bout and iadd_carry doesn't work either.

There was a comment asking about what contexts would need this: I'm writing an x86_64 recompilation/partial application engine using Cranelift as a lifting target. That Cranelift snippet is what I am generating for

move |a: usize| {
    let val: usize;
    if Wrapping(a) + Wrapping(0x5) < Wrapping(a) {
        val = 1;
    } else {
       val = 2;
    }
    val
}

or the compiled assembly function of

---- cmp rsi, -0x5
---- mov eax, 0x2
---- adc rax, -0x1
---- ret

Notably, I don't have access to the un-optimized assembly code that is more "natural" and doesn't depend on the CF behavior. I've had several other issues with Cranelift IFLAGS (and the assumption that all ALU operations clobber all flags means this project is probably dead in the water), but this issue (or a similar one) should probably be re-opened to let future users know.

@cfallin
Copy link
Member

cfallin commented Sep 20, 2021

Hi @chc4 -- we can definitely talk about how to support this use-case; I'll re-open the issue. (As a general principle, we want Cranelift to be as generally applicable as is practical; so "here's my use-case and I can't figure out how to express it in CLIF" is always a valid issue!)

We've been discussing recently how to clean up the IR (see e.g. #3205 for discussion of booleans, which overlaps somewhat with the discussion of iflags values here, and bytecodealliance/rfcs#12 which is proposing to remove the old x86 backend that introduced a lot of these lower-level concepts) so these flags-related ops are sort of in a transition period. The application that most commonly used them before -- wide arithmetic with carries/borrows -- is now lowered directly, so we just have sort of skeletal support for any other use-cases that crop up, until we work out the best long-term design. The general approach that I think is best is to represent specific flags as bools and provide ops as needed, so e.g. iadd_carry is a completely reasonable operator to have available. If you or someone else has the bandwidth to add this lowering, it's probably ~10 lines of code; I can point out the right spot. Sorry for the incomplete/in-transition nature of things here!

@cfallin cfallin reopened this Sep 20, 2021
@chc4
Copy link

chc4 commented Sep 20, 2021

I'm still not sure it would fit my use-case, since I need correct implementation of all other processor flags, not just CF. Using a B1 and codegen'ing it back to iflags at instruction selection would work for CF, but I'd probably also need code motion and disjoint flag tracking on the selection backend so that e.g. add x, y; lea w, [x+1]; jz ... doesn't error from an invalid iflags live range. I can lift it as v0, b0 = iadd_cout.i64 x, y; v1 = iadd.i64 x, 1; zf = icmp_imm.i64 v0, 0; br eq ..., but I don't think Cranelift is currently smart enough to treat the icmp_imm as an optional use for iflage.ZF and select the non-flags-clobbering LEA for iadds with no carry-out if possible, or move the iadd_cout below it if the LEA isn't encodable. This may be fine, but I also need e.g. OF, which I'm not sure how I would even naively emit as a branch case without a direct iflags type.

If it is usable that would be great, but I assume I would either have to wait for the iflags design to be stabilized further or write optimization and selection passes myself, which might be a bit too much work for a side-project. My current thought is to switch to dynasm-rs so I can lift instructions to the same flags behavior (and re-implement my own IR and register allocation...)

We can continue this discussion on Zulip to avoid clogging this issue further.

@cfallin
Copy link
Member

cfallin commented Sep 20, 2021

We can continue this discussion on Zulip

For others who want to follow along: link

@cfallin cfallin changed the title Cranelift: Panic compiling iadd_carry on x86_64 Cranelift: handle carry flags as bools in backends May 4, 2022
@cfallin cfallin added the cranelift:area:x64 Issues related to x64 codegen label May 4, 2022
@scottmcm
Copy link
Contributor

Is part of this fixed with #5784 ?

@jameysharp
Copy link
Contributor

I'm not entirely sure what this issue is about at this point, so it's hard to say whether it's fixed. 😅

If it's about addition with carry not having a usable lowering on x86, then I think that was fixed either by #5784 or earlier work. Cranelift has changed a lot since this issue was opened so it's hard to even compare.

If I consider only the issue title ("handle carry flags as bools in backends") then I think this was fixed by #5406 and earlier work.

For chc4's use case of encoding the behavior of x86 flags into CLIF, I think we've gone the opposite direction since we've removed the notion of flags from the IR entirely. (Again, in #5406 and earlier.) It's always possible to encode values equivalent to x86's flags in integer values and carry those around, though we may not generate particularly efficient code from that encoding.

Regardless of which interpretation we consider, I think this issue is no longer serving a useful purpose. So I'm going to close it.

I'd suggest that anyone who encounters issues which sound like this should open a new issue, explaining exactly what you're seeing. Anything that comes up now is probably not related to the difficulties described in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Development

No branches or pull requests

7 participants