Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC changing the overflow behavior for usize in release builds to panic #2635

Closed
wants to merge 6 commits into from

Conversation

alex
Copy link
Member

@alex alex commented Feb 10, 2019

@jonas-schievink jonas-schievink added T-lang Relevant to the language team, which will review and decide on the RFC. A-arithmetic Arithmetic related proposals & ideas A-primitive Primitive types related proposals & ideas labels Feb 10, 2019
@Ixrec
Copy link
Contributor

Ixrec commented Feb 11, 2019

I suspect this change is not feasible without also providing some way to opt-out of these runtime checks, even in release mode. An obvious option is to add a set of unchecked_add/sub/mul/etc methods to usize, though it would be a little weird to have both a checked_add and an unchecked_add method neither of which match the behavior of +.

@alex
Copy link
Member Author

alex commented Feb 11, 2019

What makes you say it's not feasible, performance or some other concern? How would the behavior of unchecked_* differ from wrapped_*? (Is it the method I hypothesized that matches the current debug_assert! behavior?)

@Ixrec
Copy link
Contributor

Ixrec commented Feb 11, 2019

Yeah, performance. My understanding is that enforcing any specific overflow semantics has a measurable cost on some platforms, even if it's zero-cost on other platforms, because different hardware has different "default" overflow behavior. Rust is less weird than C because it requires two's complement, but as far as I know that doesn't guarantee a specific overflow behavior (at least, I can't imagine we'd have chosen the current behavior otherwise) (EDIT: huh, apparently we do already guarantee wrapping in release builds). Someone more familiar with these kinds of performance portability issues would have to comment on how big of a deal that is in practice.

@comex
Copy link

comex commented Feb 11, 2019

I think this RFC fundamentally needs actual benchmark results (ideally a large variety of them), to allow everyone to have an informed discussion about costs versus benefits.

@steveklabnik
Copy link
Member

I like this RFC, and I believe that is the direction we want to move in, ideally. A few things:

  • RFC 560 defined the rules for integer overflow, mentioning it would be good
  • Additionally, note that the RFC text is a bit inaccurate in a sense; it's not "in debug mode" but "when debug_assertions are enabled. Basing it off of the RFC 560/reference text would be better.

Furthermore, I agree with @comex; RFC 560 mentions performance a number of times as the reason to make the decisions we made for the current behavior; if the performance were acceptable, I think there'd be broad support for checking in every case, let alone for just these sizes.


@Ixrec

My understanding is that enforcing any specific overflow semantics has a measurable cost on some platforms, even if it's zero-cost on other platforms, because different hardware has different "default" overflow behavior.

Yes, this is true, but we decided that we don't care as much about supporting that hardware as we do having consistent semantics. Non-two's compliment hardware is mostly legacy that we don't support particularly well in the first place, see discussions like https://gankro.github.io/blah/rust-layouts-and-abis/

NOTE: this is not a normative document and the Rust devs haven't been very diligent in committing to these claims, so be a bit wary of relying on a property here that lacks a citation.

For Rust to support a platform at all, its standard C dialect must:

  • Have integers be two's complement

As the note says, it's not normative, but I think, in this moment, it's roughly true.

@alex
Copy link
Member Author

alex commented Feb 11, 2019

I agree that measurements will ultimately be a necessity to accepting/merging this idea. In writing the RFC first, I wanted to address the chicken/egg scenario of writing the code before it was clear the idea was acceptable -- truth be told I expected people to find the idea of special casing usize to be more unusual than the feedback has been thus far!

There probably is performance differences on different architectures beyond the concern for those which don't have 2s complement arithmetic. x86 and ARM both have dedicated instructions for jump-on-overflow, however PPC and MIPS do not, so there's a non-trivial difference on the amount of code generated on these platforms for overflow checks.

That's an excellent point about the current behavior being keyed on debug_assertion, not debug. I'll update the text accordingly.

@hdevalence
Copy link

I think this is a great proposal and I'd love to see it adopted by Rust. Having an additional security layer is a huge advantage, and as noted in the description, it would have prevented exploitation of a number of past security issues.

Currently it's possible to enable overflow checks for all integer types using something like

[profile.release]
overflow-checks = true

in the Cargo.toml. However, this can't be used to test the proposed default, because it enables checks for all integer types, not just for usize/isize.

Perhaps a good way to move forward on this would be to add a (possibly unstable) mechanism, like usize-overflow-checks, which would allow enabling overflow checks just for usize/isize, and then running ecosystem-wide benchmarks (like lolbench) to see what the actual impact on real code is.

This mechanism might be useful independently of whether the behaviour proposed in the RFC is adopted; for instance, I would choose to use it in my projects.

@leonardo-m
Copy link

If you want to introduce this I'd like it with an optimization pass as done by the Swift compiler.

@alex
Copy link
Member Author

alex commented Feb 12, 2019

@leonardo-m Can you explain what you mean by that? I don't see overflow-checking as being something that'd be implemented as a compiler optimization. Generally the compiler optimization piece of this is removing overflow checks where there are range bounds on the inputs.

@leonardo-m
Copy link

@alex
Copy link
Member Author

alex commented Feb 12, 2019

Yes, as I mentioned in the Future Plans section, I suspect this work will spur desires for new and improved optimizations, either in Rust or LLVM. It is my hope that such optimizations will be nice to haves and not hard requirements for this to be practical.

There seems to be a lot of interest (unsurprisingly) in quantifying this. I'll take on trying to put together a proof of concept of this for Rustc that's good enough to get some measurements.

on `usize` values: `usize` is consistently used in `core`/`std` APIs which deal
in lengths, buffer sizes, etc., the kind of values where overflow can be
dangerous, so it's no coincidence that historic integer-overflow vulnerabilities
occured with `usize` values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a small thing, I think this should also happen for isize -- code that's using offset might be calculating in that type instead, and just generally I'd like the two types to be as similar as possible.

@scottmcm
Copy link
Member

I really like the nuance in doing this just for the types that are most likely the problem. And the scariest perf problem would be something like a[i+1], but LLVM already needs to be proving that that addition doesn't wrap in order to remove the bounds check anyway, so I suspect this might not be quite as scary as it seems. (If it can't remove the wrapping check, it can't remove the bounds check either, so two highly-predictable branches might not be materially worse than the already-existing one.)

That said, LLVM still can't fold sequential checked_adds (https://rust.godbolt.org/z/z7LUMt -- but it can saturating ones, thank you rust-lang/rust#58003!), so there's definitely low-hanging work here that could improve perf. (Maybe I should re-open rust-lang/rust#52203.)

As an upper-bound perf experiment, consider just changing the default in release to be panic, and we can run a try+perf on the experimental PR to see how far off things might be.

@briansmith
Copy link

briansmith commented Feb 14, 2019

ring: Near-miss integer overflow, which would have led to heap buffer overflow.

There was never such a bug in ring and the necessary checks are done there, so I don't think it makes sense to mention it here. (in_prefix_len > in_out.len() is a condition that is complete nonsense in that codebase.) If anything ring's track record is evidence that this RFC is unnecessary.

@briansmith
Copy link

CrosVM: Integer overflow leading to heap buffer overflow in ChromeOS's hypervisor.

According to the bug report "This didn't end up being an actual issue as it was covered by an early bounds check."

I didn't check the other two examples cited as motivation. Also, I am mostly supportive of this proposal. However, I think the citations suggesting non-vulnerabilities are vulnerabilities should be removed.

@alex
Copy link
Member Author

alex commented Feb 14, 2019

In Ring's case, that's why I described it as a "near miss" -- since it was unreachable, but the check wasn't near the overflow, and there was no type or other structure for ensuring the invariant. It was included because it demonstrated the potential pattern of overflow on usize + unsafe code. If you still think that's misleading, I can remove it.

In CrosVM's case, the overflow was probably unexploitable (at least, none of us figured out a way to), because of the size of the input data was bounded, not because there was any checking on the actual result of the arithmetic. This one definitely makes sense to include, in my view.

@briansmith
Copy link

I think in both cases, the code that prevents the overflow is far away from the code that it is protecting, and minimizing that distance is generally good.

I don't know all the details regarding the CrosVM case. In the case of ring, at least, it's not accidental that overflow is avoided. Bounding the input size very early to prevent integer overflows later is a pretty common practice. For example, this is one reason why mozilla::pkix (C++) limits its input sizes to 0xffff (uint16_t) or less. The untrusted library does something similar, but not as drastic. These libraries do things this way partly to minimize any potential performance impact; i.e. as an alternative to what's being proposed here.

Anyway, again, I'm not trying to argue against this RFC; just I think the way ring is referenced is very misleading.

@scottmcm
Copy link
Member

I didn't check the other two examples cited as motivation.

As the one who found the first one, it's definitely exactly this, and was fixed by making the overflow panic: https://github.com/rust-lang/rust/pull/54399/files#diff-773807740b9d7f176c85b4e2e34b2607R434

@tomwhoiscontrary
Copy link

There probably is performance differences on different architectures beyond the concern for those which don't have 2s complement arithmetic.

There is a proposal floating around in C++ space to declare that Signed Integers are Two’s Complement. In that proposal, there is a survey of signed integer representations which concludes that there are no current non-two's-complement hardware architectures. The closest thing is some Unisys product line which has an ASIC to support code written for its '60s mainframes, which were ones' complement (and had 36 bit words!).

So, i would suggest that it's safe to not consider non-two's complement machines at all.

@scottmcm scottmcm self-assigned this Feb 14, 2019
@joshtriplett
Copy link
Member

We discussed this briefly in the lang team meeting, and generally agreed with the thread on the following:

  • We should add a way to enable overflow-checks just for isize/usize and not other types.
  • Once we have that, we should get some data on the performance and security impact of using that option: does enabling overflow-checks only for isize/usize provide a substantial security/correctness improvement with a disproportionately low performance cost?
  • Given that data, we could decide what the default should be, but in the meantime people could opt into this new mode.

@steveklabnik
Copy link
Member

steveklabnik commented Feb 18, 2019 via email

@mark-i-m
Copy link
Member

Perhaps an unresolved question could be added to request a crater run?

alex added a commit to alex/rust that referenced this pull request Feb 23, 2019
This is a prototype for rust-lang/rfcs#2635 to enable us to collect some measurements.
bors added a commit to rust-lang/rust that referenced this pull request Feb 23, 2019
[WIP] Make usize overflow always have debug-assertions semantics

This is a prototype for rust-lang/rfcs#2635 to enable us to collect some measurements.
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 24, 2019

Team member @scottmcm has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Feb 24, 2019
@scottmcm
Copy link
Member

While I like this conceptually, it seems like current measurements of experimenting with this are showing about a 2-6% impact to the speed of the compiler. As such, I propose that we postpone considering changing the release default for now, pending further work.

@rfcbot fcp postpone

Process notes: I'm only proposing postponing the T-lang change here. I (personally) would absolutely like to see work here continue, as better optimizations around checked math will help even if we never change a default here (for example, in Range::nth). And if T-compiler decides that this is a useful option to expose for general opt-in use, I would be happy for that too. But AFAIK those aren't lang decisions.

@alex
Copy link
Member Author

alex commented Feb 24, 2019

First missed optimization I've found: rust-lang/rust#58692

@alex
Copy link
Member Author

alex commented Apr 23, 2019

Ok, tracking down optimizations and getting fixes upstreamed into LLVM is a slower process than I'd hoped.

In order to keep some momentum here, I'd like to propose a flag for rustc that implements the behavior described here; of doing panic for usize arithmetic but leaving other types alone. Does this require a full of RFC, or can it simply be pull requests to rustc?

@scottmcm
Copy link
Member

@alex AFAIK the compiler team decides what flags the compiler has, and an unstable flag is plausibly something that would be accepted, assuming the code maintenance cost is low.

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 24, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Apr 24, 2019
@the8472
Copy link
Member

the8472 commented Apr 25, 2019

Would it make sense to only add checks to either functions or modules that contain unsafe code? The heuristic here would be that bounds checks either happen in the unsafe code itself, the surrounding function or its sibling functions that encapsulate it into a safe interface?

@alex
Copy link
Member Author

alex commented Apr 25, 2019 via email

@the8472
Copy link
Member

the8472 commented Apr 25, 2019

Well, if your module publishes a function, even if it is just pub(crate), and it can corrupt memory unless the caller gets some checks right then it itself should be unsafe. On the other hand a crate-level pessimization seems overly coarse, e.g. if you have one set of heavy operations on slices in one module and the unsafe stuff segregated into another one. So module-level granularity feels right.

@fowl2
Copy link

fowl2 commented May 2, 2019

Could some of the performance anxiety be mitigated by using unchecked_add where profiles demanded it? Ahh #2508 suggests this could be unsafe, I guess arbitrary_but_safe_add doesn't quite roll off the keyboard.

Co-Authored-By: alex <alex.gaynor@gmail.com>
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label May 4, 2019
@rfcbot
Copy link
Collaborator

rfcbot commented May 4, 2019

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.

@rfcbot rfcbot added postponed RFCs that have been postponed and may be revisited at a later time. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels May 4, 2019
@rfcbot rfcbot closed this May 4, 2019
@alex
Copy link
Member Author

alex commented Jul 29, 2020

I have not forgotten about this, and continue to (as time allows) work on the missed-optimizations problem: rust-lang/rust#72549 is the current known blocker.

rust-lang/rust#72237 was brought to my attention as an example of another bug that would have been caught by this RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-arithmetic Arithmetic related proposals & ideas A-primitive Primitive types related proposals & ideas finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.