
Drop some unsafes - the compiler now optimizes equivalent safe code #43

Merged
merged 18 commits on Jun 26, 2018

Conversation

Shnatsel
Contributor

This PR drops some unsafe code that was introduced in place of safe code as an optimization. It is no longer needed: on Rust 1.27 (current stable) performance degradation from this change is within measurement noise even when taking 10x the normal number of samples using Criterion.

cargo bench has about 2% variance by itself, so it can't measure anything below that. I switched to Criterion and then jacked up the number of samples to make sure there is no degradation at all. You can find the benchmarking setup here. It's a bit messy, but I can clean it up and open a PR if you're interested.

@Shnatsel
Contributor Author

The Travis check failed because rustup is currently broken for nightly: it fails to download the compiler. I see this on my local machine too.

…ith potentially uninitialized contents. I have tried and failed to actually exploit this to get an information leak.
src/lib.rs Outdated
}

for i in self.pos as usize..pos_end as usize {
self.buffer[i] = self.buffer[i + forward as usize]
Member

Indentation here is still too far to the left. Also, the semicolon is missing.

Contributor Author

Shnatsel commented Jun 24, 2018

Yup, fixed in the very next commit. Didn't want to put unrelated things in the same commit. Never mind, it's a different line that looked similar. Thanks for the catch, and thanks for the review!

src/lib.rs Outdated
        if self.pos < dist && pos_end > self.pos {
            return Err("invalid run length in stream".to_owned());
        }

        if self.buffer.len() < pos_end as usize {
            unsafe {
                self.buffer.set_len(pos_end as usize);
Member

I wonder if push is efficient enough, compared to assigning.
The capacity should be fixed, maybe just needs an assert!(pos_end as usize <= self.buffer.capacity()) to hint to LLVM.

Member

That is, replacing if self.buffer.len() < pos_end as usize {...} (and the original loop) with:

        for i in self.pos as usize..self.buffer.len().min(pos_end as usize) {
            self.buffer[i] = self.buffer[i - dist as usize];
        }
        assert!(pos_end as usize <= self.buffer.capacity());
        while self.buffer.len() < pos_end as usize {
            let x = self.buffer[self.buffer.len() - dist as usize];
            self.buffer.push(x);
        }

One interesting question would be whether we are even validating that dist != 0.
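To make the dist != 0 concern concrete, here is a standalone sketch of the push loop above (`push_run` is a hypothetical helper, not the crate's actual `run_len_dist`): with a zero distance, the safe loop indexes one past the end of the vector and panics, while the original `set_len` version would instead have read uninitialized memory.

```rust
// Standalone sketch; `push_run` is a made-up name for illustration.
fn push_run(buffer: &mut Vec<u8>, pos_end: usize, dist: usize) {
    while buffer.len() < pos_end {
        // With dist == 0 this indexes buffer[buffer.len()]: out of bounds,
        // so safe code panics. The unsafe set_len variant would instead
        // read whatever bytes happened to sit in the uninitialized tail.
        let x = buffer[buffer.len() - dist];
        buffer.push(x);
    }
}

fn main() {
    let mut buf = vec![1u8, 2, 3];
    push_run(&mut buf, 6, 2); // repeat the last two bytes
    assert_eq!(buf, [1, 2, 3, 2, 3, 2]);

    // dist == 0 panics instead of leaking memory contents.
    assert!(std::panic::catch_unwind(|| {
        let mut b = vec![1u8];
        push_run(&mut b, 2, 0);
    })
    .is_err());
}
```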

Contributor Author

I am still iterating on this part, so I didn't include it in this PR. I will try this and report the performance impact. Thanks!

I'm also going to try extend_from_slice just to see how that works. Probably poorly, because the slice belongs to the same buffer, and since the vector might be reallocated the slice could be invalidated, so I'd have to clone it. If I had a fixed-size vector that was guaranteed not to reallocate, that might have been faster than push().

Member

extend_from_slice is not really relevant, because this is RLE, not just "copy from the past", so if len > dist, you start repeating by reading bytes you've just written.
Also note that length of the vector is only increased once, then the sliding window stays constantly sized, so the push loop is less performance-sensitive.

You'd need a different loop structure with something like memcpy inside, and skipping dist-long runs, if you wanted to actually take advantage of distances larger than a few bytes, but I have my suspicions that it will be significantly faster.

Contributor Author

Sadly, the code you've suggested incurs a 9-11% performance penalty on decompressing entire files, depending on the file. I have tweaked it a bit and got it down to an 8-10% penalty; here's the code:

        let upper_bound = self.buffer.len().min(pos_end as usize);
        for i in self.pos as usize..upper_bound {
            self.buffer[i] = self.buffer[i - dist as usize];
        }
        assert!(pos_end as usize <= self.buffer.capacity());
        let initial_buffer_len = self.buffer.len();
        for i in initial_buffer_len..pos_end as usize {
            let x = self.buffer[i - dist as usize];
            self.buffer.push(x);
        }

Presence or absence of assert() has no effect (in this code, I haven't tested the variant with while without assert).

I also got 10% performance overhead simply by replacing unsafe {self.buffer.set_len(pos_end as usize);} with self.buffer.resize(pos_end as usize, 0u8); in the original code, which results in slightly more concise code than your solution.
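The resize-based replacement described above is a one-line change; a minimal illustration of the difference (the buffer contents here are made up for the example):

```rust
fn main() {
    // Simulating the window buffer with some decoded bytes already in it.
    let mut buffer: Vec<u8> = Vec::with_capacity(8);
    buffer.extend_from_slice(&[1, 2, 3]);
    let pos_end = 6usize;

    // Safe replacement for `unsafe { buffer.set_len(pos_end) }`:
    // resize() zero-fills the new tail instead of exposing whatever
    // bytes happen to sit in the unused capacity. The zero-fill is
    // where the measured overhead comes from.
    if buffer.len() < pos_end {
        buffer.resize(pos_end, 0u8);
    }
    assert_eq!(buffer, [1, 2, 3, 0, 0, 0]);
}
```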

Contributor Author

Thank you for investigating! This is probably the single best response to me filing a security issue I've ever seen, and I didn't even have a proof of concept this time.

As for the commit message - function run_len_dist() when taken in isolation is vulnerable; the rest of the code just so happens to never call it with dist set to 0, so the crate as a whole is not vulnerable. I would prefer to note that in the commit history, and update only the "It is unclear..." part to reflect that the crate as a whole is not vulnerable. But I will drop the mention of the vulnerability if you insist.

Also, here's my memcpy-based prototype.

        if self.buffer.len() < pos_end as usize {
            self.buffer.resize(pos_end as usize, 0u8);
        }

        fill_slice_with_subslice(&mut self.buffer, (self.pos as usize - dist as usize, self.pos as usize), (self.pos as usize, pos_end as usize));
        fn fill_slice_with_subslice(slice: &mut[u8], (source_from, source_to): (usize, usize), (dest_from, dest_to): (usize, usize)) {
            let (source, destination) = if dest_from >= source_from {slice.split_at_mut(dest_from)} else {slice.split_at_mut(source_from)};
            let source = &mut source[source_from..source_to];
            let destination = &mut destination[..dest_to-dest_from];
            for i in (0..( (destination.len()) / source.len() )).map(|x| x * source.len()) {
                destination[i..source.len()+i].copy_from_slice(&source);
            }
        }

It fails some tests and I have been trying to understand why to no avail, so I'm afraid I won't be able to complete it.

I'm not familiar with assembler or LLVM IR, so I won't be able to inspect it in any meaningful way, sorry. I will benchmark your suggested changes with panic=abort on nightly and report the results.

Member

Note that I don't mean panic=abort; there's a "nightly" feature which changes what abort() does. Although panic=abort might itself cause further improvements, so there are probably a few combinations to test.

I wouldn't have done the copy_from_slice with a for loop, or at least not with division/multiplication - there's a way to step ranges, although I'd also try a while loop that updates state instead.

But anyway, len is not a multiple of dist; you're missing something to handle len % dist (again, it should ideally be done without using %, unless you can find a way to use division that is a performance boost, but I doubt that's possible here).

Contributor Author

I've conjured up a function based on copy_from_slice() that is more efficient than a per-element loop. Using while made it a lot more readable, so thanks for the tip!

        if self.buffer.len() < pos_end as usize {
            self.buffer.resize(pos_end as usize, 0u8);
        }
        fill_slice_with_subslice(&mut self.buffer, (self.pos as usize - dist as usize, self.pos as usize), (self.pos as usize, pos_end as usize));
        fn fill_slice_with_subslice(slice: &mut[u8], (source_from, source_to): (usize, usize), (dest_from, dest_to): (usize, usize)) {
            let (source, destination) = slice.split_at_mut(dest_from); //TODO: allow destination to be lower than source
            let source = &source[source_from..source_to];
            let destination = &mut destination[..(dest_to - dest_from)];
            let mut offset = 0;
            while offset + source.len() < destination.len() {
                destination[offset..source.len()+offset].copy_from_slice(&source);
                offset += source.len();
            }
            let remaining_chunk = destination.len()-offset;
            destination[offset..].copy_from_slice(&source[..remaining_chunk]);
        }

It offsets some of the costs of safe memory initialization so that switching to safe initialization would only create 5% overhead instead of 10%. If we switch the loop above to something like this too, then we'd have an entirely safe crate with the same performance as before.

I have a branch with all changes from this PR plus the optimized loop: https://github.com/Shnatsel/inflate/tree/safe-with-optimized-loop

Sadly, this function still fails one test: the line with its invocation causes an integer overflow on the test "issue_30_realworld", i.e. self.pos is actually less than dist. How it worked in a simple loop that did self.buffer[i - dist as usize] with i initialized to self.pos, but doesn't work here, is beyond me.
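For what it's worth, one plausible explanation, given the pos < dist check earlier in the function: inputs with self.pos < dist are only rejected when pos_end > self.pos, so the byte-by-byte loop can legitimately receive such inputs with an empty range, where i - dist is never evaluated, while the slice-splitting version computes self.pos - dist unconditionally and overflows. A sketch with hypothetical numbers:

```rust
fn main() {
    // Values that pass the crate's `pos < dist && pos_end > pos` check
    // only because the copy range is empty (numbers are made up).
    let (pos, pos_end, dist) = (2usize, 2usize, 5usize);
    let mut buffer = vec![0u8; 8];

    // The old byte-by-byte loop: 2..2 is empty, so `i - dist` never runs.
    for i in pos..pos_end {
        buffer[i] = buffer[i - dist];
    }

    // The slice-based version computes `pos - dist` up front, which
    // underflows for these inputs; checked_sub makes that visible.
    assert!(pos.checked_sub(dist).is_none());
}
```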

Member

I still think this code could be much simpler if it was duplicated between forward and backward, just like the old code - in fact, it would be very similar to the old code, just doing more than one element at a time.

Also, is there a benefit to always using the same source subslice, or is it just as efficient / more efficient to always copy the last dist bytes? There may be some cache effects here.

Contributor Author

Always copying the last dist bytes seems to be exactly as efficient as always copying the same slice.

The code I ended up with looks like this:

        let (source, destination) = (&mut self.buffer).split_at_mut(self.pos as usize);
        let source = &source[source.len() - dist as usize..];
        let mut offset = 0;
        while offset + source.len() < destination.len() {
            destination[offset..source.len()+offset].copy_from_slice(&source);
            offset += source.len();
        }
        let remaining_chunk = destination.len()-offset;
        destination[offset..].copy_from_slice(&source[..remaining_chunk]);

Which is a bit more readable. This nets 3% to 7% performance improvement.

Surprisingly, putting the same code in the other copying loop in this function actually hurts performance by 1% on my samples.

I've also tried an iterator-based version, which is concise but as slow as copying byte-by-byte:

        let (source, destination) = (&mut self.buffer).split_at_mut(self.pos as usize);
        let source = &source[source.len() - dist as usize..];
        for (d,s) in destination.chunks_mut(dist as usize).zip(source.chunks(dist as usize).cycle()) {
            let d_len = d.len(); // last chunk has a size lower than we've specified
            d.copy_from_slice(&s[..d_len]);
        }

However, I've realized that this function can return pretty much any garbage and tests won't fail. Since this creates regression potential, optimizing copying falls out of scope of this PR.
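Given the note that the existing tests would accept garbage output here, one way to guard a future optimization is a differential check of the chunked copy against a naive byte-by-byte reference (a sketch; both helper names are mine, not the crate's):

```rust
// Hypothetical helpers for illustration: the chunked copy from the
// comment above, and a naive byte-by-byte reference implementation.
fn copy_chunked(buffer: &mut [u8], pos: usize, dist: usize) {
    let (source, destination) = buffer.split_at_mut(pos);
    let source = &source[source.len() - dist..];
    let mut offset = 0;
    while offset + source.len() < destination.len() {
        destination[offset..offset + source.len()].copy_from_slice(source);
        offset += source.len();
    }
    let remaining = destination.len() - offset;
    destination[offset..].copy_from_slice(&source[..remaining]);
}

fn copy_naive(buffer: &mut [u8], pos: usize, dist: usize) {
    // RLE semantics: reads bytes it has just written when len > dist.
    for i in pos..buffer.len() {
        buffer[i] = buffer[i - dist];
    }
}

fn main() {
    // Compare both implementations on a few (pos, dist) combinations.
    for &(pos, dist) in &[(4usize, 1usize), (4, 3), (5, 4), (3, 2)] {
        let seed: Vec<u8> = (1..=pos as u8).collect();
        let mut a = seed.clone();
        a.resize(16, 0);
        let mut b = a.clone();
        copy_chunked(&mut a, pos, dist);
        copy_naive(&mut b, pos, dist);
        assert_eq!(a, b, "mismatch at pos={} dist={}", pos, dist);
    }
}
```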

src/lib.rs Outdated
}

for i in self.pos as usize..pos_end as usize {
self.buffer[i] = self.buffer[i + forward as usize]
Member

Semicolon still missing.

@eddyb
Member

eddyb commented Jun 25, 2018

Btw, there is still the use of the abort intrinsic, could that be removed perhaps?
Or does unwinding still affect benchmarks?

Also, I hope both #35 and this were tested with both the "nightly" feature and without. It's possible that if unwinding prevents optimizations, changes that don't affect benchmarks with unwinding actually do affect benchmarks without. cc @RazrFalcon

…re vulnerability in this function. This is not exploitable in practice, because run_len_dist() is not exposed in the public API and is never called with dist=0 from this crate, even given deliberately malformed inputs.
src/lib.rs Outdated
@@ -65,7 +65,7 @@
//! }
//! ```

#![cfg_attr(feature = "unstable", feature(core))]
#![cfg_attr(feature = "unstable", feature(core_intrinsics))]
Member

Wow, for how long has this been broken?

Contributor Author

TL;DR: dunno.
The only relevant changelog entry I could find was "Declaration of lang items and intrinsics are now feature-gated by default." in 0.11.0 (2014-07-02), but that's probably about feature(core) not feature(core_intrinsics). I could not find anything about core_intrinsics on Rust bug tracker.

@Shnatsel
Contributor Author

Current code with unsafe:

  • rustc 1.27 stable: baseline
  • rustc 1.28.0-nightly (60efbdead 2018-06-23): +3% regression
  • "unstable" feature, rustc 1.28.0-nightly (60efbdead 2018-06-23): +3% regression

Safe code with while suggested by @eddyb:

  • rustc 1.27 stable: +11% regression
  • #[inline(never)], rustc 1.27 stable: +12% regression
  • rustc 1.28.0-nightly (60efbdead 2018-06-23): +13% regression
  • #[inline(never)], rustc 1.28.0-nightly (60efbdead 2018-06-23): +13% regression
  • "unstable" feature, assert() rewritten as if .. {abort()}, rustc 1.28.0-nightly (60efbdead 2018-06-23): +13% regression

I could not benchmark panic=abort because Criterion dependency pest_derive fails to build with "error: the linked panic runtime panic_unwind is not compiled with this crate's panic strategy abort", and 3% differences are not observable with simpler tools.

Apparently the abort intrinsic is no longer needed. Shall I drop it in this PR?

@eddyb
Member

eddyb commented Jun 25, 2018

Apparently the abort intrinsic is no longer needed. Shall I drop it in this PR?

Sounds good!

@Shnatsel
Contributor Author

Done. I hope removing a feature from Cargo.toml is not a breaking change?

src/lib.rs Outdated
@@ -404,7 +386,7 @@ impl CodeLengthReader {
self.result.push(0);
}
}
_ => abort(),
_ => panic!(),

Wouldn't it be better to use the unreachable!() macro? It would provide a more meaningful name.
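For illustration, a standalone sketch of the suggested change (the match here is a hypothetical stand-in for the one in CodeLengthReader):

```rust
fn main() {
    // unreachable!() names the impossible branch in its panic message
    // ("internal error: entered unreachable code: ..."), which is more
    // informative than a bare panic!().
    let result = std::panic::catch_unwind(|| {
        let code: u8 = 19; // code length codes are 0..=18
        match code {
            0..=18 => (),
            _ => unreachable!("invalid code length code {}", code),
        }
    });
    assert!(result.is_err());
}
```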

Cargo.toml Outdated
@@ -9,7 +9,6 @@ keywords = ["deflate", "decompression", "compression", "piston"]

[features]
default = []
Member

Is this needed at all? I think all it did was disable unstable by default.

@eddyb
Member

eddyb commented Jun 25, 2018

@Shnatsel I don't know about breaking change. cc @bvssvni

@bvssvni
Contributor

bvssvni commented Jun 25, 2018

I think removing a feature is a breaking change. Bump it to 0.5.0.

@eddyb
Member

eddyb commented Jun 25, 2018

@bvssvni IMO we should avoid breaking changes, to ensure that any improvements are used in the ecosystem. But YMMV.

@bvssvni
Contributor

bvssvni commented Jun 25, 2018

@eddyb It's always better to release a new version than messing up existing projects.

@Shnatsel
Contributor Author

I would like the safety improvements done so far not to warrant a version bump so that they would naturally propagate through the ecosystem. I'm on board with reverting the switch from abort() to assertions if that's what's needed to make it happen.

On the other hand, the "unstable" feature was only available on nightly, where breakage is kind of expected, and it was already broken to boot, so all people would notice is that code with that feature enabled has actually started compiling.

IDK. Your call.

@bvssvni
Contributor

bvssvni commented Jun 26, 2018

OK, if it was broken then we can bump it to 0.4.3.

@Shnatsel
Contributor Author

I haven't managed to get rid of the unsafe block without losing performance, so instead I've covered it in asserts. Asserts actually improve performance by roughly 2%, depending on the workload.

I am actually in the market for an entirely safe inflate, perhaps even one with #![forbid(unsafe_code)], but coercing LLVM into optimizing safe code to the same extent has proven to be beyond my abilities.

This PR is now as good as I can get it. Please conduct a final review and merge it or request changes.

@bvssvni
Contributor

bvssvni commented Jun 26, 2018

Merging.

@bvssvni bvssvni merged commit 728bd49 into image-rs:master Jun 26, 2018
@bvssvni
Contributor

bvssvni commented Jun 26, 2018

@eddyb Do you want to test your access to publish? Or do you want me to do it?

@Shnatsel
Contributor Author

Shnatsel commented Jun 27, 2018

Thank you!

I'll probably work on the png crate next. My fuzzers have found a panic on malformed input in it; I'll fix that to unblock further fuzzing. Plus it's got unsafes, perhaps I can do something about those.
