Parallel decoding #161
Conversation
I'm not sure if the task strategy is correct yet. It doesn't seem very 'rayon idiomatic'.
I'd be more comfortable with the task strategy if we allocate a fixed number of parallel resources (that is, prepare buffers for some determined degree of parallelism that depends on the required buffer sizes vs. available allocation size vs. rayon's available core count), and have an iterator defining the chunks+target buffers that need processing. If those task-resources are in a `Vec<_>`, then we can use `flat_map_iter` to actually execute the task. Each task could then create an 'iterator' that polls from the shared chunk iterator (by some means of synchronization). That would guarantee a precise upper bound on the allocated buffer size while also preserving those allocations between chunks.
Can you imagine something like this working?
```rust
let condvar = std::sync::Condvar::new();
let bytes_allocated = std::sync::Mutex::new(Ok(0));
```
Somewhat questionable to me? Blocking a rayon worker seems very unintuitive, and after the jpeg-decoder experience there should be more comments on correctness/forward progress.
Added a bunch of comments. The general idea is that taking the lock should never deadlock because all threads are careful to never do any blocking operations while holding the lock. We do sleep on the condition variable from the calling thread, but never from one of rayon's worker threads. As long as the worker threads make forward progress the calling thread should eventually be woken up.
```rust
let condvar = std::sync::Condvar::new();
let bytes_allocated = std::sync::Mutex::new(Ok(0));
for x in 0..chunks_across {
    rayon::in_place_scope_fifo(|s| -> TiffResult<()> {
```
We're already starting a new scope for each x coordinate; can't we ensure the allocation requirement by ensuring that at most `n` tasks run in parallel, where `n` is `intermediate_buffer_size / chunk_bytes.iter().max()`? Or an adaptive version instead.
I'd also be more comfortable if tasks were generated as an iterator instead of this somewhat complex case distinction of different loops, if that makes sense. In particular this makes it easier to transition to using some of the rayon parallel iterators in the future?
The two loops are needed to work around a borrow checker issue for tiled images (for stripped images the outer loop executes exactly one iteration).
It may be possible to replace the inner loop with an iterator that generates the tasks. However, I think that would be tricky to implement while still doing file reads and chunk expands in parallel
@HeroicKatora how do you feel about the current state? If you'd prefer we iterate more, I can split out the rayon-parallelized expand_chunk function into a separate PR to let us merge the other code movement/refactoring parts. That would reduce the potential for merge conflicts with other concurrent PRs.
Sounds good to split it up. Consolidating chunk is a very good and orthogonal change compared to the prior explicit line/tile differentiation.
Ok, I've rebased this on the main branch, so now this just includes the parallel decoding part. I also looked into other ways of using rayon for decoding. Unfortunately, we're rather constrained by the lack of a …
This PR adds parallel decoding via an optional (but enabled by default) dependency on rayon.