
Parallel decoding #161

Closed · wants to merge 3 commits

Conversation

@fintelia (Contributor) commented Apr 16, 2022

This PR adds parallel decoding via an optional (but default) dependency on rayon.

@HeroicKatora (Member) left a comment

I'm not sure if the task strategy is correct yet. It doesn't seem very 'rayon idiomatic'.

I'd be more comfortable with the task strategy if we allocated a fixed number of parallel resources (that is, prepared buffers for some fixed degree of parallelism determined from the required buffer sizes, the available allocation budget, and rayon's available core count), and had an iterator defining the chunks plus target buffers that need processing.

If those task-resources are in a Vec<_> then we can use flat_map_iter to actually execute the tasks. Each task could then create an 'iterator' that polls from the shared chunk iterator (by some means of synchronization). That would guarantee a precise upper bound on the allocated buffer size while also preserving those allocations between chunks.

Can you imagine something like this working?
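
For concreteness, here is a minimal sketch of that shape, assuming hypothetical names (`Chunk`, `decode_all`) and rayon's `flat_map_iter`; it is not code from this PR:

```rust
// Sketch of the suggested pattern: a fixed set of reusable buffers, each
// driving a serial iterator that pulls work from a shared chunk queue.
use std::sync::Mutex;
use rayon::prelude::*;

struct Chunk {
    offset: u64,
    len: usize,
}

fn decode_all(chunks: Vec<Chunk>, parallelism: usize) {
    // Shared work queue: tasks pull chunks from it under a lock.
    let pending = Mutex::new(chunks.into_iter());
    let pending = &pending;

    // One reusable buffer per unit of parallelism bounds total allocation.
    let task_buffers: Vec<Vec<u8>> = (0..parallelism).map(|_| Vec::new()).collect();

    task_buffers
        .into_par_iter()
        .flat_map_iter(move |mut buffer| {
            // Each task is a serial iterator that polls the shared queue
            // until it is drained, reusing its buffer between chunks.
            std::iter::from_fn(move || {
                let chunk = pending.lock().unwrap().next()?;
                buffer.clear();
                buffer.resize(chunk.len, 0);
                // ... read and expand `chunk` into `buffer` here ...
                Some(chunk.offset)
            })
        })
        .for_each(|_offset| {
            // ... copy the expanded data into the target image buffer ...
        });
}
```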

Comment on lines +1277 to +1223
let condvar = std::sync::Condvar::new();
let bytes_allocated = std::sync::Mutex::new(Ok(0));
@HeroicKatora (Member)

Somewhat questionable to me? Blocking a rayon worker seems very unintuitive, and after the jpeg-decoder experience there should be more comments on correctness/forward progress.

@fintelia (Contributor, Author) Apr 23, 2022

Added a bunch of comments. The general idea is that taking the lock should never deadlock because all threads are careful to never do any blocking operations while holding the lock. We do sleep on the condition variable from the calling thread, but never from one of rayon's worker threads. As long as the worker threads make forward progress the calling thread should eventually be woken up.
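
For illustration, a stripped-down sketch of that accounting scheme might look like the following; the `LIMIT` constant and helper name are assumptions, and the PR's real code differs (it also propagates a `TiffResult`):

```rust
// Only the calling thread waits on the condvar; worker tasks only ever
// take the lock briefly to release their bytes and notify.
use std::sync::{Condvar, Mutex};

const LIMIT: usize = 64 << 20; // hypothetical cap on bytes held by in-flight tasks

fn spawn_chunk_tasks(chunk_sizes: &[usize]) {
    let condvar = Condvar::new();
    let bytes_allocated = Mutex::new(0usize);
    let (condvar, bytes_allocated) = (&condvar, &bytes_allocated);

    rayon::in_place_scope_fifo(|s| {
        for &size in chunk_sizes {
            // The calling thread sleeps until previously spawned tasks
            // have released enough bytes for the next chunk.
            let mut allocated = bytes_allocated.lock().unwrap();
            while *allocated + size > LIMIT {
                allocated = condvar.wait(allocated).unwrap();
            }
            *allocated += size;
            drop(allocated); // never hold the lock across a blocking operation

            s.spawn_fifo(move |_| {
                // ... expand one chunk here, without holding the lock ...
                *bytes_allocated.lock().unwrap() -= size;
                condvar.notify_one();
            });
        }
    });
}
```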

let condvar = std::sync::Condvar::new();
let bytes_allocated = std::sync::Mutex::new(Ok(0));
for x in 0..chunks_across {
rayon::in_place_scope_fifo(|s| -> TiffResult<()> {
@HeroicKatora (Member)

We're already starting a new scope for each x coordinate, so can't we ensure the allocation requirement by making sure that at most n tasks run in parallel, where n is intermediate_buffer_size / chunk_bytes.iter().max()? Or an adaptive version instead.

I'd also be more comfortable if the tasks were generated as an iterator instead of this somewhat complex case distinction of different loops, if that makes sense. In particular, that would make it easier to transition to some of rayon's parallel iterators in the future?
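
A sketch of the bound mentioned above, with an assumed helper name and an assumed cap at rayon's thread count (not code from this PR):

```rust
// How many tasks can run at once so that their buffers fit in the
// intermediate allocation budget.
fn max_parallel_tasks(intermediate_buffer_size: usize, chunk_bytes: &[usize]) -> usize {
    let largest_chunk = chunk_bytes.iter().copied().max().unwrap_or(1).max(1);
    (intermediate_buffer_size / largest_chunk).clamp(1, rayon::current_num_threads())
}
```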

@fintelia (Contributor, Author) Apr 23, 2022

The two loops are needed to work around a borrow checker issue for tiled images (for stripped images the outer loop executes exactly one iteration).

It may be possible to replace the inner loop with an iterator that generates the tasks. However, I think that would be tricky to implement while still doing file reads and chunk expands in parallel.

src/decoder/image.rs (outdated review comment, resolved)
@fintelia (Contributor, Author)

@HeroicKatora how do you feel about the current state? If you'd prefer we iterate more, I can split out the rayon-parallelized expand_chunk function into a separate PR so we can merge the other code movement/refactoring parts. That would reduce the potential for merge conflicts with other concurrent PRs.

@HeroicKatora (Member)

Sounds good to split it up. Consolidating the chunk handling is a very good and orthogonal change compared to the prior explicit line/tile differentiation.

@fintelia (Contributor, Author) commented May 8, 2022

Ok, I've rebased this on the main branch so now this just includes the parallel decoding part.

I also looked into other ways of using rayon for decoding. Unfortunately, we're rather constrained by the lack of a Send bound on the reader. The in_place_scope[_fifo] methods are pretty much the only option. Even ParallelBridge requires a Send-able iterator.
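
To illustrate the constraint: with `in_place_scope_fifo` all reads can stay on the calling thread, so the reader never needs to be `Send`; only owned buffers cross into the spawned tasks. A hypothetical sketch (not the PR's actual code):

```rust
// The reader is only touched on the calling thread inside the scope;
// the owned Vec<u8> buffers (which are Send) move into the tasks.
use std::io::{Read, Seek};

fn decode_chunks<R: Read + Seek>(reader: &mut R, chunk_lens: &[usize]) -> std::io::Result<()> {
    rayon::in_place_scope_fifo(|s| -> std::io::Result<()> {
        for &len in chunk_lens {
            // Sequential I/O on the calling thread; R needs no Send bound.
            let mut compressed = vec![0u8; len];
            reader.read_exact(&mut compressed)?;

            // Hand the owned buffer to a worker for parallel expansion.
            s.spawn_fifo(move |_| {
                // ... decompress `compressed` here ...
                drop(compressed);
            });
        }
        Ok(())
    })
}
```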

@fintelia closed this Jul 23, 2024