Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pixels method to ImageDecoder #861

Closed
birktj opened this issue Feb 4, 2019 · 10 comments
Closed

Add pixels method to ImageDecoder #861

birktj opened this issue Feb 4, 2019 · 10 comments
Labels
discussion draft Add if this issue includes suggested code, compares interfaces, preparses wording,etc kind: enhancement

Comments

@birktj
Copy link
Member

birktj commented Feb 4, 2019

When working with the ImageDecoder trait you only get a raw byte buffer or a raw byte stream. If you want to work with pixels you either have to use ImageBuffer and friends or manually convert the byte buffer/stream to pixels. However sometimes the image may not fit in memory, but you would like to work with pixels. For this usecase i propose to add a pixels method to the ImageDecoder trait, it may look something like this:

trait ImageDecoder {
    // ...
    fn pixels(self) -> DecoderPixels<Self::Reader> {
        let colortype = self.colortype();
        let buffer = vec![0; colortype.bytes()];

        DecoderPixels {
            reader: self.into_reader(),
            colortype,
            buffer,
        }
    }
}

struct DecoderPixels<R> {
    reader: R,
    colortype: ColorType,
    buffer: Vec<u8>,
}

impl<R: Read> Iterator for DecoderPixels<R> {
    type Item = Box<dyn Pixels>; // Maybe Result<Box<dyn Pixels>> instead?

    fn next(&mut self) -> Option<Self::Item> {
        self.reader.read_exact(&mut self.buffer).ok()?;
        Some(self.colortype.pixel(&self.buffer))
    }
}


impl ColorType {
    fn bytes(&self) -> u8 {
        // ...
    }

    fn pixel(&self, slice: &[u8]) -> Box<dyn Pixel> {
        // ...
    }
}
@HeroicKatora
Copy link
Member

HeroicKatora commented Feb 4, 2019

Streaming image decoding is one of the most desirable items missing, indeed. But the proposed interface appears flawed in several ways:

  • Item=Box<dyn Pixel> requires an allocation
  • You'd have the choice of 'efficient bulk operation of all pixels' and 'extremely inefficient single pixel operations' with no intermediate.
  • Iterator can't return a reference to its own memory (would require HRTBs). The code as proposed couldn't work.
  • Must hide some buffer within the decoder, of arbitrary/invisible/undetermined size. For some decoders, this will therefore not solve the problem of exhausting memory at all, it will just hide the initial memory consumption by not appearing to allocate the full image upfront. Few image formats (especially the compressing ones) will be able to provide an actual pixel-by-pixel iterator. E.g. think of a reader for a planar image format. In the current api, it can only yield a pixel by reading at least until the last sample of the first pixel. That would inherently already exhaust most of the required memory by reading the other samples planes completely, or incurr the overhead of constantly seeking in the provided storage (and requiring io::Seek in the first place, so we couldn't purely stream decode anymore).

That doesn't mean the interface can't work. But in the current state, I'd rather start with specialized implementations to proof that this can satisfy the usability criterions of performance and memory requirements before adding it as a trait method.

@fintelia
Copy link
Contributor

fintelia commented Feb 4, 2019

I'd also be interested in understanding better what use cases would benefit from streaming decoding over calling read_rect to load part of the image at a time. Especially because there are formats (like planar images) where the latter can be implemented efficiently but the former can't be.

@HeroicKatora HeroicKatora added kind: enhancement discussion draft Add if this issue includes suggested code, compares interfaces, preparses wording,etc labels Feb 4, 2019
@birktj
Copy link
Member Author

birktj commented Feb 4, 2019

It seems like generic streaming decoding (and encoding) is a really hard problem because of different ways different formats store pixels. I would guess the best solution would be some sort of ImageStream trait with methods like map_pixels, scale, etc... optimized for the different formats. However this seems like both out of scope for this library and probably hard to extend.

@HeroicKatora I do agree that my proposal isn't prefect, but it should work as a starting point for a discussion. For the Box problem a better solution is probably a DynamicPixel enum. My idea was that ColorType::pixel would clone the contents of the slice so no references would be necessary. I guess you could never do any form of pure stream decoding with planar images, but I know that you can with pngs and probably some other formats.

@fintelia for pngs having a pixel iterator would be much more efficient than calling read_rect and you could e.g. process large images while downloading without writing to disk.

I will try to create some benchmarks for low memory decoding and encoding with something that looks like this and see if I can manage to get any satisfactory results.

@HeroicKatora
Copy link
Member

HeroicKatora commented Feb 4, 2019

@fintelia Oh yes, definitely. That reminds me, GenericImage theoretically includes not only width and height but also 'lower bounds', such as used by views. And in its current state, the translation of Decoder to DynamicImage is an internal function only. We could instead make it possible to read_into any existing image (checking color type and bounds before). Then a offset view (like a view but the outside is not necessarily backed by memory; name subject to change) could handle some of the complexity of read_rect and we'd require less intermediate allocations for reading multiple images sequentially.

@birktj
Copy link
Member Author

birktj commented Feb 4, 2019

@HeroicKatora I have created a simple benchmark here. It tests an iterator like described with the incremental png loading in #860.

It simply sums up the r, g and b components for each pixel, using either ImageBuffer::pixels or one of two streaming iterators.

The streaming iterators are slightly slower, however they use far less memory (0.5mb vs 680mb). I think that this shows that something in the lines of what proposed could be used to process some images in a streaming fashion with minimal memory usage.

@HeroicKatora
Copy link
Member

HeroicKatora commented Feb 4, 2019

@birktj Png is the best example for your case to choose here because decoding is highly non-trivial. I expect all of the timing here to be dominated by the deflate time, and the compuation overhead for actually summing pixels vanishes to some degree within that. On the one hand, that could be interpreted as a plus, on the other hand this is specific to this use case of streamed processing. The test implementation looks very nice though and the memory win is definitely intriguing–and as expected. I'd prefer not creating a DynamicPixel enum though, for the same reasons we are considering restructing DynamicImage more or less, see #856 . Which means, I'd like to have a look at the more general implementation of combining ViewMut with `Decoder´ before making a decision here.

Edit: Don't know if I can do it this week, definitely next.

@fintelia
Copy link
Contributor

fintelia commented Feb 8, 2019

Another approach would be to add a required constructor to the Pixel trait to convert from bytes in some colortype to that pixel type. Then DecoderPixels could avoid allocating boxes, and the user would have an easier time because they wouldn't have to match over possible pixel formats:

trait Pixel {
    fn convert_from_bytes(slice: &[u8], colortype: ColorType) -> P;
}

struct DecoderPixels<R, P: Pixel> {
    reader: R,
    colortype: ColorType,
    buffer: Vec<u8>,
}
impl<R: Read, P: Pixel> Iterator for DecoderPixels<R> {
    type Item = Self::P;

    fn next(&mut self) -> Option<Self::Item> {
        self.reader.read_exact(&mut self.buffer).ok()?;
        Some(P::convert_from_bytes(&self.buffer, self.colortype))
    }
}

@birktj
Copy link
Member Author

birktj commented Feb 15, 2019

I have been looking a bit into incremental image processing and I think maybe some sort of chunks method would be better than a pixels method. Decoders could choose and optimal size for their chunks to balance time and memory usage. Each chunk would be some sort of view and you could run image processing functions on the chunks independently and in the end either encode to memory or to file.

I guess this is sort of what you are thinking of when you say "combine view with decoder".

I envision some sort of ChunkedImage trait that looks sort of like this;

trait ChunkedImage {
     type Pixel: Pixel;
    
     fn width() -> u32;
     fn height() -> u32;

     fn map_chunks<F: Fn(View) -> Image>(self, f: F) -> MapChunkedImage<Self, F>;
     fn pixels()...
     fn resize() ...
     ...
}

@ambihelical
Copy link

@fintelia

I'd also be interested in understanding better what use cases would benefit from streaming decoding over calling read_rect to load part of the image at a time. Especially because there are formats (like planar images) where the latter can be implemented efficiently but the former can't be.

In my case, I'm interested in converting images on the fly into a different format (a private project I'm working on, I can't elaborate). Currently my only option is to decode the image into memory and then process it line by line, dropping the image when I'm done. This not only uses a lot of unnecessary memory, and but is very CPU cache unfriendly, and so is slower than it could be.

I could load the image in sub-rectangles as you describe, but that doesn't sound like it would improve performance any over loading once and processing, but I may try it. One other use that I can think of that would not benefit from the sub-image idea is speeding up thumbnail image generation, scaling the image as it is decoded should be a performance and memory use win.

I do think providing the decoded image data in chunks as @birktj suggests (in concept anyway) is the way to go for performance reasons. This is how some C++ code I created in the past operated, it worked well.

This is all optimization anyway, so if some formats (i.e. planar types) do not support incremental decoding, I'd rather just fall back to the line-by-line method than not have this functionality at all.

@HeroicKatora
Copy link
Member

@ambihelical It sounds to me like your usecase is extremely more specialized than the generic interface that ImageDecoder is supposed to offer. Because the amount of control it requires is the opposite of a simple interface that's supposed to 'just load an image'. Maybe it should not be as specialized?

This boils down to: Do the basic work first, integrate the functionality to offer this amount of control into as many format specific crates as possible. You could use the specialized versions first, which would provide feedback on the chosen interface, and in the worst case have the wrapper yourself. The first crate here is probably png, since the underlying file format basically works exactly as described. I would be surprised if we couldn't make it work here and the basics are already in-place afaik. Since you specifically think @birktj has made a good suggestion, the streaming decoder has been merged.

It's now up to determining the common denominator for other formats as well and build an abstraction. My biggest potential doubts with the current version, and the largest difference compared to std::io::BufRead maybe, is the (lack of) strong typing for the pixel format. This is, from a theoretical standpoint of memory safety, only a real problem if you eventually want to read f32 or other non-integer pixels since those do not have the guarantee that any byte-representation is valid.

@fintelia fintelia closed this as not planned Won't fix, can't repro, duplicate, stale Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion draft Add if this issue includes suggested code, compares interfaces, preparses wording,etc kind: enhancement
Projects
None yet
Development

No branches or pull requests

4 participants