
Library wide memory limits #938

Closed
birktj opened this issue May 5, 2019 · 18 comments
Labels
discussion draft (Add if this issue includes suggested code, compares interfaces, prepares wording, etc.) · kind: enhancement · medium · topic: meta data (Towards dealing with meta data)

Comments

@birktj
Member

birktj commented May 5, 2019

By now most decoder libraries have some sort of Limits struct or sanity checks that stop them from trying to decode images that would make them run out of memory. However, this library doesn't have anything like that which could be used to configure the limits of the individual decoders. We should also use them to set a limit on the size of ImageBuffer and the like: with image-rs/image-png#115, for example, the png decoder will happily decode an image that is 100 000 x 100 000 pixels as long as it doesn't use too much memory. In these cases we should then have a second check that we are not allocating an ImageBuffer of that size.
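For illustration, a minimal sketch of such a second check (the helper name and byte budget are made up, not an existing API): compute the buffer size with overflow checks and refuse to allocate if it exceeds the budget.

fn checked_buffer_size(width: u32, height: u32, bytes_per_pixel: u64, max_bytes: u64) -> Option<u64> {
    // Refuse to allocate an ImageBuffer whose byte size overflows or exceeds the budget.
    let size = u64::from(width)
        .checked_mul(u64::from(height))?
        .checked_mul(bytes_per_pixel)?;
    if size <= max_bytes {
        Some(size)
    } else {
        None
    }
}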

@birktj birktj added the discussion draft, kind: enhancement, and medium labels on May 5, 2019
@Shnatsel
Contributor

TIFF now also implements a memory limit based on the maximum buffer size, just like the png crate:
https://github.com/image-rs/image-tiff/blob/f6a91065d937fac56e2b70dae90da20e40fa46c1/src/decoder/mod.rs#L125-L136

@HeroicKatora
Member

HeroicKatora commented Aug 31, 2019

And y4m also adopted one based on buffer size here (even though that's less relevant for images right now).

@fintelia
Contributor

fintelia commented Sep 1, 2019

There is also the ImageDecoder::total_bytes method, which is implemented for all decoders. That doesn't count intermediate allocations done while decoding an image, but those should generally be quite small. Perhaps that method could be used to have the new Reader struct take an optional memory limit on the size of images to load?
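A rough sketch of how such a check against total_bytes might look (the helper and its max_bytes parameter are illustrative, not an existing API, and it assumes the newer error module's limit error types):

use image::error::{ImageError, ImageResult, LimitError, LimitErrorKind};
use image::ImageDecoder;

// Illustrative helper: refuse to decode if the reported output size exceeds a byte budget.
fn check_total_bytes<'a>(decoder: &impl ImageDecoder<'a>, max_bytes: u64) -> ImageResult<()> {
    if decoder.total_bytes() > max_bytes {
        return Err(ImageError::Limits(LimitError::from_kind(
            LimitErrorKind::InsufficientMemory,
        )));
    }
    Ok(())
}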

@HeroicKatora
Member

HeroicKatora commented Oct 2, 2019

With the new io::Reader interface we could also support this in a semi-opaque way by adding a method to set a limit. This could have three possible internal states:

  • Do not enforce limits (default for now)
  • Enforce some useful default limits if the format supports limits (default in the next major version)
  • Enforce exact custom limits, error if none are supported
impl<R> Reader<R> {
    pub fn no_limits(&mut self) { }
    pub fn smart_limits(&mut self) { }
    pub fn set_limits(&mut self, _: Limits) { }
}
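One way those three states could be represented internally (purely illustrative; Limits here stands in for the yet-to-be-designed configuration type passed to set_limits above):

struct Limits; // placeholder for the limit configuration referenced above

// Illustrative internal state for the Reader, not an agreed-upon design.
enum LimitMode {
    // Do not enforce limits (current default).
    NoLimits,
    // Format-specific default limits where the decoder supports them.
    SmartLimits,
    // Exact user-provided limits; error if the format cannot enforce them.
    Exact(Limits),
}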

@fintelia
Contributor

fintelia commented Oct 2, 2019

I posted something very similar in the other thread. If we want to have limits on decoder::dimensions or decoder::total_bytes then we wouldn't need any special buy-in from implementations. I don't think any of them do large allocations at creation time, only when actually directed to decode the image, so we can query metadata without risking OOMs.

Ideally the limits should be in terms of total memory used, but the amount of memory used for the output seems like a reasonable compromise for now: for reasonably sized images the allocations required are at most 2x (or 3x?) the total output size.

@birktj
Member Author

birktj commented Oct 1, 2020

I'm going to have a second try at this. I think I have found a reasonably extensible approach that should work nicely.

I propose to add the following struct:

impl Limits {
    /// Decode without any limits on resource usage.
    pub fn no_limits() -> Limits { }

    /// Set a relaxed limit on the number of bytes that may be allocated during decoding. This is a
    /// "best effort" limit which may be exceeded in some cases.
    pub fn set_max_bytes_relaxed(&mut self, max_bytes: u64) { }

    /// Set a strict limit on the number of bytes that may be allocated during decoding. If the
    /// decoder cannot guarantee to uphold this limit then decoding will fail.
    pub fn set_max_bytes_strict(&mut self, max_bytes: u64) { }

    // Easy to add other limits in the future.
}
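For reference, one possible internal shape for this Limits struct that would back the is_strict() / get() calls used in the ImageDecoder default implementation further down (illustrative only, not part of the proposal):

#[derive(Clone, Default)]
pub struct Limits {
    max_bytes: LimitValue,
}

// Possible backing type for a single limit, covering "no limit", relaxed and strict.
#[derive(Clone)]
enum LimitValue {
    Unlimited,
    Relaxed(u64),
    Strict(u64),
}

impl Default for LimitValue {
    fn default() -> Self {
        LimitValue::Unlimited
    }
}

impl LimitValue {
    fn is_strict(&self) -> bool {
        matches!(self, LimitValue::Strict(_))
    }

    fn get(&self) -> Option<u64> {
        match self {
            LimitValue::Unlimited => None,
            LimitValue::Relaxed(n) | LimitValue::Strict(n) => Some(*n),
        }
    }
}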

And extend the Reader struct with the following methods. I think the default for Reader should be some reasonable default set of limits (currently relaxed).

impl<R> Reader<R> {
    /// Disable all limits.
    pub fn no_limits(&mut self) { }

    /// Set resource limits for decoding.
    pub fn set_limits(&mut self, limits: Limits) { }
}

And extend the ImageDecoder trait with the following method with a default implementation:

trait ImageDecoder {
    /// Set limits for this decoder.
    fn set_limits(&mut self, limits: Limits) -> ImageResult<()> {
        if limits.max_bytes.is_strict() {
            return Err(ImageError::Limits(error::LimitError::from_kind(
                error::LimitErrorKind::CannotGuarantee)))
        }

        if let Some(max_bytes) = limits.max_bytes.get() {
            if max_bytes < self.total_bytes() {
                return Err(ImageError::InsufficientMemory)
            }
        }

        Ok(())
    }
}
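And a hypothetical usage example, assuming Reader::set_limits forwards the limits to the chosen decoder before decoding (none of this is in the crate today):

fn load_with_limit(path: &str) -> image::ImageResult<image::DynamicImage> {
    // Start from an unrestricted Limits (the only constructor sketched above) and add a strict cap.
    let mut limits = Limits::no_limits();
    // Strict limit: decoding fails if the decoder cannot guarantee to stay under 512 MiB.
    limits.set_max_bytes_strict(512 * 1024 * 1024);

    let mut reader = image::io::Reader::open(path)?;
    reader.set_limits(limits); // proposed API from this comment
    reader.decode()
}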

@Shnatsel
Contributor

Shnatsel commented Oct 1, 2020

I'm not convinced that decoding failing at runtime is a good idea for strict limits. I'd rather have the strict limits method not be implemented on those decoders.

Speaking of relaxed and strict limits, I think it makes sense to separately limit the maximum size of the final image (which is easy to enforce for all decoders) and the RAM allocated in addition to storing the final image (which some decoders might not be able to enforce).
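A sketch of that split (field names made up for illustration): a cap on the final image size that every decoder can check up front, plus an optional cap on auxiliary allocations that only some decoders can honour.

pub struct Limits {
    // Maximum size of the final decoded image in bytes; enforceable for every
    // decoder via dimensions() / total_bytes() before any large allocation.
    pub max_image_bytes: Option<u64>,
    // Maximum memory allocated in addition to the output buffer; decoders that
    // cannot track their intermediate allocations would have to reject this.
    pub max_intermediate_bytes: Option<u64>,
}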

@birktj
Member Author

birktj commented Oct 1, 2020

For it to work together with io::Reader it would have to be a runtime check, right?

The idea is that it should be easy to add different kinds of fine-grained limits in the future. However, your idea of separating the total image size from other potential allocations sounds like a nice starting point. The maximum image size doesn't even need a strict/relaxed distinction.

@Shnatsel
Contributor

Shnatsel commented Oct 1, 2020

Yes, I suppose we would need a runtime check.

Total size of the resulting image is relatively easy to enforce, so I'd start with adding that to everything and then see if we need any limits that are more specialized. If we do, we can add them on a per-decoder basis.

Also, I wonder how this would interact with multi-frame images, such as animated GIF or APNG? I'm tempted to set a limit per frame, since it's easy to hand-roll a total size limit if you have a frame size limit, but not vice versa.
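To illustrate the per-frame idea (names made up; a real check would live inside the decoder before each frame buffer is allocated, this only shows the accounting):

use image::error::{ImageError, LimitError, LimitErrorKind};
use image::{AnimationDecoder, Frame, ImageResult};

// Collect frames, rejecting any single frame larger than the per-frame budget.
fn collect_frames<'a>(
    decoder: impl AnimationDecoder<'a>,
    max_frame_bytes: u64,
) -> ImageResult<Vec<Frame>> {
    let mut frames = Vec::new();
    for frame in decoder.into_frames() {
        let frame = frame?;
        let buf = frame.buffer(); // RGBA, 4 bytes per pixel
        let bytes = u64::from(buf.width()) * u64::from(buf.height()) * 4;
        if bytes > max_frame_bytes {
            return Err(ImageError::Limits(LimitError::from_kind(
                LimitErrorKind::InsufficientMemory,
            )));
        }
        frames.push(frame);
    }
    Ok(frames)
}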

@fintelia
Contributor

fintelia commented Oct 1, 2020

@birktj how does your design deal with adding additional strict limits later? Wouldn't decoders have to know about the change or else they'd silently ignore the new limits?

@birktj
Member Author

birktj commented Oct 1, 2020

Looking over the tiff decoder I see that we also still have the problem of setting limits before metadata is read. There the entire IFD (image file directory) is read into memory in order to get the metadata. This is problematic because it is one of the things we would probably like more fine-grained limits for. (Of course this may simply be a problem with the implementation in the tiff crate that we can fix later.)

This was also discussed in #1088 (comment)

@birktj
Member Author

birktj commented Oct 1, 2020

@fintelia I realize this would be a slight problem for external decoders. For internal decoders I thought the best approach would be to destructure the Limits struct to make sure all checks were accounted for; however, this is obviously not a possibility for external decoders.

Maybe some sort of ack_limit and remaining_limits could work, but this feels a bit messy?
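A very rough sketch of that idea (method names taken from the comment, everything else invented): the decoder marks each limit it actually enforces, and decoding fails if any strict limit was never acknowledged.

#[derive(Default)]
pub struct Limits {
    max_bytes: Option<u64>,
    max_bytes_strict: bool,
    max_bytes_acked: bool,
}

impl Limits {
    // Called by a decoder that enforces the byte limit itself.
    pub fn ack_limit(&mut self) {
        self.max_bytes_acked = true;
    }

    // True if a strict limit exists that no decoder acknowledged, in which case
    // decoding must fail rather than silently ignore it.
    pub fn remaining_limits(&self) -> bool {
        self.max_bytes.is_some() && self.max_bytes_strict && !self.max_bytes_acked
    }
}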

@HeroicKatora
Member

Looking over the tiff decoder I see that we also still have the problem of setting limits before metadata is read.

Instead of requiring limits before reading, could we avoid buffering the whole IFD to memory? Maybe we could skip all tags that are not relevant to the methods required by ImageDecoder and then re-read the image by seeking to the start.

@birktj
Member Author

birktj commented Oct 1, 2020

Yeah, I believe we can probably just consider this a quirk in the tiff crate that could be fixed over there at some point.

@birktj
Member Author

birktj commented Oct 1, 2020

Actually, on closer inspection it doesn't look so bad: it seems the hashmap only contains pointers to the real data, and reading of potentially large datatypes like strings is delayed until they are needed. This should mean that setting limits after the decoder is created should be fine.

@Shnatsel
Contributor

Shnatsel commented Oct 1, 2020

Unless I can create usize::MAX hashtable entries or some such. Not sure about TIFF, but Vorbis totally lets me do that.

@HeroicKatora
Member

HeroicKatora commented Oct 1, 2020

The number of baseline tags is finite and IIRC they can only appear once. Maybe we should even use a tinyvec plus a bitset or similar to read them into, instead of a HashMap, in any case.
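Purely illustrative (not actual image-tiff code): because the baseline tags form a small fixed set, a fixed-size array plus a "seen" bitset bounds memory without a HashMap.

const BASELINE_TAG_COUNT: usize = 64; // assumed upper bound, for the sketch only

struct BaselineTags {
    seen: u64,                          // one "seen" bit per baseline tag slot
    offsets: [u64; BASELINE_TAG_COUNT], // file offset of each tag's value
}

impl BaselineTags {
    fn new() -> Self {
        BaselineTags { seen: 0, offsets: [0; BASELINE_TAG_COUNT] }
    }

    fn insert(&mut self, slot: usize, offset: u64) {
        self.seen |= 1 << slot;
        self.offsets[slot] = offset;
    }

    fn get(&self, slot: usize) -> Option<u64> {
        if self.seen & (1 << slot) != 0 {
            Some(self.offsets[slot])
        } else {
            None
        }
    }
}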

@fintelia
Contributor

fintelia commented Oct 1, 2020

Also, TIFF tags are only 16 bits, so even if the image contained every possible tag (most of which aren't even defined...) the hashtable would only take a few megabytes.

@HeroicKatora HeroicKatora added the topic: meta data label on Mar 12, 2021