premature end of JPEG images #916

ImsuperSH · 2020-09-05T03:59:45Z

❔Question

Epoch  gpu_mem  GIoU     obj      cls  total    targets  img_size
1/99   2.87G    0.05456  0.04197  0    0.09652  10       640: 100% 157/157 [00:52<00:00, 2.98it/s]
Class  Images   Targets  P      R      mAP@.5  mAP@.5:.95: 0% 0/157 [00:00<?, ?it/s]Premature end of JPEG file
Class  Images   Targets  P      R      mAP@.5  mAP@.5:.95: 100% 157/157 [00:19<00:00, 8.21it/s]
all    2.5e+03  1e+04    0.362  0.777  0.684   0.338

It shows "Premature end of JPEG file" during validation. What causes this?


glenn-jocher · 2020-09-05T21:35:32Z

This is caused by a corrupted image.

github-actions · 2020-10-06T00:43:06Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

seekFire · 2020-11-19T06:54:53Z

@glenn-jocher If the message "Premature end of JPEG file" appears at the beginning of training, is it also due to a corrupted image?

glenn-jocher · 2020-11-19T11:35:06Z

@seekFire it's not an error, it's a message, and it's self-descriptive.

jaqub-manuel · 2020-12-06T15:17:49Z

Dear @glenn-jocher,
I have the same problem. I wonder how many broken images there are. Also, will more of these messages appear in each new epoch?
These corrupted images are probably not related to the annotations; perhaps the images were not copied completely? When the images were cached, only 1 was written out of order.
Thanks in advance ...

glenn-jocher · 2020-12-06T16:11:19Z

@jaqub-manuel this is a very low-level C++ warning in the cv2 image loader, I think. It does not produce an error, and as far as I know there is currently no way to tag these images as corrupted. Stack Overflow has a few conversations on the topic.

The result is that the image will only partially load, and the rest of the area will be black. 1 or 2 images with this problem should not harm your dataset.

jaqub-manuel · 2020-12-06T17:12:35Z

Many Thanks for clarification...

sramakrishnan247 · 2021-03-05T22:24:37Z

@jacklinquan @glenn-jocher
How do you know the number of files that have this issue?
I see something like this in my logs:


Transferred 794/802 items from yolov5x.pt
Optimizer groups: 134 .bias, 142 conv.weight, 131 other
Scanning images: 100%|██████████| 1822/1822 [00:00<00:00, 15125.35it/s]
Scanning labels /home/mli/sramakrishnan/exp6/obj_detector_training/labels.cache (1822 found, 0 missing, 0 empty, 0 duplicate, for 1822 images): 1822it [00:00, 34582.57it/s]
Scanning images: 100%|██████████| 472/472 [00:00<00:00, 15131.13it/s]
Scanning labels /home/mli/sramakrishnan/exp6/obj_detector_training/labels.cache (472 found, 0 missing, 0 empty, 0 duplicate, for 472 images): 472it [00:00, 33152.67it/s]
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

Analyzing anchors... anchors/target = 5.13, Best Possible Recall (BPR) = 0.9999
Image sizes 640 train, 640 test
Using 4 dataloader workers

Does this mean that only this many files are affected?

glenn-jocher · 2021-03-05T22:27:12Z

@sramakrishnan247 it looks like 8 of your files have 'premature end of JPEG'. This is a low level warning, and is not caught by python asserts or cv2 loading errors, so these files will all be used for training.
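
As a rough way to do the same count on a saved log, the message can simply be counted in the log text. This is a minimal sketch: `count_jpeg_warnings` is a hypothetical helper, not part of YOLOv5, and it assumes one message is printed per decode of a bad file, so a log covering several epochs may count the same file more than once.

```python
def count_jpeg_warnings(log_text, message="Premature end of JPEG file"):
    """Count how many times the libjpeg warning appears in a captured log.

    A substring count is used because the message can be interleaved
    with progress-bar output on the same line.
    """
    return log_text.count(message)


sample_log = (
    "Scanning images: 100% 472/472\n"
    "Premature end of JPEG file\n"
    "0% 0/157 [00:00<?, ?it/s]Premature end of JPEG file\n"
    "Analyzing anchors...\n"
)
print(count_jpeg_warnings(sample_log))
```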

sramakrishnan247 · 2021-03-05T22:28:22Z

@glenn-jocher
Thanks for letting me know. As long as it's only 8 files, I assume it is safe to ignore. I have around 2000 samples.

madr3z · 2021-03-06T23:31:41Z

Can anyone please let us know how to find which images have this problem or how can we fix these images?
I tried detecting the problematic images using cv2.imread() but did not find any.

glenn-jocher · 2021-03-08T01:32:32Z

@madr3z this is a low level warning, and is not caught by python asserts or cv2 loading errors, so these files will all be used for training. There is currently no way to identify them, though you could always debug this by printing each image name as it's cached and observing which coincides with the messages.
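
The debugging idea of printing each image name as it is loaded could look roughly like the sketch below. `load_with_trace` is a hypothetical helper (not part of YOLOv5 or cv2), and it assumes the warning is written to stderr by the native decoder, so the file name is flushed to the same stream just before each load.

```python
import sys


def load_with_trace(paths, loader, stream=sys.stderr):
    """Call loader(path) for every path, writing the path to `stream` first.

    Any warning printed by the loader's native code should then appear
    directly below the name of the file that triggered it.
    """
    results = {}
    for path in paths:
        stream.write(f"loading {path}\n")
        stream.flush()  # flush so the name lands before any C-level output
        results[path] = loader(path)
    return results
```

For example, `load_with_trace(image_paths, cv2.imread)` would run the cv2 loader over a list of paths. Running this in a single process, without dataloader workers, should keep the names and warnings from interleaving.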

xiaowk5516 · 2021-06-16T03:58:40Z

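This may occur when the image file was not downloaded completely.
Code to check for it:

import os

image_path = ''  # path to the image to check
if image_path.lower().endswith(('.jpg', '.jpeg')):
    with open(image_path, 'rb') as f:
        f.seek(-2, os.SEEK_END)  # jump to the last two bytes of the file
        if f.read() == b'\xff\xd9':  # JPEG end-of-image (EOI) marker, compared as bytes
            print('complete image')
        else:
            print('incomplete image')

You can try this.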
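
To find which files in a dataset fail this end-of-image check, the same test can be wrapped in a function and run over a folder. This is a minimal sketch using only the standard library; `has_jpeg_eoi` and `find_truncated_jpegs` are hypothetical names. It assumes a valid JPEG ends exactly with the bytes FF D9, so a file with extra trailing bytes after the marker would also be reported.

```python
from pathlib import Path

JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def has_jpeg_eoi(path):
    """Return True if the last two bytes of the file are the EOI marker."""
    path = Path(path)
    if path.stat().st_size < 2:
        return False  # too short to contain the marker
    with open(path, "rb") as f:
        f.seek(-2, 2)  # whence=2 means relative to the end of the file
        return f.read() == JPEG_EOI


def find_truncated_jpegs(folder):
    """List .jpg/.jpeg files under `folder` that do not end with the EOI marker."""
    suffixes = {".jpg", ".jpeg"}
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes and not has_jpeg_eoi(p)
    )
```

Calling `len(find_truncated_jpegs('path/to/images'))` then gives the number of suspect files, and the returned list gives their names.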

glenn-jocher · 2021-06-16T09:00:06Z

@xiaowk5516 that's an interesting piece of code! We may be able to integrate this into the dataset checks if the speed is fast and it works as intended. The correct location for this would be here:

yolov5/utils/datasets.py

Lines 1054 to 1061 in 65f81bf

        # verify images
        im = Image.open(im_file)
        im.verify()  # PIL verify
        shape = exif_size(im)  # image size
        segments = []  # instance segments
        assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
        assert im.format.lower() in img_formats, f'invalid image format {im.format}'

glenn-jocher · 2021-06-16T09:28:48Z

@xiaowk5516 I think the following image scanning code should work based on your idea. Can you submit a PR to help integrate this code into master to help everyone with this problem?

        # verify images
        im = Image.open(im_file)
        im.verify()  # PIL verify
        shape = exif_size(im)  # image size
        assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
        assert im.format.lower() in img_formats, f'invalid image format {im.format}'
        if im.format.lower() in ('jpg', 'jpeg'):
            with open(im_file, 'rb') as f:
                f.seek(-2, 2)
                assert f.read() == b'\xff\xd9', 'corrupted JPEG'

xiaowk5516 · 2021-06-16T09:35:48Z

Of course! I will submit it soon.

glenn-jocher · 2021-06-16T09:59:17Z

@xiaowk5516 great!

glenn-jocher · 2021-06-16T11:34:06Z

@ImsuperSH @seekFire @sramakrishnan247 @jacklinquan @madr3z good news 😃! Your original issue may now be fixed ✅ in PR #3638. This PR adds JPEG corruption error checking by @xiaowk5516 to the YOLOv5 train and testloaders. To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

Poulinakis-Konstantinos · 2021-08-25T07:17:24Z

Hello, this might be a little late, but I found a way to fix the premature-ending error. I'm leaving it here in case anyone needs it in the future.

In short, reading the image with OpenCV and then saving it again with OpenCV fixes the image and adds the EOI marker (FF D9) at the end of the file.
https://github.com/Poulinakis-Konstantinos/ML-util-functions/blob/master/scripts/Img_Premature_Ending-Detect_Fix.py

glenn-jocher · 2021-08-25T22:32:29Z

@Poulinakis-Konstantinos thanks for the idea! Do you know if PIL Image saving also resolves the issue?

The reason I ask is the images are already opened with PIL as im when the corruption scanning is performed:

yolov5/utils/datasets.py

Lines 866 to 876 in 2da6444

        # verify images
        im = Image.open(im_file)
        im.verify()  # PIL verify
        shape = exif_size(im)  # image size
        assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
        assert im.format.lower() in IMG_FORMATS, f'invalid image format {im.format}'
        if im.format.lower() in ('jpg', 'jpeg'):
            with open(im_file, 'rb') as f:
                f.seek(-2, 2)
                assert f.read() == b'\xff\xd9', 'corrupted JPEG'

Poulinakis-Konstantinos · 2021-08-26T07:28:45Z

@glenn-jocher I just tested it with PIL. Yes, saving the image with PIL does restore the image's EOI marker!

Adding a save step for whenever a corrupted image is detected would probably be beneficial.

glenn-jocher · 2021-08-26T10:46:06Z

@Poulinakis-Konstantinos hmm interesting. Ok, we need to be very careful about saving the images as PIL includes a default compression level, cv2 I'm not sure, but we want to make sure the new JPG pixel values are not altered in any way.

If we can get some corrupted images to pass an np.allclose() test before and after I think that should suffice. Do you have any corrupted JPEGs you could share?

glenn-jocher · 2021-08-26T10:52:32Z

Maybe something like this:

im.save(im_file, format='JPEG', subsampling=0, quality=100)

From https://stackoverflow.com/questions/19303621/why-is-the-quality-of-jpeg-images-produced-by-pil-so-poor

glenn-jocher · 2021-08-26T11:24:47Z

@Poulinakis-Konstantinos I can't figure out how to save a JPG without altering it. I created a script here that shows significant differences in pixel values on both cv2 and PIL saving. Do you have any ideas?

import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

fp = '000000000034.jpg'  # original image file path
fp_pil = fp + '.PIL.jpg'
fp_cv2 = fp + '.cv2.jpg'

# Read and write cv2 and PIL JPGs
Image.open(fp).save(fp_pil)
cv2.imwrite(fp_cv2, cv2.imread(fp))

# Read new JPGs and compare
im = cv2.imread(fp)
im_pil = cv2.imread(fp_pil)
im_cv2 = cv2.imread(fp_cv2)
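dp = (im.astype(np.int16) - im_pil.astype(np.int16)).ravel()  # signed difference; uint8 subtraction would wrap around
dc = (im.astype(np.int16) - im_cv2.astype(np.int16)).ravel()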
print(np.allclose(im, im_pil))
print(np.allclose(im, im_cv2))

# Plot
fig, ax = plt.subplots(1, 2, figsize=(8, 4), tight_layout=True)
ax[0].hist(dp, 255)
ax[1].hist(dc, 255)
plt.savefig('results.jpg')

Related: https://stackoverflow.com/questions/54610705/copied-image-saved-with-different-pixels-to-original-with-pil

glenn-jocher · 2021-08-26T12:13:13Z

@Poulinakis-Konstantinos I've opened a PR with a fix in #4548. Can you review please?

xiaowk5516 · 2021-08-26T12:46:24Z

@glenn-jocher JPG/JPEG is a lossy compression format for digital images. That is, its compression is irreversible, so the pixel values obtained after decompression and recompression will differ from the original.
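
A toy illustration of why a lossy round trip cannot be undone is sketched below. This is not the real JPEG pipeline, which quantizes frequency coefficients rather than raw values; the `quantize` function and the step sizes are assumptions chosen only to show how rounding discards information.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of `step` (the lossy step)."""
    return [round(v / step) * step for v in values]


original = [12, 57, 130, 201, 248]
decoded = quantize(original, step=10)  # stands in for the first save and load
resaved = quantize(decoded, step=8)    # stands in for a re-save with other settings

print(original)
print(decoded)
print(resaved)
```

Once the values have been rounded, the originals cannot be recovered from `decoded`, and rounding again with a different step moves them once more. This mirrors the differences seen when comparing a JPEG with its re-saved copy.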

glenn-jocher · 2021-08-26T13:53:52Z

@ImsuperSH @Poulinakis-Konstantinos @seekFire @jaqub-manuel @xiaowk5516 good news 😃! Your original issue may now be fixed ✅ in PR #4548. This PR automatically restores and saves corrupted JPEGs before training starts, and all images are now used for training, including the restored JPEGs.

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

ImsuperSH added the question Further information is requested label Sep 5, 2020

github-actions bot added the Stale label Oct 6, 2020

github-actions bot closed this as completed Oct 11, 2020

glenn-jocher added the TODO label Jun 16, 2021

glenn-jocher linked a pull request Jun 16, 2021 that will close this issue

Assert non-premature end of JPEG images #3638

Merged

glenn-jocher removed the TODO label Jun 16, 2021

glenn-jocher reopened this Aug 26, 2021

glenn-jocher linked a pull request Aug 26, 2021 that will close this issue

Auto-fix corrupt JPEGs #4548

Merged

glenn-jocher closed this as completed in #4548 Aug 26, 2021
