Memory Error Corrupt JPEG data: 2 extraneous bytes before marker 0xd9 #8908

Closed
1 task done
blackShine-2 opened this issue Aug 9, 2022 · 6 comments
Labels
question Further information is requested Stale

@blackShine-2

Search before asking

Question

I have successfully used YOLOv5 model in my other dataset. However, with this particular dataset, I am getting the following error.
Memory Error Corrupt JPEG data: 2 extraneous bytes before marker 0xd9
Please help me to solve this.
image

Additional

No response

@blackShine-2 blackShine-2 added the question Further information is requested label Aug 9, 2022
@glenn-jocher
Member

@blackShine-2 seems like something is wrong with some of your JPEGs. We try to preprocess the dataset and indicate problem images but it seems that your images are causing errors.

@github-actions
Contributor

github-actions bot commented Sep 9, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@unrue

unrue commented Apr 12, 2024

Is it possible to retrieve the image name? I have a similar problem during training:

197/199 51.6G 0.01721 0.02663 0.01128 1141 640: 56%|█████▌ | 170/305 [02:00<01:36, 1.41it/s]
Corrupt JPEG data: 2 extraneous bytes before marker 0xd6
197/199 51.6G 0.01725 0.0267 0.01131 1196 640: 63%|██████▎ | 191/305 [02:15<01:21, 1.41it/s]
Corrupt JPEG data: 6 extraneous bytes before marker 0xd1
197/199 51.6G 0.01725 0.02669 0.01132 1049 640: 65%|██████▍ | 198/305 [02:20<01:16, 1.41it/s]
Corrupt JPEG data: 6 extraneous bytes before marker 0xd1
197/199 51.6G 0.01726 0.02668 0.01133 1082 640: 82%|████████▏ | 249/305 [02:57<00:41, 1.34it/s]
Corrupt JPEG data: 2 extraneous bytes before marker 0xd5

But I don't know the images involved in a dataset with 30k images.

@glenn-jocher
Member

Hello! 👋 It sounds like you're encountering a common issue where specific training images may be corrupted. Unfortunately, YOLOv5's training logs don't directly output the names of corrupt images during training.

To identify the corrupt images, you might consider running a separate script before training that checks each image's integrity. Here's a quick Python snippet that could help:

import os
from PIL import Image

def check_images(folder_path):
    # Walk the dataset tree and try to verify every JPEG file
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith(('.jpg', '.jpeg')):  # also matches .JPG / .JPEG
                file_path = os.path.join(root, file)
                try:
                    with Image.open(file_path) as img:  # Open the image file
                        img.verify()  # Check file integrity without decoding the pixel data
                except (OSError, SyntaxError) as e:
                    print(f'Corrupt image found: {file_path} ({e})')

check_images('/path/to/your/dataset')

Replace '/path/to/your/dataset' with the actual path to your dataset. This script will print the paths of corrupt JPEG images, which you can then review or remove from your dataset to prevent these errors during training.

Hope this helps! 🙂

@unrue

unrue commented Apr 15, 2024

Hi Glenn,

thanks for the reply. I already ran such a check, and no images were reported as corrupted. However, during training, I still get the "Corrupt JPEG data" messages.

@glenn-jocher
Member

Hi there!

Thanks for running the checks! If the images appear fine but the error persists, it could be related to a transient issue during the training data loading. A quick workaround could be to catch and handle exceptions within the dataset loading process to skip over problematic images. While not ideal, this can help continue training without interruption. Here’s a snippet that could get you started if you're customizing the data loader:

from PIL import Image

def safe_open(path):
    try:
        with Image.open(path) as img:
            img.verify()  # Verify the integrity
        # Reopen the file, since an image cannot be used after verify()
        return Image.open(path)
    except (OSError, SyntaxError):
        print(f'Corrupt image skipped: {path}')
        return None  # or a placeholder image of your choice

Use this safe_open function to open images in the dataset loader where images are retrieved. This way, if an image is corrupt, it gets skipped with a warning rather than halting the training.
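Returning None only helps if the loader knows what to do with it. As a rough sketch of one way to handle that, the class below falls back to the next readable image when the loader returns None. The class name `SkippingImageDataset` and its `paths` and `loader` arguments are hypothetical and not part of YOLOv5; `loader` can be any callable that returns an image or None, such as the safe_open function above.

```python
class SkippingImageDataset:
    """Minimal map-style dataset that falls back to the next readable image.

    Hypothetical helper for illustration only (not part of YOLOv5).
    `paths` is a list of image paths; `loader` returns an image or None.
    """

    def __init__(self, paths, loader):
        self.paths = list(paths)
        self.loader = loader

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        n = len(self.paths)
        # Try the requested index first, then walk forward (wrapping around)
        # until an image loads; give up after one full pass over the dataset.
        for offset in range(n):
            path = self.paths[(index + offset) % n]
            img = self.loader(path)
            if img is not None:
                # Return the path too, so the caller can fetch matching labels.
                return path, img
        raise RuntimeError("no readable images in dataset")
```

The path is returned together with the image because any labels must come from the image that was actually loaded, not from the index that was originally requested.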

Remember, this is a workaround. It’s always best to investigate and resolve the root cause of corrupted data if possible. 🛠️
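On finding the root cause: the "Corrupt JPEG data" lines look like warnings written to standard error by the JPEG decoder, not Python exceptions. Pillow's verify() does not decode the image data, which may explain why a verify() pass reports nothing. A rough sketch of one way to get the file names is to decode each image offline while capturing file descriptor 2, then list the files that produced output. The helper names `capture_fd_stderr` and `find_noisy_images` and the `needle` parameter are made up for this sketch, and `decode` is whatever function your loader uses to read an image.

```python
import os
import sys
import tempfile


def capture_fd_stderr(func, *args):
    """Call func(*args) and return (result, text written to file descriptor 2)."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a copy of the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)  # route fd 2 into the temp file
        try:
            result = func(*args)
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
        tmp.seek(0)
        text = tmp.read().decode("utf-8", errors="replace")
    return result, text


def find_noisy_images(paths, decode, needle="Corrupt JPEG data"):
    """Return {path: message} for each path whose decode printed `needle` to stderr."""
    noisy = {}
    for path in paths:
        _, text = capture_fd_stderr(decode, path)
        if needle in text:
            noisy[path] = text.strip()
    return noisy
```

Assuming the images are decoded with OpenCV during training, a call such as find_noisy_images(image_paths, cv2.imread) over the dataset would return the paths that trigger the warning. Capturing at the file-descriptor level is deliberate: replacing sys.stderr in Python would not see output written by a C library.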

Cheers!
