
Finetuning Google Open Images Pretrained YOLO with MSCOCO #1444

Closed
saitarslanboun opened this issue Nov 19, 2020 · 40 comments
Labels: question (Further information is requested), Stale


@saitarslanboun commented Nov 19, 2020

❔Question

I am considering pretraining the YOLOv5 small model on the Google Open Images object detection dataset (https://storage.googleapis.com/openimages/web/download.html). The dataset covers general-domain categories with ~15M box annotations. After pretraining, I will fine-tune the model on the MSCOCO dataset.

I would only like to do this if I can improve AP by ~7%. Do you think this is possible, and is my expectation reasonable? Unfortunately, I could not find anyone who has tried training on MSCOCO from an Open Images pretrained object detector.

When I fine-tune, all layers will be initialized with the pretrained weights except the Detect layer, since the number of classes changes.
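For illustration, a minimal sketch of that initialization scheme, assuming plain PyTorch state_dicts (the toy modules and helper here are hypothetical, not YOLOv5's actual transfer-learning code):

import torch
import torch.nn as nn

def transfer_matching_weights(pretrained_sd, model):
    """Copy pretrained tensors whose name and shape match; layers whose shape
    differs (e.g. a Detect head sized for a different class count) keep their
    fresh initialization."""
    model_sd = model.state_dict()
    matched = {k: v for k, v in pretrained_sd.items()
               if k in model_sd and v.shape == model_sd[k].shape}
    model.load_state_dict(matched, strict=False)  # strict=False tolerates the skipped head
    return len(matched), len(model_sd)

# Toy demo: same backbone, different head size (601 OI classes vs 80 COCO classes)
def make_model(nc):
    return nn.Sequential(nn.Conv2d(3, 16, 3), nn.SiLU(), nn.Conv2d(16, 3 * (nc + 5), 1))

pretrained = make_model(601)  # stands in for the Open Images pretrained model
new_model = make_model(80)    # stands in for the COCO model to be fine-tuned
n, total = transfer_matching_weights(pretrained.state_dict(), new_model)
print(f'transferred {n}/{total} tensors')  # backbone transfers, head does not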

@saitarslanboun added the question label on Nov 19, 2020
@glenn-jocher (Member)

@saitarslanboun Sure, that sounds reasonable. OI and COCO have many intersecting classes. One issue with OI in general is that annotation quality varies greatly by image. More classes may have been annotated in later versions, because many images lack labels for some classes: faces, for example, are labelled in some images but not others. This makes the dataset difficult to use directly.

@saitarslanboun (Author)

Thanks for your answer, @glenn-jocher! Just to make sure, because it will probably take about a month: do you really believe I can increase the accuracy of small YOLO from 37 to 44 if I pretrain fully on Open Images for 300 epochs and then finetune on MSCOCO?

@glenn-jocher (Member)

Oh no, I'm not providing forward numerical projections; I'm simply agreeing that larger datasets improve results.

@cszer commented Nov 22, 2020

Don't use OpenImages; use https://www.objects365.org/overview.html instead. Judging by their paper, I think you will get some improvement on COCO.

@glenn-jocher (Member)

@cszer Interesting, thanks! It may make sense to provide pretrained YOLOv5 weights on the Objects365 dataset then, for improved finetuning performance on smaller datasets as their paper suggests.

We'd need to export their labels into YOLO format and set up some training runs...

@saitarslanboun (Author)

Yes @cszer, I also plan to do so. Let's see what happens. :) @glenn-jocher, I need to do that eventually as well, but I don't know when I will start training.

@glenn-jocher (Member)

@saitarslanboun Got it. We'd ideally want to make an objects365.yaml that auto-downloads the images and creates the labels in the right format, just like voc.yaml and coco.yaml. If you have free time and are working with this dataset, please consider submitting a PR in the future to help other users :)
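For illustration, a hypothetical sketch of what such a file could look like, modeled on the layout of voc.yaml/coco.yaml (the paths and truncated class list are placeholders, not an official file):

# objects365.yaml -- hypothetical sketch following the voc.yaml/coco.yaml layout
train: ../objects365/images/train/  # placeholder path
val: ../objects365/images/val/      # placeholder path

nc: 365  # number of classes

# first few class names only; the real file would list all 365
names: ['Person', 'Sneakers', 'Chair']

# download: (omitted -- see the discussion below about why auto-download is hard)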

@github-actions (Contributor)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Silmeria112

@saitarslanboun Hi, did you start the training? I'm very interested in the results if you can share them...

@saitarslanboun (Author)

Unfortunately @Silmeria112, I did not do that task.

@glenn-jocher (Member)

@Silmeria112 Objects365 looks very interesting. 2M images is about 20x larger than COCO, so this might use >400 GB of storage, with a single epoch taking about 20x one COCO epoch, though I'd imagine you could train far fewer epochs than 300 since the dataset is larger.

Ideally, X amount of time spent training Objects365 would be more beneficial than the same amount of time spent training COCO.

@glenn-jocher (Member)

@saitarslanboun @Silmeria112 If you guys get started training the Objects365 dataset, please consider submitting a PR with an objects365.yaml and a get_objects365.sh script to help everyone else get started more easily with the same trainings!

@Silmeria112

Hi @glenn-jocher, @saitarslanboun, I do want to start training on the Objects365 dataset. However, when I checked it, I found quite a few crowd bbox annotations, even on boxes that don't look very crowded to me. As I understand it, YOLO's current preprocessing ignores these crowd bboxes, right? That leaves a lot of objects with missing labels.

I don't have enough GPU resources currently, but maybe I can start the training two weeks later.

@saitarslanboun (Author)

@Silmeria112, here is a chance for you to make a contribution to YOLOv5 by adding crowded-box training functionality :)

@saitarslanboun (Author)

Or you could label them differently. For example, you would have two different classes for person: person and person (crowded). Then the model would learn crowded-person and single-person objects separately.
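A hedged sketch of that relabeling idea, assuming COCO-style annotation dicts (the offset and field names are illustrative):

# Map crowd boxes to a parallel '<class> (crowded)' id instead of dropping them
CROWD_OFFSET = 365  # crowd variants appended after the 365 original classes

def remap_category(annotation):
    cid = annotation['category_id']
    return cid + CROWD_OFFSET if annotation.get('iscrowd', 0) else cid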

@glenn-jocher (Member)

@Silmeria112 Yes, we've opted to ignore 'iscrowd' boxes in the COCO dataset, so we'd probably want the same behavior for Objects365. It's unfortunate that there are FNs (missed objects) in the dataset labels.

OIv6 has many missing objects as well. I think the earlier versions of that dataset may not have been fully labelled with all of the current classes, so you have to be very careful about which parts of the dataset you use, or train a teacher model on the well-labelled parts to review the less well-labelled parts.
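To make the "ignore 'iscrowd' boxes" behavior concrete, here is a minimal conversion sketch assuming COCO-format annotation dicts; this is an illustration, not the actual YOLOv5 preprocessing code:

def coco_box_to_yolo(box, img_w, img_h):
    """COCO [x, y, w, h] (top-left corner, pixels) -> YOLO [xc, yc, w, h] (normalized)."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

def to_yolo_lines(annotations, img_w, img_h):
    """Emit YOLO label lines for one image, skipping 'iscrowd' boxes."""
    lines = []
    for a in annotations:
        if a.get('iscrowd', 0):  # drop crowd boxes, as discussed above
            continue
        xc, yc, w, h = coco_box_to_yolo(a['bbox'], img_w, img_h)
        lines.append(f"{a['category_id']} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines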

@ferdinandl007 (Contributor)

I'm currently training YOLOv5l on Objects365 and have it up to 0.35 mAP@0.5, slightly higher than the number in the Objects365 paper, after about a week of training on 8x A100s. I'm using the maximum batch size the GPUs allow, 58 per GPU. I'm also running a hyperparameter search on a subset of the dataset.
Does anyone have more tips to improve accuracy before I start another week-long training run?

@glenn-jocher (Member) commented Apr 24, 2021

@ferdinandl007 Wow!! A week on 8x A100s will get you about 500 epochs of YOLOv5x6 on COCO at 1280. How many epochs, and at what image size, are you training?

I would highly recommend running DDP from within our Docker container, even if you think you have a good Linux environment, as it produces the fastest trainings in our experience. It's really easy; you just need to pass in your dataset directory instead of /coco here:

t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all -v "$(pwd)"/coco:/usr/src/coco $t

Lastly, can you submit a PR with your objects365.yaml file to help people get started faster on this dataset in the future? The recent VisDrone PR #2882 is a good example of how to do this, and if you have a conversion script to YOLO format you could place it in data/scripts.

@glenn-jocher (Member) commented Apr 24, 2021

@ferdinandl007 BTW, one of the reasons I'm asking is that we're trying to enable auto-download of the following datasets:

@ferdinandl007 (Contributor)

@glenn-jocher Sure, I can make a PR with the objects365.yaml and the hyperparameters I found. I currently have 280 generations done and am going to leave it running over the weekend, so it should be at about 500 by then.
I tried both Docker and bare metal and the results were basically identical; currently I'm doing the training on bare metal. It takes about an hour and 20 minutes to complete one epoch on the full dataset with --img 640 --batch 464 --sync-bn in DDP mode.
I might try training on 24x A100s, but from previous testing the data link between my machines is currently too slow to make multi-node feasible, so I'm using them for the hyperparameter search right now.

In terms of auto-downloading Objects365, that might be quite difficult: you have to have a WeChat account to authenticate the download. In addition, the connection to the downloads keeps failing; it took me about a week to download the whole thing. I tried scripting it, but that didn't really work, as the website did not let me download with wget (it basically always failed immediately), so I had to use Chrome and download the files one by one 😅

As for the conversion script, I can attach that to the PR when I get time.

@glenn-jocher (Member)

@ferdinandl007 Yeah, you're right; probably just an objects365.yaml with no download: field then. I tried to download the dataset earlier and ran into the same issues. I created an account even though I don't speak Chinese and managed to get a couple of the zip files downloaded, but gave up due to the complications.

If Docker and your local environment are the same speed, that means your local environment is very well configured!

If Objects365 is like COCO, then you will probably get better results training at larger image sizes with the P6 models, i.e. instead of python train.py --img 640 --cfg yolov5l.yaml you'll probably get better results with python train.py --img 1280 --cfg yolov5l6.yaml. You'll have to reduce your batch size by about 4x since the images have 4x as many pixels at 1280, but this is probably a good thing because your batch size may be too large.

I try to avoid training at batch sizes over 128, because the steps between optimizer updates become quite large and training actually starts to take longer (more epochs). There's a sweet spot somewhere in the --batch-size space, maybe around --batch 100, but pushing this to 464 is probably slowing down your training substantially, especially in the early epochs.

--sync-bn definitely helps in early training, but I think final mAP may be largely unaffected by it; we still need to do a study on this.

@ferdinandl007 (Contributor)

@glenn-jocher Thank you for the tips :) I did notice significantly faster training when using a batch size of 124, but it only utilised about 60% of system resources, so I increased the batch size to fully utilise the available resources; my thinking was that I would be less likely to get stuck on shallow gradients.
I will try smaller batch sizes in the future. Also, regarding yolov5l6: is it still capable of running in real time on iOS devices after pruning and quantisation?
And when training at larger resolutions, won't inference take a significant performance penalty?
Or can I continue using 640 resolution during inference? That worked pretty well for me previously in my applications.

@Silmeria112

@ferdinandl007 Great news that you're getting results on Objects365. I think many people would also like to know the transfer/generalization ability of pretrained weights from Objects365, especially people who want to train on custom datasets. Do you have any plan to check that, for example by comparing the VOC performance of YOLO pretrained on COCO vs. Objects365?

@glenn-jocher (Member) commented Apr 25, 2021

@ferdinandl007 Well, it's important to differentiate between GPU utilization and memory. I think you can still reach high utilization rates (i.e. around 90%) without saturating GPU memory. Especially with some of the high-end GPUs available today, like the 80 GB A100s, it won't always make sense to use up 100% of your memory.

In regards to speed, the P6 models run at about the same speed as the P5 models. Their main disadvantage is size: they have about 50% more parameters than the P5 models, but all of these extra parameters are in stride-64 convolution layers, which are very fast (the slowest convolutions are the P1 and P2 layer convolutions, which conversely have the fewest parameters).

Independently of model type, yes, larger images run inference more slowly, as the v5.0 README shows. But one major advantage of training at 1280 is that you can still run inference at lower sizes, i.e. 320, 640, 960, etc., up to 1280. If you train at 640 you will only get good inference results at 640 and below.

One last note: P6 models trained at 640 also produce better mAP than P5 models trained at 640.
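A small sketch of that multi-size flexibility with the PyTorch Hub interface (also used in the timing example below); the size argument sets the inference resolution per call:

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s6')  # P6 model
img = 'https://ultralytics.com/images/zidane.jpg'
for size in (320, 640, 960, 1280):
    model(img, size=size).print()  # same weights, different inference size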

P5 vs P6 timing example is here:

# PyTorch Hub
import torch

# Model
model5 = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model6 = torch.hub.load('ultralytics/yolov5', 'yolov5s6')

# Images
imgs = ['zidane.jpg', 'bus.jpg']
for f in imgs:  # download 2 images
    print(f'Downloading {f}...')
    torch.hub.download_url_to_file('https://github.com/ultralytics/yolov5/releases/download/v1.0/' + f, f)

# Inference (batch-size 20)
model5(imgs * 10).print()
model6(imgs * 10).print()

[Screenshot: timing output for the yolov5s vs yolov5s6 comparison above]

@ferdinandl007 (Contributor) commented Apr 26, 2021

@glenn-jocher Thank you for the clarification. I followed your suggestion and started training at 1280 with my previously trained yolov5l model, and noticed a significant mAP increase to 0.42 after one epoch. However, processing time is now about 5-6 hours per epoch at batch size 128 on 8x A100s, so I should probably allow about two weeks for training.
I will also try out the P6 models and report back on performance :)
What hardware do you typically use for training your models?
How well does it scale over an increasing number of GPUs (16+)? Has a study been done already?
Also, is there support for AMD ROCm GPUs (MI50/MI100)?

@Silmeria112 Right now there are no plans to test transfer-learning ability, but if I get time I may give it a shot and look at the performance increases. There should definitely be some, based on what I read in the Objects365 paper, where they did the same with R-CNN.

@Silmeria112

@ferdinandl007 @glenn-jocher Hi, I'm planning to start the training soon, and I did some quick statistics on the current v2 version of Objects365. For the training set:

  • total imgs: 1,742,292
  • imgs with at least one "iscrowd" bbox: 934,754
  • total bboxes: 25,407,633
  • "iscrowd" bboxes: 2,521,368

So the dataset is now much bigger than reported in the paper (608K imgs). However, there are a lot of "iscrowd" bboxes, which are ignored during YOLO preprocessing. There may be a few ways to tackle that (ignore anchors overlapping "iscrowd" bboxes / replace pixels in the "iscrowd" areas with a constant value...), but the simplest way is to not use images with any "iscrowd" bbox. That leaves 807,538 imgs, which is still more than the number in the paper; a sketch of this filtering follows below.
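A sketch of that simplest option, assuming the COCO-format fields of zhiyuan_objv2_train.json:

import json

with open('zhiyuan_objv2_train.json') as f:
    data = json.load(f)

# Drop every image that contains at least one crowd box
crowd_imgs = {a['image_id'] for a in data['annotations'] if a.get('iscrowd', 0)}
clean_imgs = [im for im in data['images'] if im['id'] not in crowd_imgs]
print(f"{len(clean_imgs)}/{len(data['images'])} images have no iscrowd boxes")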

A question for @ferdinandl007: what labels are you using for the val set? I only found a submission-sample JSON on the website, which doesn't seem to be the ground-truth labels.

@marvision-ai

@ferdinandl007 Very interesting! Do you think it would be possible to make the models trained on this dataset publicly available to play around with? Most of us don't have that hardware budget 😉

@ferdinandl007 (Contributor)

@Silmeria112 As far as I discovered, they included all labels in that single label file, the 5 GB JSON. But I was also confused about that at the beginning; it's not very well documented, I must say!
@marvision-ai In terms of publishing the weights, I have to check with some people at my company, as we are using this for some internal research.

@Silmeria112

Hi, I would like to share my test on yolov5s. First I trained on Objects365 samples without iscrowd bboxes for 50 epochs with the default settings (hyp.scratch.yaml), then used those weights as pretraining for COCO for 300 epochs, once with the default lr and once with a 0.1x smaller lr. Here are the results:

| Model                  | lr    | AP50 | mAP  |
| ---------------------- | ----- | ---- | ---- |
| yolov5s default        | 0.01  | 55.4 | 36.7 |
| yolov5s object365->coco | 0.01  | 56.9 | 36.7 |
| yolov5s object365->coco | 0.001 | 56.9 | 37.1 |
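A hedged command-line sketch of that two-stage recipe, using the train.py flags shown elsewhere in this thread (objects365.yaml is the file discussed above; the smaller-lr run would use a copy of data/hyp.scratch.yaml with lr0 lowered to 0.001):

# Stage 1: pretrain on Objects365 (50 epochs, default hyps)
python train.py --data objects365.yaml --cfg yolov5s.yaml --hyp data/hyp.scratch.yaml --epochs 50

# Stage 2: finetune on COCO from the Objects365 weights (300 epochs)
python train.py --data coco.yaml --weights runs/train/exp/weights/best.pt --epochs 300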

@ferdinandl007 (Contributor)

@Silmeria112 Awesome results! So there definitely was some gain, though not a significant one; very interesting.
What accuracy were you able to obtain on Objects365?

@ferdinandl007 (Contributor)

@Silmeria112 How did you end up tackling the iscrowd problem? Did you replace the pixels with a constant value, or just filter them out?

@Silmeria112 commented May 5, 2021

@ferdinandl007 Filtered them out. As for the accuracy on Objects365, I still cannot find the annotations for the val set, so I cannot measure it.

@ferdinandl007 (Contributor)

@Silmeria112 I think the main annotation file contains all annotations, as I have fewer than ~70,000 images unaccounted for, which is roughly the size of the validation set. When I did the conversion, I used those images to create a subset for my validation set, after downloading them and putting everything in the same folder structure.

@Silmeria112

I checked a few images from the val set and still cannot find their labels in the big JSON file (zhiyuan_objv2_train.json). Is this the file you're using?

@glenn-jocher (Member)

@ferdinandl007 I think zhiyuan_objv2_train.json contains labels for every image in the train set (the 50 patches), but it's a mystery to me where the validation image labels are. A test set would naturally be missing them, but a validation set normally comes with labels.

I think I'm just going to use our autosplit() function to create a 'YOLOv5 official' val split using 99% and 1% fractions.

yolov5/utils/datasets.py, lines 1047 to 1054 (commit 251aeaf):

def autosplit(path='../coco128', weights=(0.9, 0.1, 0.0), annotated_only=False):
    """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
    Usage: from utils.datasets import *; autosplit('../coco128')
    Arguments
        path:            Path to images directory
        weights:         Train, val, test weights (list)
        annotated_only:  Only use images with an annotated txt file
    """

@krishnaadithya

@Silmeria112 Can you upload the pretrained yolov5s trained on the Objects365 dataset?

@Silmeria112

> Can you upload the pretrained yolov5s trained on the Objects365 dataset?

Sorry, I cannot access the pretrained weights now, so I can't upload them.

@wangsun1996

> I'm currently training YOLOv5l on Objects365 and have it up to 0.35 mAP@0.5, slightly higher than the number in the Objects365 paper, after about a week of training on 8x A100s. [...]

Could you provide a yolov5 .pt trained on Objects365 (such as yolov5s, yolov5s6, or another model)? Thank you very much!

@wangsun1996

> Hi, I would like to share my test on yolov5s. [...]

Could you provide any yolov5 .pt trained on Objects365 (such as yolov5s.pt or yolov5s6.pt)? Thank you very much!

@glenn-jocher (Member)

https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5m_Objects365.pt
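For anyone picking this up later, a hedged sketch of loading that checkpoint through PyTorch Hub's custom entry point, assuming the file has been downloaded locally:

import torch

# 'custom' loads local YOLOv5 weights instead of a named release model
model = torch.hub.load('ultralytics/yolov5', 'custom', path='yolov5m_Objects365.pt')
model('https://ultralytics.com/images/zidane.jpg').print()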
