Finetuning Google Open Images Pretrained YOLO with MSCOCO #1444
@saitarslanboun sure, that sounds reasonable. OI and COCO have many intersecting classes. One issue with OI in general is that the quality of the annotations varies greatly by image. Perhaps more classes were annotated in later versions, because many images lack labels for all classes: faces, for example, are labelled in some images but not others. This makes the dataset difficult to use directly.
Thanks for your answer, @glenn-jocher! Just to make sure, because it will probably take about a month: do you really believe I can raise small YOLO's accuracy from 37 to 44 if I pretrain fully on Open Images for 300 epochs and then finetune on MSCOCO?
Oh, no, I'm not making numerical projections; I'm simply agreeing that larger datasets improve results.
Don't use OpenImages; use https://www.objects365.org/overview.html instead. According to their paper, I think you will get some improvement on COCO.
@cszer interesting, thanks! It may then make sense to provide YOLOv5 weights pretrained on the Objects365 dataset for improved finetuning performance on smaller datasets, per their paper. We'd need to export their labels into YOLO format and set up some training runs...
Yes @cszer, I also plan to do so. Let's see what happens. :) @glenn-jocher, I need to do that eventually as well, but I don't know when I will start training.
@saitarslanboun got it. We'd ideally want to make an objects365.yaml that would autodownload the images and create the labels in the right format, just like voc.yaml and coco.yaml. If you have free time and are working with this dataset, please consider submitting a PR in the future to help other users :)
@saitarslanboun, hi, did you start the training? I'm very interested in the results, if you can share...
Unfortunately @Silmeria112, I did not do that task.
@Silmeria112 Objects365 looks very interesting. 2M images is about 20X larger than COCO, so this might require >400 GB of storage, with a single epoch taking about 20X as long as one COCO epoch, though I'd imagine you could train far fewer than 300 epochs since the dataset is larger. Ideally, X amount of time spent training Objects365 would be more beneficial than the same amount of time spent training COCO.
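To make the scaling argument above concrete, here is a back-of-the-envelope sketch. The 2M-image count and the 300-epoch baseline come from the comment; the per-epoch COCO time is a hypothetical placeholder, not a measurement:

```python
# Back-of-envelope scaling for Objects365 vs COCO training cost.
coco_images = 118_000          # approx. COCO train2017 size
obj365_images = 2_000_000      # Objects365 size quoted above
coco_epoch_hours = 1.0         # hypothetical: one COCO epoch on some GPU rig

scale = obj365_images / coco_images          # ~17x, close to the "20X" above
obj365_epoch_hours = coco_epoch_hours * scale

# For the same total time budget as 300 COCO epochs, you get only a
# handful of passes over the larger dataset.
equal_budget_epochs = 300 * coco_epoch_hours / obj365_epoch_hours

print(f'{scale:.1f}x images per epoch')
print(f'{equal_budget_epochs:.0f} Objects365 epochs for the same budget')
```

The point is that "far fewer epochs" is not optional: an equal compute budget only buys roughly 300/20 ≈ 15-18 passes over a 20X dataset.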
@saitarslanboun @Silmeria112 if you guys start training on the Objects365 dataset, please consider submitting a PR with an objects365.yaml and a get_objects365.sh script to help everyone else get started more easily with the same trainings!
Hi @glenn-jocher, @saitarslanboun, I do want to start training on the Objects365 dataset. However, when I checked it, I found quite a few crowd bbox annotations, even for bboxes that I don't think are very crowded. As I understand it, the current YOLO preprocessing ignores these crowd bboxes, right? That leaves a lot of objects with missing labels. I don't have enough GPU resources at the moment, but I may be able to start training in two weeks.
@Silmeria112, here is a chance for you to contribute to YOLOv5 by adding crowded-box training functionality :)
Or you can label them differently. For example, you would have two different classes: person and person (crowded). Then the model will learn crowded-person and single-person objects differently.
@Silmeria112 yes, we've opted to ignore 'iscrowd' boxes in the COCO dataset, so we'd probably want the same behavior for Objects365. It's unfortunate that there are FNs (missed objects) in the dataset labels. OIv6 has many missing objects as well. I think the earlier versions of that dataset may not have been fully labelled with all of the current classes, so you have to be very careful about which parts of the dataset you use, or train a teacher on the well-labelled parts to review the poorly labelled parts.
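The "ignore iscrowd" behaviour described above amounts to a simple filter over COCO-format annotations. A minimal sketch, assuming the standard COCO JSON layout (this is not the actual YOLOv5 preprocessing code):

```python
def boxes_ignoring_crowd(coco):
    """Per-image lists of (category_id, bbox) from a loaded COCO-style dict,
    skipping any annotation flagged iscrowd."""
    boxes = {}
    for ann in coco['annotations']:
        if ann.get('iscrowd', 0):  # drop crowd boxes, mirroring COCO handling
            continue
        boxes.setdefault(ann['image_id'], []).append(
            (ann['category_id'], ann['bbox']))
    return boxes
```

The consequence discussed in the thread follows directly: every box dropped here becomes a false negative from the model's perspective, since the object is still present in the pixels.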
I'm currently training YOLOv5l on Objects365 and got it up to 0.35 mAP@0.5, slightly higher than the result in the Objects365 paper, after about a week of training on 8x A100.
@ferdinandl007 wow!! A week on 8x A100 will get you about 500 epochs of YOLOv5x6 on COCO at 1280. How many epochs, and at what image size, are you training? I would highly recommend running DDP from within our Docker container, even if you think you have a good Linux environment, as it produces the fastest trainings in our experience. It's really easy; you just need to pass in your dataset directory instead of /coco here:
Also, lastly, can you submit a PR with your objects365.yaml file to help people get started faster on this dataset in the future? The recent VisDrone PR #2882 is a good example of how to do this, and if you have a conversion script into YOLO format you could place it in data/scripts.
@ferdinandl007 BTW, one of the reasons I'm asking is we're trying to allow auto-download of the following datasets:
@glenn-jocher Sure, I can make a PR with the objects365.yaml and the hyperparameters I found. I currently have 280 generations done and am going to leave it running over the weekend, so it should be at about 500 then.

In terms of auto-downloading Objects365, that might be quite difficult: you have to have a WeChat account to authenticate the download. In addition, the connection to the downloads keeps failing. It took me about a week to download the whole thing. I tried scripting it, but that didn't really work, as the website did not let me download with wget; it basically always failed immediately, so I had to use Chrome and download the files one by one 😅

As for the conversion script, I can attach it to the PR when I get time.
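For anyone writing such a conversion script, the core step is turning COCO-style pixel boxes ([x_min, y_min, w, h]) into YOLO's normalized [cx, cy, w, h] format. A generic sketch of that transform (not @ferdinandl007's actual script):

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, w, h] pixel box to a YOLO
    normalized [cx, cy, w, h] box (all values in 0..1)."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w,   # box center x, normalized by image width
            (y + h / 2) / img_h,   # box center y, normalized by image height
            w / img_w,             # box width, normalized
            h / img_h]             # box height, normalized
```

Each YOLO label file then gets one line per box: the class index followed by these four floats.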
@ferdinandl007 yeah, you're right, probably just an objects365.yaml with no autodownload then.

If Docker and your local environment are the same speed, then that means your local environment is very well configured!

If Objects365 is like COCO, then you will probably get better results training at larger image sizes with the P6 models, i.e. at --img 1280 instead of --img 640.

I try to avoid training at batch sizes over 128, because then the steps between optimizer updates become quite large and training actually starts to take longer (more epochs). There's a sweet spot somewhere in the --batch-size space, maybe around --batch 100, but pushing this to 464 is probably slowing down your training substantially, especially in the early epochs. --sync definitely helps in early training, but I think final mAP may be largely unaffected by --sync; we still need to do a study on this.
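The spacing between optimizer updates mentioned above is governed in YOLOv5 by gradient accumulation toward a nominal batch size. A sketch of that rule, assuming the nominal batch size of 64 used in YOLOv5's train.py (if I recall its logic correctly):

```python
def accumulate_steps(batch_size, nbs=64):
    """Number of forward passes to accumulate before an optimizer step,
    keeping the effective batch near the nominal batch size `nbs`."""
    return max(round(nbs / batch_size), 1)

# Small batches accumulate more steps; any batch >= nbs updates every step,
# so the effective batch grows linearly with --batch-size past that point.
for bs in (16, 64, 128, 464):
    print(bs, accumulate_steps(bs), bs * accumulate_steps(bs))
```

This is why very large --batch-size values stretch the interval between weight updates: above the nominal size there is no accumulation to compensate, and each update averages over hundreds of images.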
@ferdinandl007 Great news that you're getting results on Objects365. I think many people would also want to know the transfer/generalization ability of pretrained weights from Objects365, especially people who want to train on custom datasets. Do you have any plan to check that, for example by comparing YOLO performance on VOC with pretraining on COCO vs. Objects365?
@ferdinandl007 well, it's important to differentiate between GPU utilization and memory. I think you can still reach high utilization rates (i.e. around 90%) even without saturating GPU memory. Especially with some of the high-end GPUs available today, like the 80 GB A100s, it won't always make sense to use up 100% of your memory.

In regards to speed, the P6 models run at about the same speed as the P5 models. Their main disadvantage is size: they have about 50% more parameters than the P5 models, but all of these extra parameters are in stride-64 convolution layers, which are very fast (the slowest convolutions are the P1 and P2 layer convolutions, which conversely have the fewest parameters). Independently of model type, yes, larger images will run inference more slowly, as the v5.0 README shows. But one major advantage of training at 1280 is that you can still run inference at lower values, i.e. 320, 640, 960, etc., up to 1280. If you train at 640, you will only get good inference results at 640 and lower. Also, one last note: P6 models trained at 640 also produce better mAP than P5 models trained at 640.

A P5 vs P6 timing example is here:

```python
# PyTorch Hub
import torch

# Models
model5 = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model6 = torch.hub.load('ultralytics/yolov5', 'yolov5s6')

# Images
imgs = ['zidane.jpg', 'bus.jpg']
for f in imgs:  # download 2 images
    print(f'Downloading {f}...')
    torch.hub.download_url_to_file('https://github.com/ultralytics/yolov5/releases/download/v1.0/' + f, f)

# Inference (batch-size 20)
model5(imgs * 10).print()
model6(imgs * 10).print()
```
@glenn-jocher Thank you for this clarification. I followed your suggestion and started training at 1280 with my previously trained YOLOv5l model, and noticed a significant mAP increase to 0.42 after one epoch. However, processing time is now about 5-6 hours per epoch at batch size 128 on 8x A100s, so I should probably plan on about two weeks of training.

@Silmeria112 Right now there are no plans to test transfer-learning ability, but if I get time, I may give it a shot and look at the performance increases. There should definitely be some, based on what I read in the Objects365 paper, where they did the same with R-CNN.
@ferdinandl007 @glenn-jocher Hi, I'm planning to start the training soon, and I did some statistical analysis on the current v2 version of Objects365. For the training set:
So the dataset is now much bigger than reported in the paper (608K imgs). However, there are a lot of "iscrowd" bboxes, which are ignored during YOLO preprocessing. I think there may be a few ways to tackle that (ignore the anchors overlapping "iscrowd" bboxes / replace pixels in the "iscrowd" area with a constant value...), but the simplest way is to not use images with "iscrowd" bboxes. That leaves 807,538 imgs, which is still larger than the number in the paper. A question for @ferdinandl007: what label file are you using for the val set? I only found a submission sample JSON on the website, which doesn't seem to be the ground-truth labels.
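The "simplest way" mentioned above (dropping any image that contains an iscrowd box) can be sketched like this, assuming COCO-format annotation dicts (a sketch, not the script used for the 807,538 count):

```python
def images_without_crowd(annotations):
    """Given an iterable of COCO-style annotation dicts, return the set of
    image_ids whose images contain no iscrowd box at all."""
    all_ids = {a['image_id'] for a in annotations}
    crowd_ids = {a['image_id'] for a in annotations if a.get('iscrowd', 0)}
    return all_ids - crowd_ids  # keep only fully crowd-free images
```

Unlike filtering individual boxes, this discards whole images, so it sacrifices data volume in exchange for avoiding unlabeled objects in the training pixels.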
@ferdinandl007 as far as I discovered, they included all labels in that single label file (the 5 GB JSON). But I was also confused about that at the beginning; it's not very well documented, I must say!
Hi, I would like to share my test on YOLOv5s. First I trained on Objects365 samples without iscrowd bboxes for 50 epochs with the default settings (hyp.scratch.yaml), then used those weights as pretraining to train on COCO for 300 epochs, both with the default setting and with a 0.1x smaller lr setting. Here are the results:
@Silmeria112 awesome results! So there definitely was some gain, though not a large one. Very interesting!
@Silmeria112 how did you end up tackling the iscrowd problem? Did you replace the pixels with a constant value, or just filter them out?
@ferdinandl007 filtered them out. As for accuracy on Objects365, I still cannot find the annotations for the val set, so I cannot measure that.
@Silmeria112 I think the main annotation file contains all annotations, as I have about 70,000 images missing, which is roughly the size of the validation set. When I did the conversion, I used them to create a subset for my validation set, after downloading everything and putting it all in the same folder structure.
I checked a few images from the val set and still cannot find their labels in the big JSON file (zhiyuan_objv2_train.json). Is this the file you're using?
@ferdinandl007 I think zhiyuan_objv2_train.json contains labels for every image in the train set (the 50 patches), but it's a mystery to me where the validation image labels are. A test set would naturally be missing them, but a validation set normally comes with labels. I think I'm just going to use our autosplit() function to create a 'YOLOv5 official' val split using 99% and 1% fractions. Lines 1047 to 1054 in 251aeaf
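The autosplit() approach boils down to shuffling the image list and writing out train/val subsets by fraction. A minimal re-implementation sketch of a 99%/1% split (not the actual YOLOv5 autosplit() code, which also writes autosplit_*.txt files):

```python
import random

def split_99_1(image_paths, seed=0):
    """Shuffle image paths deterministically and split ~99% train / 1% val."""
    paths = sorted(image_paths)          # fixed order before shuffling
    random.Random(seed).shuffle(paths)   # seeded for reproducibility
    n_val = max(1, round(0.01 * len(paths)))
    return paths[n_val:], paths[:n_val]  # (train, val)
```

Seeding the shuffle matters here: anyone re-running the split gets the same 'official' val set, which is the whole point of publishing such a split.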
Sorry, I cannot access the pretrained weights now, so I can't upload them.
Could you provide a YOLOv5 .pt pretrained on Objects365 (such as yolov5s, yolov5s6, or another model)? Thank you very much!
❔Question
I am considering pretraining the YOLOv5 small configuration with the Google Open Images object detection dataset (https://storage.googleapis.com/openimages/web/download.html). The dataset covers general-domain categories with ~15M box samples. After pretraining is done, I will fine-tune the model on the MSCOCO dataset.
I would like to do this if I can improve AP by ~7%. Do you think that is possible, and is my expectation reasonable? Unfortunately, I could not find any record of anyone having tried an Open Images pretrained object detector with MSCOCO training.
When I fine-tune, all layers will be initialized with the pretrained weights except the Detect layer, since the number of classes changes.
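The transfer scheme described (reuse every pretrained weight except the Detect head, whose shapes depend on the class count) boils down to intersecting two state dicts on name and shape. A framework-agnostic sketch, using plain dicts of name → shape in place of real tensors (in PyTorch you would apply the same filter to two `state_dict()` results):

```python
def transferable_keys(pretrained_shapes, model_shapes):
    """Keys safe to copy between checkpoints: same name, same tensor shape.
    Detect-layer weights change shape with the class count, so they drop
    out of this intersection automatically."""
    return {k for k, shape in pretrained_shapes.items()
            if model_shapes.get(k) == shape}

# Hypothetical example: an 80-class pretrained model vs a 365-class head.
pretrained = {'backbone.conv.weight': (32, 3, 6, 6),
              'detect.m.0.weight': (255, 128, 1, 1)}   # 255 = 3*(80+5)
model      = {'backbone.conv.weight': (32, 3, 6, 6),
              'detect.m.0.weight': (1110, 128, 1, 1)}  # 1110 = 3*(365+5)
print(transferable_keys(pretrained, model))
```

Only the backbone key survives the intersection, so the Detect layer is left at its fresh random initialization, exactly as the paragraph above describes.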