
Pseudo-labeling with YOLOv5 (similar to darknet) for Active Learning #404

Closed
marvision-ai opened this issue Jul 14, 2020 · 32 comments

@marvision-ai

🚀 Feature

It would be super effective to be able to process a list of images in a folder and save the detection results in YOLO training format for each image as a label file <image_name>.txt. (In this way you can increase the amount of training data and automate the annotation process.)

Motivation + Pitch

I want to be able to do active learning with my models.
As new data comes in, the model can annotate the images, add them into the original database and retrain.

@marvision-ai marvision-ai added the enhancement New feature or request label Jul 14, 2020
@glenn-jocher
Member

@marvision-ai to label:
python detect.py --save-txt
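
For example, to pseudo-label a whole folder of unlabeled images (a minimal sketch; the weights file, source path, and confidence threshold are placeholders to adjust for your data):

python detect.py --weights yolov5x.pt --source path/to/unlabeled/ --save-txt --conf-thres 0.4

This should write a YOLO-format <image_name>.txt label file for each image with detections.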

@marvision-ai
Author

@glenn-jocher Oh wow. Closing this. Thanks for the information!

@glenn-jocher
Member

You're welcome! You'll see this:

[screenshot: example of labels saved by detect.py --save-txt]

@glenn-jocher
Member

@marvision-ai BTW, this same pseudo-labeling is apparently allowing YOLOv5 to top the leaderboard in the Kaggle wheat competition :)

https://www.kaggle.com/nvnnghia/yolov5-pseudo-labeling

@marvision-ai
Author

@glenn-jocher Is there anything they are doing in those kernels that your repo doesn't already do?

@glenn-jocher
Member

@marvision-ai I don't know; I haven't been involved at all other than with some licensing issues. The repo supports a few tools out of the box that may be useful in competitions, however, such as built-in model ensembling and test-time augmentation. You can see these in the wiki or the tutorial section of the readme.
https://github.com/ultralytics/yolov5/wiki

@marvision-ai
Author

@glenn-jocher Oh okay, makes sense. Thanks for the information and the fantastic repo!

@glenn-jocher
Member

glenn-jocher commented Jul 14, 2020

@marvision-ai maybe I should add a pseudo-labelling tutorial as well, since it's definitely an interesting item. scale.ai inflated themselves into a billion dollar valuation on the back of it, so offering it here open-source for everyone to use seems like a great idea. How are you using it?

@glenn-jocher
Member

If I understand the kaggle wheat approach, the steps were:

  1. Train the best possible model on your train set (i.e. YOLOv5x at a high resolution).
  2. AutoLabel (as I call it) the validation set used for scoring (with TTA enabled I assume).
  3. Fine-tune your trained model on AutoLabeled val set (maybe combined with train set) for 10 epochs.
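
In repo commands, that pipeline might look roughly like the sketch below (the dataset YAMLs, image size, batch size, and checkpoint paths are placeholders; the exact weights location depends on your run):

# 1. Train the strongest model you can on the labeled train set
python train.py --img 1024 --batch 8 --data wheat.yaml --cfg models/yolov5x.yaml --weights ''

# 2. AutoLabel the held-out images, with TTA enabled
python detect.py --weights best.pt --source val_images/ --save-txt --augment

# 3. Fine-tune on the pseudo-labeled set (maybe merged with train) for ~10 epochs
python train.py --img 1024 --batch 8 --data wheat_plus_pseudo.yaml --weights best.pt --epochs 10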

@marvision-ai
Author

@marvision-ai maybe I should add a pseudo-labelling tutorial as well, since it's definitely an interesting item. scale.ai inflated themselves into a billion dollar valuation on the back of it, so offering it here open-source for everyone to use seems like a great idea. How are you using it?

I train large models that have super high accuracy on a small dataset, and then use them to label large datasets to add to my base dataset (rinse and repeat).
I then augment these with a wide array of augmentations before training a new network. Repeating this process over time produces models with far greater generalization.

@marvision-ai
Author

If I understand the kaggle wheat approach, the steps were:

1. Train the best possible model on your train set (i.e. YOLOv5x at a high resolution).

2. AutoLabel (as I call it) the validation set used for scoring (with TTA enabled I assume).

3. Fine-tune your trained model on AutoLabeled val set (maybe combined with train set) for 10 epochs.
  1. To train this best model are you just using:
     python train.py --img {biggest resolution} --batch {largest batch} --data {dataset.yaml} --cfg ./models/yolov5x.yaml --weights '' ?
     Will this use a YOLOv5x model and train it from scratch?

  2. Yes.

  3. Perhaps you can also add in extra augmented images?

@glenn-jocher
Member

@marvision-ai yes, the command looks fine. You can always add more images to the dataset. YOLOv5 augments automatically during training; you can adjust the augmentation hyperparameters in train.py. See the notebook for train*.jpg, which shows an augmented batch.
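
For reference, the augmentation entries of the hyp dict in train.py look roughly like this (a sketch from memory; double-check the values in your copy):

hyp = {...,
       'hsv_h': 0.015,    # image HSV-Hue augmentation (fraction)
       'hsv_s': 0.7,      # image HSV-Saturation augmentation (fraction)
       'hsv_v': 0.4,      # image HSV-Value augmentation (fraction)
       'degrees': 0.0,    # image rotation (+/- deg)
       'translate': 0.0,  # image translation (+/- fraction)
       'scale': 0.5,      # image scale (+/- gain)
       'shear': 0.0}      # image shear (+/- deg)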

@marvision-ai
Author

marvision-ai commented Jul 14, 2020

@marvision-ai maybe I should add a pseudo-labelling tutorial as well, since it's definitely an interesting item. scale.ai inflated themselves into a billion dollar valuation on the back of it, so offering it here open-source for everyone to use seems like a great idea. How are you using it?

Ultimately, this whole technique of training SOTA networks is a little more advanced, but a tutorial would really help everyone else. I think this is what sets your repo apart from the others and will continue to do so ==> network training + detection + huge utility functions/algorithms AND TUTORIALS

@marvision-ai
Author

@marvision-ai yes, the command looks fine. You can always add more images to the dataset. YOLOv5 augments automatically during training; you can adjust the augmentation hyperparameters in train.py. See the notebook for train*.jpg, which shows an augmented batch.

Quick questions on this:

  1. For instance, 'degrees': 0.0, # image rotation (+/- deg). If I set this to 90.0, does it randomly rotate between 0 and 90?
  2. Does --evolve work with these?
  3. Is there a way I can automate this where I set up multiple hyperparameter configs --> train a model for each config --> pick the best model based on optimal hyperparameters? Or is this what --evolve ultimately does?

@glenn-jocher
Member

@marvision-ai I would recommend you just change these and observe the effects directly in train*.jpg.

--evolve applies a genetic evolution algorithm (the same one we use for AutoAnchor) to the training hyps. A recent PR broke this functionality, however, so for now you need to tune manually.

In any case, --evolve would ideally use several hundred trainings to arrive at a minimum, so it is not something to take on lightly.
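
For the curious, the mutation step behind --evolve is conceptually simple. A standalone sketch of the idea (not the repo's exact code; the hyp values and mutation scale are illustrative):

import numpy as np

# start from the best-known hyperparameters (the 'parent')
hyp = {'lr0': 0.01, 'momentum': 0.937, 'scale': 0.5}

# mutate: multiplicative gaussian noise on each gene
for k in hyp:
    hyp[k] *= float(np.random.normal(1.0, 0.2))

# ...train with the mutated hyp, keep it as the new parent if fitness
# (e.g. mAP) improves, and repeat for many generations...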

@glenn-jocher
Member

ultralytics/yolov3#392

@glenn-jocher
Member

glenn-jocher commented Jul 14, 2020

@marvision-ai jesus, now that I think about it I probably need to create an --evolve tutorial also...

But my #1 piece of advice here is to not get ahead of yourself. Everyone seems to want to overoptimize and second-guess everything before they've even started. Before you do anything at all, simply train normally using all default settings, both from scratch and from pretrained weights, and then (only once you have your baseline results in hand) sit down and consider your next steps.

@marvision-ai
Author

@glenn-jocher absolutely! I agree. I ask these questions to get a better understanding of workflow. I have a large dataset that I'm training on. I will update you on results. I look forward to experimenting with the hyperparameters and active learning!

@glenn-jocher
Member

@marvision-ai great! When you say active learning, this is what you are calling the iterative process (train, pseudolabel, repeat) you described before? I hadn't heard the term before.

@marvision-ai
Author

marvision-ai commented Jul 14, 2020

@glenn-jocher correct. It's what I described and also what you described in your bullet points.

When I slowly increase the dataset size with pseudo-labeling and go through the annotations, I get a feel for what my network is learning or struggling with. Then I can introduce more samples into the main dataset that will help it generalize better. This automates the annotation process and usually yields much better results.

I've noticed that purely more data != more accuracy. Therefore I'm always actively benchmarking how my model learns and whether there is class bias in the works.

You can read a nice summary on this here:
https://jacobgil.github.io/deeplearning/activelearning

There are two main approaches that most active learning works follow. Sometimes they are used in combination.

Uncertainty sampling:
Try to find images the model isn’t certain about, as a proxy of the model being wrong about the image.
Diversity sampling:
Try to find images that represent the diversity existing in the images that weren’t annotated yet.

A function that gets an image and returns a ranking score is often called an "acquisition function".

Adding such an acquisition function into test.py would be amazing. It would basically let users know where the model is failing and on which images, automating the active learning process.
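
As an illustration only (hypothetical code, not anything in the repo; predictions is a placeholder mapping each image path to its list of detection confidences):

import numpy as np

def acquisition_score(confidences):
    # Uncertainty sampling: detections near conf ~ 0.5 are the ones the
    # model is least sure about, so images full of them score highest.
    if len(confidences) == 0:
        return 1.0  # no detections at all: treat as maximally uncertain
    c = np.asarray(confidences, dtype=float)
    return float(np.mean(1.0 - np.abs(2.0 * c - 1.0)))  # peaks at conf = 0.5

# rank unlabeled images and send the top 100 to a human for annotation
scores = {img: acquisition_score(confs) for img, confs in predictions.items()}
to_annotate = sorted(scores, key=scores.get, reverse=True)[:100]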

Sorry for the wall of text. 😅

@glenn-jocher
Member

@marvision-ai oh interesting. We actually already have an image weighting function; it weighs images more heavily if they are full of low-mAP objects. It can be used for short-term mAP gains during training, but unfortunately it also tends to overtrain faster, resulting in lower final mAP.

yolov5/train.py

Lines 212 to 217 in 611ec44

# Update image weights (optional)
if dataset.image_weights:
    w = model.class_weights.cpu().numpy() * (1 - maps) ** 2  # class weights
    image_weights = labels_to_image_weights(dataset.labels, nc=nc, class_weights=w)
    dataset.indices = random.choices(range(dataset.n), weights=image_weights, k=dataset.n)  # rand weighted idx

You can see mAP per class BTW with python test.py --verbose. Wouldn't you simply want to add images from the lowest mAP classes? i.e. go scrape images of cars and bicycles after seeing the results below?

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='data/coco128.yaml', device='', img_size=640, iou_thres=0.65, merge=False, save_json=False, single_cls=False, task='val', verbose=True, weights='yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Fusing layers... Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
Scanning images: 100% 128/128 [00:00<00:00, 3302.66it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 100% 128/128 [00:00<00:00, 19108.45it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 4/4 [00:04<00:00,  1.03s/it]
                 all         128         929       0.386       0.752       0.697       0.455
              person         128         254       0.404       0.854       0.787       0.502
             bicycle         128           6        0.44       0.833       0.693       0.302
                 car         128          46       0.342       0.452       0.422       0.218
          motorcycle         128           5       0.428         0.8       0.832       0.634
            airplane         128           6       0.669           1       0.972       0.617
...
Speed: 3.9/1.9/5.8 ms inference/NMS/total per 640x640 image at batch-size 32

@marvision-ai
Author

marvision-ai commented Jul 14, 2020

@glenn-jocher yes, this is halfway there. From the test I know where I'm failing, but it doesn't really tell me the exact instances of why the cars and bikes are getting lower mAP, or how far off the detections are, unless I go through all the images one by one.

I basically do this all in my head as I'm pseudo-labeling/testing images to add to or remove from the base dataset.

Therefore the process is as follows:

  1. Calculate mAP for individual classes.
  2. Pseudo-label more data for the particular classes that need help. (Understand where it's failing before retraining.)
  3. Retrain and then verify whether overall mAP has increased. (Basically like unit testing the network's accuracy, to make sure the introduction of new images didn't reduce accuracy in other classes.)

To summarize: the code is almost there. Perhaps test.py could have optional functionality to save the images that have proven hardest for the network, either based on incorrect detections or on failures to detect vs. ground truth?

I'm just brainstorming here... I just know as a practitioner this would streamline the process immensely, since I do it all manually ATM and it can get cumbersome when the dataset is thousands of images large.

Again, this is not mandatory, but it is something that happens in industry when it comes to productionizing models, as you probably know.

@glenn-jocher
Member

glenn-jocher commented Jul 14, 2020

Sure. You can get metrics per image easily from the existing code. If you debug test.py with coco128, the stats list going into this operation is 128 entries long (one per image):

yolov5/test.py

Lines 167 to 175 in 611ec44

stats = [np.concatenate(x, 0) for x in zip(*stats)]  # to numpy
if len(stats):
    p, r, ap, f1, ap_class = ap_per_class(*stats)
    p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1)  # [P, R, AP@0.5, AP@0.5:0.95]
    mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
    nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class
else:
    nt = torch.zeros(1)

So to get these metrics per image you would just do something like:

for image in stats:  # each entry is a (correct, conf, pcls, tcls) tuple for one image
    si = [np.asarray(x) for x in image]  # to numpy
    if len(si[0]):  # image has predictions
        p, r, ap, f1, ap_class = ap_per_class(*si)
        p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1)  # [P, R, AP@0.5, AP@0.5:0.95]
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        nt = np.bincount(si[3].astype(np.int64), minlength=nc)  # number of targets per class (this image)
    else:
        nt = torch.zeros(1)  # no predictions for this image
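
From there, saving the hardest images out for review is only a few more lines. A sketch, assuming you accumulate two parallel lists in the loop above (image_paths and per_image_map50 are hypothetical names, and the cutoff is arbitrary):

import os, shutil

os.makedirs('hard_examples', exist_ok=True)

# pair each image with its per-image mAP@0.5 and pull out the worst 50
hardest = sorted(zip(image_paths, per_image_map50), key=lambda x: x[1])[:50]
for path, m in hardest:
    if m < 0.5:  # arbitrary 'hard example' cutoff
        shutil.copy(path, 'hard_examples/')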

@marvision-ai
Author

@glenn-jocher Ah interesting... I may fool around with that and try to get it working. If I have the time, I may add some code to save the hardest images or do something with them. When I get that working, I can open a pull request.

@ZeKunZhang1998

If I understand the kaggle wheat approach, the steps were:

  1. Train the best possible model on your train set (i.e. YOLOv5x at a high resolution).
  2. AutoLabel (as I call it) the validation set used for scoring (with TTA enabled I assume).
  3. Fine-tune your trained model on AutoLabeled val set (maybe combined with train set) for 10 epochs.

Hi, will it fine-tune on the AutoLabeled set alone, or on AutoLabeled + train set?

@glenn-jocher
Member

Don't know. You could try it both ways.

@ZeKunZhang1998

ZeKunZhang1998 commented Aug 27, 2020 via email

@glenn-jocher
Member

@ZeKunZhang1998 for images with no labels, you do not need to supply a txt file.

@ZeKunZhang1998

ZeKunZhang1998 commented Aug 27, 2020 via email

@glenn-jocher
Member

I don't understand your question.

@ZeKunZhang1998

ZeKunZhang1998 commented Aug 27, 2020 via email

@karndeepsingh

@marvision-ai Hey brother, I want to apply active learning to my dataset. I can see you have already done this using YOLOv5; can you please help me with what to set up and how to use YOLOv5 for active learning?

Thanks
