Partially labelled dataset #10

Fabio-Arup-Panella · 2021-04-09T14:47:19Z

Hi Yen-Cheng,
I am working on a project where, because of some issues, we were able to label only part of the dataset.
Say that out of 500 images only 120 were labelled.
Is it possible to use all 120 as labelled training data and the rest as unlabelled training data?
If so, how do you recommend setting this up?
Below is an example of the annotations (of course, I can modify the format):

{"source-ref":"s3://bucketName/imgName1.png","Dataset_BB":{"annotations":[{"left":2726,"top":675,"width":92,"height":324,"class_id":2},{"left":2352,"top":799,"width":54,"height":193,"class_id":2},{"left":3473,"top":731,"width":68,"height":303,"class_id":2},{"left":3784,"top":869,"width":51,"height":178,"class_id":2},{"left":3900,"top":929,"width":33,"height":121,"class_id":2},{"left":2237,"top":868,"width":35,"height":125,"class_id":2},{"left":2184,"top":902,"width":27,"height":94,"class_id":2},{"left":1965,"top":898,"width":52,"height":12,"class_id":0},{"left":1939,"top":869,"width":66,"height":18,"class_id":0},{"left":1893,"top":823,"width":93,"height":21,"class_id":0},{"left":1790,"top":718,"width":153,"height":35,"class_id":0},{"left":1416,"top":411,"width":304,"height":145,"class_id":0},{"left":268,"top":510,"width":272,"height":112,"class_id":0},{"left":112,"top":798,"width":138,"height":32,"class_id":0},{"left":3637,"top":667,"width":33,"height":36,"class_id":4},{"left":2381,"top":756,"width":15,"height":15,"class_id":4}],"image_size":[{"width":4096,"height":2048,"depth":3}]},"Dataset_BB-metadata":{"job-name":"labeling-job/Dataset_BB","class-map":{"0":"Idler","2":"Pipe_Bracket","4":"Ring_Number"},"human-annotated":"yes","objects":[{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1}],"creation-date":"2020-10-05T20:42:53.052Z","type":"groundtruth/object-detection"}}
{"source-ref":"s3://bucketName/imgName2.png","Dataset_BB":{"annotations":[{"left":1353,"top":366,"width":306,"height":172,"class_id":0},{"left":235,"top":549,"width":263,"height":96,"class_id":0},{"left":103,"top":807,"width":133,"height":32,"class_id":0},{"left":1772,"top":710,"width":166,"height":32,"class_id":0},{"left":1884,"top":817,"width":102,"height":22,"class_id":0},{"left":1963,"top":899,"width":55,"height":10,"class_id":0},{"left":1934,"top":869,"width":71,"height":15,"class_id":0},{"left":3520,"top":745,"width":68,"height":295,"class_id":2},{"left":2783,"top":661,"width":95,"height":348,"class_id":2},{"left":3800,"top":874,"width":49,"height":173,"class_id":2},{"left":3903,"top":930,"width":34,"height":119,"class_id":2},{"left":2370,"top":788,"width":59,"height":205,"class_id":2},{"left":2243,"top":867,"width":36,"height":126,"class_id":2},{"left":2188,"top":900,"width":26,"height":94,"class_id":2},{"left":3666,"top":687,"width":30,"height":31,"class_id":4},{"left":2400,"top":746,"width":15,"height":14,"class_id":4}],"image_size":[{"width":4096,"height":2048,"depth":3}]},"Dataset_BB-metadata":{"job-name":"labeling-job/Dataset_BB","class-map":{"0":"Idler","2":"Pipe_Bracket","4":"Ring_Number"},"human-annotated":"yes","objects":[{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1}],"creation-date":"2020-10-05T20:39:16.127Z","type":"groundtruth/object-detection"}}
{"source-ref":"s3://bucketName/imgName3.png"}
{"source-ref":"s3://bucketName/imgName4.png"}


vlfom · 2021-04-11T21:07:50Z

If this helps, I would suggest the following:

1. Check the official Detectron2 tutorial on custom datasets (and follow the Colab notebook) so you know how to add and use your own dataset.
2. Set up your dataset for supervised detection first and make sure the training works.
3. Finally, add the unsupervised training component.
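
For the first two steps, here is a rough sketch of how manifest lines like the ones in the question might be turned into Detectron2-style dataset dicts. The function name `manifest_to_records` and its parameters are made up for illustration. The sketch assumes the label key is `Dataset_BB` as in the example, that boxes are given as left/top/width/height, and that the sparse class ids (0, 2, 4) should be remapped to contiguous ids. With Detectron2 installed you would pass `BoxMode.XYWH_ABS` as `bbox_mode` instead of the placeholder string.

```python
import json


def manifest_to_records(lines, label_key="Dataset_BB", bbox_mode="XYWH_ABS"):
    """Convert manifest lines (one JSON object per line) into dataset dicts.

    Hypothetical helper: `bbox_mode` is a placeholder string here; with
    Detectron2 available, pass detectron2.structures.BoxMode.XYWH_ABS.
    """
    parsed = [json.loads(line) for line in lines if line.strip()]

    # Map the class ids that actually occur (e.g. 0, 2, 4) to 0..N-1.
    class_ids = sorted(
        {
            ann["class_id"]
            for entry in parsed
            for ann in entry.get(label_key, {}).get("annotations", [])
        }
    )
    id_map = {cid: i for i, cid in enumerate(class_ids)}

    records = []
    for image_id, entry in enumerate(parsed):
        # "source-ref" is an S3 URI; it likely needs to be replaced by a
        # local path before training.
        record = {
            "file_name": entry["source-ref"],
            "image_id": image_id,
            "annotations": [],
        }
        label = entry.get(label_key)
        if label is not None:  # unlabeled lines only carry "source-ref"
            size = label["image_size"][0]
            record["width"] = size["width"]
            record["height"] = size["height"]
            for ann in label["annotations"]:
                record["annotations"].append(
                    {
                        "bbox": [ann["left"], ann["top"], ann["width"], ann["height"]],
                        "bbox_mode": bbox_mode,
                        "category_id": id_map[ann["class_id"]],
                    }
                )
        records.append(record)
    return records, id_map
```

The returned list could then be served by a function registered through `DatasetCatalog.register`, as the custom dataset tutorial describes.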

Regarding the last step: to pick the images for the supervised and unsupervised parts, the authors randomly split the images at the beginning by sampling a list of indices (see divide_label_unlabel). In the current implementation, however, they read pre-generated indices so that results are reproducible. You can use the same trick to distinguish labeled from unlabeled images: for example, order the images in your dataset so that the first 120 are labeled and the remaining 380 are not, then reflect that in the seed indices (or just hardcode them).
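
The ordering trick can be sketched as below. This is only an illustration, not the repository's code: `order_and_split` is a hypothetical helper, and it assumes each record is a dict whose `annotations` list is empty or missing when the image is unlabeled.

```python
def order_and_split(records):
    """Put labeled records first and return the index lists for both groups.

    Assumption: a record counts as labeled when it has a non-empty
    "annotations" list. A labeled image that truly contains no objects
    would be treated as unlabeled by this rule.
    """
    labeled = [r for r in records if r.get("annotations")]
    unlabeled = [r for r in records if not r.get("annotations")]
    ordered = labeled + unlabeled

    labeled_idx = list(range(len(labeled)))
    unlabeled_idx = list(range(len(labeled), len(ordered)))
    return ordered, labeled_idx, unlabeled_idx
```

With 120 labeled and 380 unlabeled images, `labeled_idx` would cover 0 to 119 and `unlabeled_idx` 120 to 499, which are the values you would hardcode or store in place of the randomly sampled indices.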

For the images picked for the unsupervised part, the authors simply delete the labels inside the training loop (see run_step_full_semisup).
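
As a minimal sketch of what deleting the labels can look like: the helper below is hypothetical and assumes each element of a batch is a dict that holds its ground truth under the `instances` key, which is where Detectron2's default dataset mapper puts it.

```python
import copy


def strip_labels(batch):
    """Return a copy of the batch with the ground-truth entries removed."""
    stripped = []
    for item in batch:
        item = copy.copy(item)  # shallow copy, so the caller's dict is untouched
        item.pop("instances", None)  # no error if the image had no labels
        stripped.append(item)
    return stripped
```

Because the labels are dropped at this point, whatever annotations the unlabeled images carried earlier in the pipeline never reach the loss.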

At this point, I am not sure whether you can supply Detectron2 with your 380 images without labels (it may skip them). If you can, just put your images in a format similar to the one you mentioned. If at least one bbox per image is required, one idea is to add some random annotations to those images, since they would be removed inside the training loop anyway.

sarmientoj24 · 2021-09-21T08:44:12Z

@vlfom

if yes, you can just put your images in a format similar to what you mentioned, but if at least 1bbox per image is required, one idea could be to add some random annotations for them, as, anyway, those would be removed inside the training loop.

Does this mean all images (both labeled and unlabeled) should have annotations with them?

icrto · 2021-12-29T18:14:48Z

@sarmientoj24 I think unlabeled images do not need to have annotations with them. You just need to make sure that the filter_empty field is set to False, as is done here.
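
To illustrate why that flag matters, here is a plain-Python imitation of the filtering step. It is not Detectron2's code: `select_records` is a hypothetical stand-in, and the rule it applies (drop an image unless it has at least one non-crowd annotation) is my understanding of what the filter does.

```python
def select_records(records, filter_empty):
    """Mimic loading a dataset with or without empty-image filtering."""
    if not filter_empty:
        # Unlabeled images survive and can feed the unsupervised branch.
        return list(records)
    # With filtering on, images without usable annotations are dropped.
    return [
        r
        for r in records
        if any(a.get("iscrowd", 0) == 0 for a in r.get("annotations", []))
    ]
```

In Detectron2 itself, the corresponding switch is the `filter_empty` argument of `get_detection_dataset_dicts`; with it set to True, the 380 unlabeled images from the question would presumably be removed before training starts.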

ycliu93 mentioned this issue May 9, 2021

How to inference on my onw pictures? #19

Open
