This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Partially labelled dataset #10

Open
Fabio-Arup-Panella opened this issue Apr 9, 2021 · 3 comments

@Fabio-Arup-Panella

Hi Yen-Cheng,
I am working on a project where, because of some issues, we were able to label only part of the dataset.
Say only 120 of the 500 images were labelled.
Is it possible to use all 120 as labelled training data and the remaining 380 as unlabelled training data?
If so, how would you recommend doing this?
Below is an example of the annotations (of course, I can modify the format):

{"source-ref":"s3://bucketName/imgName1.png","Dataset_BB":{"annotations":[{"left":2726,"top":675,"width":92,"height":324,"class_id":2},{"left":2352,"top":799,"width":54,"height":193,"class_id":2},{"left":3473,"top":731,"width":68,"height":303,"class_id":2},{"left":3784,"top":869,"width":51,"height":178,"class_id":2},{"left":3900,"top":929,"width":33,"height":121,"class_id":2},{"left":2237,"top":868,"width":35,"height":125,"class_id":2},{"left":2184,"top":902,"width":27,"height":94,"class_id":2},{"left":1965,"top":898,"width":52,"height":12,"class_id":0},{"left":1939,"top":869,"width":66,"height":18,"class_id":0},{"left":1893,"top":823,"width":93,"height":21,"class_id":0},{"left":1790,"top":718,"width":153,"height":35,"class_id":0},{"left":1416,"top":411,"width":304,"height":145,"class_id":0},{"left":268,"top":510,"width":272,"height":112,"class_id":0},{"left":112,"top":798,"width":138,"height":32,"class_id":0},{"left":3637,"top":667,"width":33,"height":36,"class_id":4},{"left":2381,"top":756,"width":15,"height":15,"class_id":4}],"image_size":[{"width":4096,"height":2048,"depth":3}]},"Dataset_BB-metadata":{"job-name":"labeling-job/Dataset_BB","class-map":{"0":"Idler","2":"Pipe_Bracket","4":"Ring_Number"},"human-annotated":"yes","objects":[{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1}],"creation-date":"2020-10-05T20:42:53.052Z","type":"groundtruth/object-detection"}}
{"source-ref":"s3://bucketName/imgName2.png","Dataset_BB":{"annotations":[{"left":1353,"top":366,"width":306,"height":172,"class_id":0},{"left":235,"top":549,"width":263,"height":96,"class_id":0},{"left":103,"top":807,"width":133,"height":32,"class_id":0},{"left":1772,"top":710,"width":166,"height":32,"class_id":0},{"left":1884,"top":817,"width":102,"height":22,"class_id":0},{"left":1963,"top":899,"width":55,"height":10,"class_id":0},{"left":1934,"top":869,"width":71,"height":15,"class_id":0},{"left":3520,"top":745,"width":68,"height":295,"class_id":2},{"left":2783,"top":661,"width":95,"height":348,"class_id":2},{"left":3800,"top":874,"width":49,"height":173,"class_id":2},{"left":3903,"top":930,"width":34,"height":119,"class_id":2},{"left":2370,"top":788,"width":59,"height":205,"class_id":2},{"left":2243,"top":867,"width":36,"height":126,"class_id":2},{"left":2188,"top":900,"width":26,"height":94,"class_id":2},{"left":3666,"top":687,"width":30,"height":31,"class_id":4},{"left":2400,"top":746,"width":15,"height":14,"class_id":4}],"image_size":[{"width":4096,"height":2048,"depth":3}]},"Dataset_BB-metadata":{"job-name":"labeling-job/Dataset_BB","class-map":{"0":"Idler","2":"Pipe_Bracket","4":"Ring_Number"},"human-annotated":"yes","objects":[{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1},{"confidence":1}],"creation-date":"2020-10-05T20:39:16.127Z","type":"groundtruth/object-detection"}}
{"source-ref":"s3://bucketName/imgName3.png"}
{"source-ref":"s3://bucketName/imgName4.png"}
@vlfom

vlfom commented Apr 11, 2021

If it helps, I would suggest the following:

  • go through the official Detectron2 tutorial on custom datasets (and follow the Colab notebook) so that you know how to add and use your own dataset
  • set up your dataset for supervised detection first and make sure that training works
  • finally, add the unsupervised training component

Regarding the last step: to pick the images for supervised and unsupervised learning, the authors randomly split the images at the start by sampling a list of indices (see divide_label_unlabel here). In the current implementation, however, they read pre-generated indices so that results are reproducible. You can use the same trick to distinguish labelled from unlabelled images. For example, order the images in your dataset so that the first 120 are labelled and the remaining 380 are not, and reflect that in the seed indices (or just hardcode it).
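
A minimal sketch of the hardcoding route, assuming each dataset entry is a dict with an "annotations" list that is empty for unlabelled images. The name split_by_annotation is hypothetical and is not part of the repository:

```python
from typing import Dict, List, Tuple


def split_by_annotation(records: List[Dict]) -> Tuple[List[int], List[int]]:
    """Return (labelled_indices, unlabelled_indices) for a list of dataset dicts.

    Assumption: an image counts as labelled if its "annotations" list is
    non-empty. This mirrors the 120 / 380 split discussed above.
    """
    labelled = [i for i, r in enumerate(records) if r.get("annotations")]
    unlabelled = [i for i, r in enumerate(records) if not r.get("annotations")]
    return labelled, unlabelled


if __name__ == "__main__":
    # Toy dataset: 3 labelled images followed by 2 unlabelled ones.
    toy = [{"annotations": [{"category_id": 0}]} for _ in range(3)]
    toy += [{"annotations": []} for _ in range(2)]
    labelled_idx, unlabelled_idx = split_by_annotation(toy)
    print(labelled_idx, unlabelled_idx)
```

Note that a labelled image that genuinely contains no objects would end up in the unlabelled list here; if that case matters, an explicit per-image flag is safer.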

For the images picked for the "unsupervised part", the authors simply delete the labels inside the training loop (see run_step_full_semisup here).

At this point, I am not sure whether Detectron2 will accept your 380 images without labels (it may skip them). If it does, you can provide your images in a format similar to the one you posted. If at least one bbox per image is required, one idea is to add some random annotations to those images, since they would be removed inside the training loop anyway.
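
To illustrate the "similar format" option, here is a rough sketch that turns manifest lines like the ones posted above into Detectron2-style dataset dicts, giving unlabelled images an empty annotations list. The name manifest_to_records is hypothetical and the default key "Dataset_BB" is taken from the example. Class ids are remapped to a contiguous range (0, 2, 4 become 0, 1, 2). Three things are left out on purpose: Detectron2 also expects a bbox_mode per box (BoxMode.XYWH_ABS for left/top/width/height boxes), file_name has to point to a local file rather than an s3:// URL, and the unlabelled lines carry no image size, so height and width would have to be read from the images.

```python
import json
from typing import Dict, List, Tuple


def manifest_to_records(
    lines: List[str], label_key: str = "Dataset_BB"
) -> Tuple[List[Dict], Dict[int, int]]:
    """Convert JSON-lines manifest entries into Detectron2-style dicts."""
    parsed = [json.loads(line) for line in lines if line.strip()]

    # Collect the class ids that actually occur and map them to 0..K-1.
    class_ids = sorted(
        {
            ann["class_id"]
            for item in parsed
            for ann in item.get(label_key, {}).get("annotations", [])
        }
    )
    id_map = {cid: new_id for new_id, cid in enumerate(class_ids)}

    records = []
    for image_id, item in enumerate(parsed):
        record = {
            "file_name": item["source-ref"],  # replace with a local path
            "image_id": image_id,
            "annotations": [],  # stays empty for unlabelled images
        }
        label = item.get(label_key)
        if label is not None:
            size = label["image_size"][0]
            record["height"] = size["height"]
            record["width"] = size["width"]
            for ann in label["annotations"]:
                record["annotations"].append(
                    {
                        # left/top/width/height, i.e. XYWH in absolute pixels
                        "bbox": [ann["left"], ann["top"], ann["width"], ann["height"]],
                        "category_id": id_map[ann["class_id"]],
                    }
                )
        records.append(record)
    return records, id_map


if __name__ == "__main__":
    sample = [
        '{"source-ref": "img1.png", "Dataset_BB": {"annotations": '
        '[{"left": 10, "top": 20, "width": 30, "height": 40, "class_id": 2}], '
        '"image_size": [{"width": 4096, "height": 2048, "depth": 3}]}}',
        '{"source-ref": "img2.png"}',
    ]
    records, id_map = manifest_to_records(sample)
    print(id_map)
    print([len(r["annotations"]) for r in records])
```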

@sarmientoj24

@vlfom

if yes, you can just put your images in a format similar to what you mentioned, but if at least 1bbox per image is required, one idea could be to add some random annotations for them, as, anyway, those would be removed inside the training loop.

Does this mean all images (both labeled and unlabeled) should have annotations with them?

@icrto

icrto commented Dec 29, 2021

@sarmientoj24 I think unlabeled images do not need to have annotations with them. You just need to make sure that the filter_empty field is set to False, as is done here.
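
To see why this matters for the 500-image example: a filter that drops images without annotations would leave only the 120 labelled ones, so the unlabelled branch would get nothing. The function below is a simplified stand-in for that kind of filter, written only for illustration; it is not Detectron2's own code.

```python
from typing import Dict, List


def drop_images_without_annotations(records: List[Dict]) -> List[Dict]:
    """Keep only the records that have at least one annotation (illustrative)."""
    return [r for r in records if len(r.get("annotations", [])) > 0]


# Toy version of the dataset from the question: 120 labelled, 380 unlabelled.
records = [
    {"image_id": i, "annotations": [{"category_id": 0}] if i < 120 else []}
    for i in range(500)
]

kept = drop_images_without_annotations(records)
print(len(records), len(kept))  # 500 120
```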
