
How to interpret annotation file values in the Object Detection in Videos task? #22

Haebuk opened this issue Oct 12, 2021 · 13 comments

Haebuk commented Oct 12, 2021

I downloaded the Task 2 dataset and unzipped it; the annotation files look like this:

1,0,593,43,174,190,0,0,0,0
2,0,592,43,174,189,0,0,0,0
3,0,592,43,174,189,0,0,0,0
4,0,592,43,174,189,0,0,0,0
5,0,592,43,174,189,0,0,0,0
...

I found the description below:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


    Name                                                  Description
-------------------------------------------------------------------------------------------------------------------------------     
 <bbox_left>	     The x coordinate of the top-left corner of the predicted bounding box

 <bbox_top>	     The y coordinate of the top-left corner of the predicted object bounding box

 <bbox_width>	     The width in pixels of the predicted object bounding box

<bbox_height>	     The height in pixels of the predicted object bounding box

   <score>	     The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                     an object instance.
                     The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation, 
                     while 0 indicates the bounding box will be ignored.
                      
<object_category>    The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1), 
                     people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10), 
                     others(11))
                      
<truncation>	     The score in the DETECTION result file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame 
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
                      
<occlusion>	     The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0 
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 
                     (occlusion ratio 50% ~ 100%)).

But I think this description is quite different from the video annotations.
How should I interpret them? Thank you.

@DiegoLigtenberg

Do you already have an answer? I'm desperately trying to make this file format work, but I just don't understand it.


Haebuk commented Nov 8, 2021

@DiegoLigtenberg Not yet :(


Haebuk commented Nov 8, 2021

@DiegoLigtenberg I found this description here:

 <frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name	                                                      Description
 ----------------------------------------------------------------------------------------------------------------------------------
    <frame_index>     The frame index of the video frame

     <target_id>      In the DETECTION result file, the identity of the target should be set to the constant -1. 
                      In the GROUNDTRUTH file, the identity of the target is used to provide the temporal corresponding 
	              relation of the bounding boxes in different frames.

     <bbox_left>      The x coordinate of the top-left corner of the predicted bounding box

     <bbox_top>	      The y coordinate of the top-left corner of the predicted object bounding box

    <bbox_width>      The width in pixels of the predicted object bounding box

    <bbox_height>     The height in pixels of the predicted object bounding box

      <score>	      The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                      an object instance.
                      The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in 
	              evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>   The object category indicates the type of annotated object, (i.e., ignored regions (0), pedestrian (1), 
                      people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), 
	              others (11))

   <truncation>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame 
	              (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

    <occlusion>	      The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the fraction of objects being occluded 
	              (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%),
	              and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
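In Python, a line like `1,0,593,43,174,190,0,0,0,0` can then be read into named fields with a small sketch like this (the field names and their order are taken from the table above; `parse_line` is just an illustrative helper):

```python
# Field names for one VisDrone video-task annotation line, in order.
FIELDS = ["frame_index", "target_id", "bbox_left", "bbox_top",
          "bbox_width", "bbox_height", "score", "object_category",
          "truncation", "occlusion"]

def parse_line(line):
    """Parse a comma-separated annotation line into a {name: int} dict."""
    return dict(zip(FIELDS, (int(v) for v in line.strip().split(","))))

row = parse_line("1,0,593,43,174,190,0,0,0,0")
print(row["frame_index"], row["bbox_left"], row["bbox_width"])  # 1 593 174
```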

@RoyCopter

Do you have any idea how to convert it to yolov5 annotations?

@saadhimmi

@RoyCopter, you can write a simple script:

For each sequence (each txt file) :
----Load annotation file
----Extract unique frame_id (pd.unique or np.unique)
----Create bbox_center_x and bbox_center_y columns (e.g. bbox_center_x = bbox_left + bbox_width/2 )
----Read and store the image width w and height h
----For each frame_id:
--------Select only the relevant frame_id lines from the annotation file
--------Divide bbox_center_x and bbox_width columns by w
--------Divide bbox_center_y and bbox_height columns by h
--------Save a txt file with ['object_category','bbox_center_x', 'bbox_center_y', 'bbox_w', 'bbox_h']

This is just a simple example that completely ignores the truncation and occlusion information. You could use these columns to further process the annotations you want to keep (or mark heavy occlusions as an 'ignored' class).
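The steps above could be sketched roughly like this (a minimal example that also ignores truncation/occlusion; the function name and the `img_w`/`img_h` arguments are placeholders for your own sequence file and image size):

```python
from collections import defaultdict

import numpy as np

def visdrone_vid_to_yolo(ann_path, img_w, img_h):
    """Convert one VisDrone video annotation file to per-frame YOLO lines.

    Returns {frame_index: ["cls cx cy w h", ...]} with all box values
    normalized to [0, 1], ready to write out as one txt file per frame.
    """
    per_frame = defaultdict(list)
    data = np.loadtxt(ann_path, delimiter=",", ndmin=2)
    for frame, tid, left, top, w, h, score, cat, trunc, occ in data:
        cx = (left + w / 2) / img_w   # box centre, normalized by image width
        cy = (top + h / 2) / img_h    # box centre, normalized by image height
        per_frame[int(frame)].append(
            f"{int(cat)} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return per_frame
```

Writing each `per_frame[i]` list to its own txt file gives the one-file-per-image layout that YOLOv5 expects.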

@RoyCopter

Thanks!

@ganesh0074

It's available in Visdrone.yaml.

@fatbringer

How should I do it if I want to display the bounding boxes and also the targets' annotation IDs?

@ganesh0074

You need to convert the given annotations into the required format to get the bboxes; there is a function which converts the annotations into the correct format.

@fatbringer

@GANYESH Ooh, where might I find the function? I haven't been able to find it at all.

I'm currently doing it myself by reading the text file line by line and assigning the fields like this:

ID, frame_no, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class , trunc , occlu = line.split(",")

It seems that the annotation text files are different for each sub-dataset. How do we get around this?
I am currently working on the VisDrone MOT dataset.

@ganesh0074

ganesh0074 commented Jun 8, 2023 via email

@ganesh0074

@fatbringer are you able to get into it?

@fatbringer

Hi @GANYESH, thanks for checking in.
Yes, I have solved it. It turns out the correct sequence is:
frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class , trunc , occlu
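With that corrected order, the line-by-line split becomes (a small sketch, using the same variable names as above):

```python
# Ground-truth field order for the VisDrone MOT annotations (frame first).
line = "1,0,593,43,174,190,0,0,0,0"
(frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h,
 score, obj_class, trunc, occlu) = (int(v) for v in line.split(","))
print(frame_no, ID, bbox_x, bbox_y)  # 1 0 593 43
```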
