
How to interpret annotation file values in the Object Detection in Videos task? #22

Haebuk opened this issue Oct 12, 2021 · 13 comments

Haebuk commented Oct 12, 2021

I downloaded the Task 2 dataset and unzipped it; the annotation files look like this:

1,0,593,43,174,190,0,0,0,0
2,0,592,43,174,189,0,0,0,0
3,0,592,43,174,189,0,0,0,0
4,0,592,43,174,189,0,0,0,0
5,0,592,43,174,189,0,0,0,0
...

I found the description below:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


    Name                                                  Description
-------------------------------------------------------------------------------------------------------------------------------     
 <bbox_left>	     The x coordinate of the top-left corner of the predicted bounding box

 <bbox_top>	     The y coordinate of the top-left corner of the predicted object bounding box

 <bbox_width>	     The width in pixels of the predicted object bounding box

<bbox_height>	     The height in pixels of the predicted object bounding box

   <score>	     The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                     an object instance.
                     The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation, 
                     while 0 indicates the bounding box will be ignored.
                      
<object_category>    The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1), 
                     people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10), 
                     others(11))
                      
<truncation>	     The score in the DETECTION result file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame 
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
                      
<occlusion>	     The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0 
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 
                     (occlusion ratio 50% ~ 100%)).

But I think this description is quite different from the video annotations.
How should I interpret them? Thank you.

@DiegoLigtenberg

Do you already have an answer? I'm desperately trying to make this file format work, but I just don't understand it.


Haebuk commented Nov 8, 2021

@DiegoLigtenberg Not yet :(


Haebuk commented Nov 8, 2021

@DiegoLigtenberg I found this description here:

 <frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name	                                                      Description
 ----------------------------------------------------------------------------------------------------------------------------------
    <frame_index>     The frame index of the video frame

     <target_id>      In the DETECTION result file, the identity of the target should be set to the constant -1. 
                      In the GROUNDTRUTH file, the identity of the target is used to provide the temporal corresponding 
	              relation of the bounding boxes in different frames.

     <bbox_left>      The x coordinate of the top-left corner of the predicted bounding box

     <bbox_top>	      The y coordinate of the top-left corner of the predicted object bounding box

    <bbox_width>      The width in pixels of the predicted object bounding box

    <bbox_height>     The height in pixels of the predicted object bounding box

      <score>	      The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                      an object instance.
                      The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in 
	              evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>   The object category indicates the type of annotated object, (i.e., ignored regions (0), pedestrian (1), 
                      people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), 
	              others (11))

   <truncation>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame 
	              (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

    <occlusion>	      The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the fraction of objects being occluded 
	              (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%),
	              and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
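In Python, a line like `1,0,593,43,174,190,0,0,0,0` can then be read into named fields with a small sketch like this (the field names and their order are taken from the table above; `parse_line` is just an illustrative helper):

```python
# Field names for one VisDrone video-task annotation line, in order.
FIELDS = ["frame_index", "target_id", "bbox_left", "bbox_top",
          "bbox_width", "bbox_height", "score", "object_category",
          "truncation", "occlusion"]

def parse_line(line):
    """Parse a comma-separated annotation line into a {name: int} dict."""
    return dict(zip(FIELDS, (int(v) for v in line.strip().split(","))))

row = parse_line("1,0,593,43,174,190,0,0,0,0")
print(row["frame_index"], row["bbox_left"], row["bbox_width"])  # 1 593 174
```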

@RoyCopter

Do you have any idea how to convert it to yolov5 annotations?

@saadhimmi

@RoyCopter, you can write a simple script:

For each sequence (each txt file) :
----Load annotation file
----Extract unique frame_id (pd.unique or np.unique)
----Create bbox_center_x and bbox_center_y columns (e.g. bbox_center_x = bbox_left + bbox_width/2 )
----Read and store the image width w and height h
----For each frame_id:
--------Select only the relevant frame_id lines from the annotation file
--------Divide bbox_center_x and bbox_width columns by w
--------Divide bbox_center_y and bbox_height columns by h
--------Save a txt file with ['object_category','bbox_center_x', 'bbox_center_y', 'bbox_w', 'bbox_h']

This is just a simple example that completely ignores the truncation and occlusion information. You could use these columns to further process the annotations you want to keep (or mark heavy occlusions as an 'ignored' class).
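The steps above could be sketched roughly like this (a minimal example that also ignores truncation/occlusion; the function name and the `img_w`/`img_h` arguments are placeholders for your own sequence file and image size):

```python
from collections import defaultdict

import numpy as np

def visdrone_vid_to_yolo(ann_path, img_w, img_h):
    """Convert one VisDrone video annotation file to per-frame YOLO lines.

    Returns {frame_index: ["cls cx cy w h", ...]} with all box values
    normalized to [0, 1], ready to write out as one txt file per frame.
    """
    per_frame = defaultdict(list)
    data = np.loadtxt(ann_path, delimiter=",", ndmin=2)
    for frame, tid, left, top, w, h, score, cat, trunc, occ in data:
        cx = (left + w / 2) / img_w   # box centre, normalized by image width
        cy = (top + h / 2) / img_h    # box centre, normalized by image height
        per_frame[int(frame)].append(
            f"{int(cat)} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return per_frame
```

Writing each `per_frame[i]` list to its own txt file gives the one-file-per-image layout that YOLOv5 expects.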

@RoyCopter

Thanks!

@ganesh0074

It's available in Visdrone.yaml.

@fatbringer

How should I do it if I want to display the bounding boxes and also the targets' annotation IDs?

@ganesh0074

You need to convert the given annotations into the required format to get the bboxes; there is a function which converts the annotations into the correct format.

@fatbringer

@GANYESH Ooh, where might I find the function? I haven't been able to find it at all.

I'm currently doing it myself by reading the text file line by line and assigning the fields like this:

ID, frame_no, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class , trunc , occlu = line.split(",")

It seems that the annotation text files are different for each sub-dataset. How do we get around this?
I am currently working on the VisDrone MOT dataset.

@ganesh0074

ganesh0074 commented Jun 8, 2023 via email

@ganesh0074

@fatbringer are you able to get into it?

@fatbringer

Hi @GANYESH, thanks for checking in.
Yes, I have solved it. It turns out the correct sequence is:
frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class , trunc , occlu
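With that corrected order, the line-by-line split becomes (a small sketch, using the same variable names as above):

```python
# Ground-truth field order for the VisDrone MOT annotations (frame first).
line = "1,0,593,43,174,190,0,0,0,0"
(frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h,
 score, obj_class, trunc, occlu) = (int(v) for v in line.split(","))
print(frame_no, ID, bbox_x, bbox_y)  # 1 0 593 43
```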
