Metrics for object detection v2

This is a fork of the original Metrics for object detection developed by Rafael Padilla (here), which explains the "v2".

What does this project offer?

The original work was really helpful and clear, but it lacked some features I wanted, so I decided to extend it. More precisely, this repository can:

  • Do exactly what the original repository does (at least as of the fork date, 26 September 2018)
  • Read annotations in 2 formats rather than in text format only:
    • Text format (txt)
    • Xml format (xml)
  • Evaluate only members of specific classes instead of all available classes.
  • Derive the input format from the extension of the files in the provided folders.
  • Use a model to detect objects in an image folder and produce bbox files (xml or txt).
  • (Fixed) Provide a 3rd option: get the detection bounding boxes by applying a trained object detection model to an arbitrary image folder and evaluate them directly.

All 3 cases of course require the existence of ground truth files (txt or xml).

The ability to read xml comes from the way TensorFlow's object detection module annotates images. It therefore seems natural to use xml files, which are already annotated in that form.

How to use this project

This project can be used to evaluate object detection results relatively easily. Currently, there are 3 (overlapping) ways to evaluate a model.

  1. Use text (or xml) files containing the bboxes of both ground truth and detection cases. In this case the proposed method is pascalvoc.

  2. Use text (or xml) files for the ground truth bboxes and use an object detection model to create (predict) bboxes over an image folder. In this case the proposed method is detect_bboxes followed by pascalvoc: detect_bboxes writes the predicted bboxes to a folder, and from then on the process is identical to the previous case.

  3. The final option uses txt (or xml) files for the ground truth bboxes and applies eval_model. This option is the most self-contained, as it leaves no intermediate outputs.

Evaluation using pascalvoc

This case uses bboxes that are already stored in text or xml files.

In order to evaluate the results you need:

  • Either txt files for Ground Truth and Detection
  • Or xml files for Ground Truth and Detection
  • Ground Truth and Detection files may also be in different formats (one txt, the other xml).
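Since the input format is derived from the file extension, the dispatch step can be sketched as below. This is a minimal illustration, assuming a case-insensitive match on .txt and .xml; annotation_format is a hypothetical helper name, not a function of this project.

```python
from pathlib import Path


def annotation_format(path):
    """Return "txt" or "xml" based on the file extension (case-insensitive).

    Hypothetical helper: the project may dispatch differently internally.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return "txt"
    if suffix == ".xml":
        return "xml"
    raise ValueError(f"unsupported annotation file: {path}")


# Ground truth and detection files are classified independently,
# so mixing formats between the two folders is not a problem.
print(annotation_format("groundtruths-xml/img1.xml"))  # xml
print(annotation_format("detections/img1.txt"))        # txt
```

Because each file is classified on its own, a ground truth file img1.xml and a detection file img1.txt would simply be routed to different readers.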

Examples of use

The simplest example for the given folders would be:

python3 --accepted-classes person

which evaluates the xml files in detections-xml against the xml files in the groundtruths-xml subfolder.

The above assumes that two subfolders, groundtruths-xml and detections-xml, containing xml files exist in the project folder.

Text files

(This part is the same as the original code)

These are space-delimited text files in which each line contains one bounding box (bbox) in one of two formats, either:

<class_name> <left> <top> <right> <bottom>

or

<class_name> <left> <top> <width> <height>

The file names must be the same for Ground Truth and Detection, and the extension must be .txt (otherwise the code cannot determine the input format).

The only difference between Ground Truth and Detection files is that each Detection line contains an extra second value: the confidence (or score) of the bbox. So the actual format becomes either

<class_name> <confidence> <left> <top> <right> <bottom>

or

<class_name> <confidence> <left> <top> <width> <height>

Default option is <left> <top> <width> <height>.
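A sketch of how one such line could be parsed into a common <left> <top> <right> <bottom> representation follows. It assumes class names contain no spaces and uses the xywh layout as default, in line with the sentence above; parse_bbox_line is a hypothetical name, not the project's own parser.

```python
def parse_bbox_line(line, is_detection, fmt="xywh"):
    """Parse one line of a ground truth or detection .txt file.

    Returns (class_name, confidence, left, top, right, bottom);
    confidence is None for ground truth lines.
    fmt: "xywh" for <left> <top> <width> <height>,
         "xyrb" for <left> <top> <right> <bottom>.
    """
    parts = line.split()
    class_name = parts[0]
    values = [float(v) for v in parts[1:]]
    confidence = None
    if is_detection:
        # Detection lines carry the confidence as the second value.
        confidence, values = values[0], values[1:]
    if len(values) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(values)}: {line!r}")
    left, top, third, fourth = values
    if fmt == "xywh":
        right, bottom = left + third, top + fourth  # width/height -> right/bottom
    elif fmt == "xyrb":
        right, bottom = third, fourth
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return class_name, confidence, left, top, right, bottom


print(parse_bbox_line("person 0.88 10 20 30 40", is_detection=True))
# ('person', 0.88, 10.0, 20.0, 40.0, 60.0)
```

Normalizing both layouts to right/bottom early keeps the later overlap computations independent of the input format.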

Xml files

This basically follows the conventions of the Pascal VOC annotation scheme. Such xml files can be produced, for example, by using labelImg to annotate images manually.

An xml has roughly this form:


The main parts of the xml file that are used in this process are:

  • filename, which corresponds to the name of the image
  • object/name, which corresponds to each bbox class
  • object/bndbox/xmin, which corresponds to the left-most coordinate of the bbox
  • object/bndbox/ymin, which corresponds to the top-most coordinate of the bbox
  • object/bndbox/xmax, which corresponds to the right-most coordinate of the bbox
  • object/bndbox/ymax, which corresponds to the bottom-most coordinate of the bbox

and, for the Detection xml files (which have to be created somehow, of course):

  • object/score, which corresponds to the confidence of the detected bbox.
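The fields listed above can be read with Python's standard xml.etree.ElementTree module. The sketch below is illustrative only: parse_voc_xml and the SAMPLE annotation are hypothetical, and an object without object/score is treated as a ground truth bbox.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Return (filename, boxes) from a Pascal VOC style annotation.

    Each box is (name, score, xmin, ymin, xmax, ymax); score is None when
    the object has no <score> element (i.e. a ground truth bbox).
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        score_text = obj.findtext("score")  # only present in detection files
        score = float(score_text) if score_text is not None else None
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name, score, *coords))
    return filename, boxes


# Hypothetical detection annotation with a single bbox.
SAMPLE = """<annotation>
  <filename>img_0001.jpg</filename>
  <object>
    <name>person</name>
    <score>0.91</score>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_xml(SAMPLE))
# ('img_0001.jpg', [('person', 0.91, 48.0, 240.0, 195.0, 371.0)])
```

For files on disk, ET.parse(path).getroot() can replace ET.fromstring(xml_text).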

Optional arguments for pascalvoc:

Argument | Description | Example | Default
-h | show help message | python -h | (none)
-v | check version | python -v | (none)
-g | folder that contains the ground truth bounding box files | python -g /home/whatever/my_groundtruths/ | Object-Detection-Metrics/groundtruths-xml/
-d | folder that contains your detected bounding box files | python -d /home/whatever/my_detections/ | Object-Detection-Metrics/detections-xml/
-t | IOU threshold that determines whether a detection is a TP or an FP | python -t 0.75 | 0.50
--gt-format | format of the coordinates of the ground truth bounding boxes (*) | python --gt-format xyrb | xyrb
--det-format | format of the coordinates of the detected bounding boxes (*) | python --det-format xyrb | xyrb
--gt-coords | reference of the ground truth bounding box coordinates: set it to rel if the annotated coordinates are relative to the image size (as used in YOLO), or to abs if they are absolute values that do not depend on the image size | python --gt-coords rel | abs
--det-coords | reference of the detected bounding box coordinates: set it to rel if the coordinates are relative to the image size (as used in YOLO), or to abs if they are absolute values that do not depend on the image size | python --det-coords rel | abs
--img-size | image size in the format width,height <int,int>; required if --gt-coords or --det-coords is set to rel | python --img-size 600,400 | (none)
-s | folder where the plots are saved | python -s /home/whatever/my_results/ | Object-Detection-Metrics/results/
-n | if present, no plot is shown during execution | python -n | not present, so plots are shown
--accepted-classes | if present, only members belonging to those classes are evaluated (other members are ignored) | python --accepted-classes person car | empty list, meaning all classes are taken into consideration

(*) Set --gt-format xywh and/or --det-format xywh if the format is <left> <top> <width> <height>. Set --gt-format xyrb and/or --det-format xyrb if the format is <left> <top> <right> <bottom>.
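The -t threshold decides whether a detection counts as a TP or an FP. Below is a minimal sketch of Pascal VOC-style greedy matching for one class in one image, assuming boxes in <left> <top> <right> <bottom> form and a ">=" comparison against the threshold. iou and mark_tp_fp are hypothetical names, and the project's own implementation may differ in details.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mark_tp_fp(detections, ground_truths, iou_threshold=0.5):
    """Label detections of one class in one image as TP (True) or FP (False).

    detections: list of (confidence, box); ground_truths: list of boxes.
    Higher-confidence detections claim ground truths first, and each
    ground truth can be matched at most once.
    """
    matched = [False] * len(ground_truths)
    results = []
    for conf, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground truth with the highest overlap.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]
        if is_tp:
            matched[best_idx] = True
        results.append((conf, is_tp))
    return results


gt = [(0, 0, 10, 10)]
dets = [(0.8, (1, 0, 11, 10)), (0.9, (0, 0, 10, 10))]
print(mark_tp_fp(dets, gt))  # [(0.9, True), (0.8, False)]
```

The second detection overlaps the ground truth well (IoU of about 0.82) but is still an FP, because the higher-confidence detection already claimed that ground truth. Passing -t 0.75 instead of the default 0.50 corresponds to iou_threshold=0.75.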

Applying an object detector to an image folder

The project can use an (already trained) object detector to predict bboxes on an image folder. In order to apply this feature you need:

  • A trained object detection model (more specifically, the frozen one)
  • A label map which maps object ids to their respective (human-readable) labels

The steps required are roughly:

  • The project uses an already trained (TensorFlow) object detection model to produce xml or txt files. Currently only TensorFlow object detectors are supported.
  • The output can be either txt or xml files.
  • Once the bboxes have been saved to a folder, pascalvoc can be applied to evaluate the performance of the model.

Examples of use

--accepted-classes person is necessary because the examples given contain ground truth samples of multiple classes, but detection was performed on the class person only. Without this argument, the per-class AP would be correct for person but 0.0 for every other class, and the mAP would be averaged over all classes.
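The effect described above can be illustrated with a few lines of arithmetic. The AP values are hypothetical and only show the mechanism: mAP is the mean of the per-class APs, so classes that were never detected pull it down unless they are excluded. mean_average_precision is a hypothetical helper, not part of the project.

```python
def mean_average_precision(ap_per_class, accepted_classes=None):
    """Average the per-class APs, optionally over the accepted classes only.

    An empty or None accepted_classes means every class is used,
    mirroring the "empty list" default of --accepted-classes.
    """
    if accepted_classes:
        ap_per_class = {c: ap for c, ap in ap_per_class.items() if c in accepted_classes}
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical APs: detection ran on "person" only, so other classes score 0.0.
aps = {"person": 0.75, "car": 0.0, "bicycle": 0.0}
print(mean_average_precision(aps))              # 0.25
print(mean_average_precision(aps, ["person"]))  # 0.75
```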

To run detect_bboxes, the simplest example would be:

python3 -i image-samples/ -m /path/to/model -l path/to/label_map.pbtxt

which produces xml files in a subfolder of the image folder.

Optional arguments for detect_bboxes:

Argument | Description | Example | Default
-h | show help message | python -h | (none)
-v | check version | python -v | (none)
-i | folder containing the images to which the detection model is applied | python -i /home/whatever/my_images/ | (none)
-m | folder containing the trained model, in other words the folder where "frozen_inference_graph.pb" resides | python -m /home/whatever/my_model/ | (none)
-l | path to the label map for this model | python -l /path/to/label_map | (none)
--score-thres | threshold under which bboxes are ignored and not written to the output files | python --score-thres=0.2 | 0.0
--accepted-classes | list of all classes to take into consideration when writing the bboxes to files; an empty list means all available classes are taken into consideration | python --accepted-classes person car | empty list, which means all samples are treated
--txt_file | whether the output files will be xml or txt; if not set, xml is used | python --txt_file | xml
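To illustrate how --score-thres and --accepted-classes act on the detector output, here is a sketch that turns raw detections into detection txt lines. It assumes the TensorFlow Object Detection API convention of [ymin, xmin, ymax, xmax] boxes normalized to [0, 1], a label map already loaded into a plain dict, and the <class_name> <confidence> <left> <top> <right> <bottom> line layout. detections_to_lines is hypothetical, and the files detect_bboxes actually writes may differ.

```python
def detections_to_lines(boxes, scores, class_ids, label_map, img_width, img_height,
                        score_thres=0.0, accepted_classes=None):
    """Turn raw detector outputs into detection .txt lines.

    Assumes boxes are [ymin, xmin, ymax, xmax] normalized to [0, 1]
    (TensorFlow Object Detection API convention) and that class_ids
    are keys of label_map. Output layout (an assumption):
    <class_name> <confidence> <left> <top> <right> <bottom>
    """
    lines = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, class_ids):
        name = label_map.get(int(class_id), str(class_id))
        if score < score_thres:
            continue  # below --score-thres: not written
        if accepted_classes and name not in accepted_classes:
            continue  # an empty accepted_classes keeps every class
        # Scale normalized coordinates to absolute pixel values.
        left, top = round(xmin * img_width), round(ymin * img_height)
        right, bottom = round(xmax * img_width), round(ymax * img_height)
        lines.append(f"{name} {score:.4f} {left} {top} {right} {bottom}")
    return lines


label_map = {1: "person", 3: "car"}  # hypothetical label map
boxes = [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]
print(detections_to_lines(boxes, [0.9, 0.1], [1, 3], label_map, 100, 200, score_thres=0.2))
# ['person 0.9000 20 20 60 100']
```

Rounding to whole pixels is only a readability choice here; writing floats would work equally well with the text format described earlier.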

Direct evaluation of an object detector over an image folder

The 3rd option applies an object detector to an image folder in order to evaluate the performance of that same model. The difference from the 2nd option is that no intermediate files are created in this scenario; the only output is the evaluation itself. This mode also takes the most arguments, since it combines elements from both previous cases.

As regards performance, the 3 methods use essentially the same tools, so apart from the automation and the file-writing overhead avoided in the 3rd case, there are no other differences.

Examples of use

The simplest use would be:

python3 -g path/to/gt -i path/to/image_folder -m path/to/model -l path/to/label_map

while a potentially more versatile use (evaluating only the classes person and car, and just printing the mAP without plotting anything) would be:

python3 -g path/to/gt -i path/to/image_folder -m path/to/model -l path/to/label_map -n --accepted-classes person car

Optional arguments for eval_model:

Argument | Description | Example | Default
-h | show help message | python -h | (none)
-v | check version | python -v | (none)
-g | folder that contains the ground truth bounding box files | python -g /home/whatever/my_groundtruths/ | /Object-Detection-Metrics/groundtruths-xml/
-t | IOU threshold that determines whether a detection is a TP or an FP | python -t 0.75 | 0.50
--gt-format | format of the coordinates of the ground truth bounding boxes | python --gt-format xyrb | xyrb
--gt-coords | reference of the ground truth bounding box coordinates: set it to rel if the annotated coordinates are relative to the image size (as used in YOLO), or to abs if they are absolute values that do not depend on the image size | python --gt-coords rel | abs
--img-size | image size in the format width,height <int,int>; required if --gt-coords or --det-coords is set to rel | python --img-size 600,400 | (none)
-s | folder where the plots are saved | python -s /home/whatever/my_results/ | Object-Detection-Metrics/results/
-n | if present, no plot is shown during execution | python -n | not present, so plots are shown
-m | folder containing the trained model, in other words the folder where "frozen_inference_graph.pb" resides | python -m /home/whatever/my_model/ | (none)
-l | path to the label map for this model | python -l /path/to/label_map | (none)
--score-thres | threshold under which bboxes are ignored | python --score-thres=0.2 | 0.0
--accepted-classes | list of all classes to take into consideration; an empty list means all available classes are taken into consideration | python --accepted-classes person car | empty list, which means all samples are treated
--merged-classes | path to a json file containing a dict for merging classes in the ground truth bounding boxes | python --merged-classes path/to/merged_class.json | empty dict, which means no merging occurs
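The json layout expected by --merged-classes is not spelled out above. Assuming it is a flat dict that maps an original ground truth class to the class it should be merged into, applying it could look like the sketch below; merge_classes and the class names are hypothetical.

```python
import json


def merge_classes(gt_boxes, merge_map):
    """Rename ground truth classes according to merge_map.

    Assumes merge_map is a flat dict {original_name: merged_name};
    classes missing from the dict keep their original names.
    """
    return [(merge_map.get(name, name), *rest) for name, *rest in gt_boxes]


# Hypothetical content of path/to/merged_class.json
merge_map = json.loads('{"man": "person", "woman": "person"}')
gt = [("man", 10, 10, 50, 90), ("woman", 60, 10, 95, 90), ("car", 0, 0, 40, 30)]
print(merge_classes(gt, merge_map))
# [('person', 10, 10, 50, 90), ('person', 60, 10, 95, 90), ('car', 0, 0, 40, 30)]
```

Merging before matching means a detection labelled person can then be matched against ground truths originally annotated as man or woman.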