
Evaluation Details


DIRSM Evaluation

Please note: Submissions with incorrect text files (e.g. wrong format or unreadable by our evaluation script) will not be considered.

Evaluation metric. The official metric for evaluating the correctness of images retrieved from social media is Average Precision at cutoff X (AP@X), computed at the cutoffs X = {50, 100, 200, 300, 400, 500}. The metric measures the number of relevant images among the top X retrieved results and takes their rank into account.
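As an informal illustration only, the sketch below shows one common way to compute AP@X for a single ranked result list. The function and variable names are placeholders, and the normalization (dividing by the number of relevant items found within the cutoff) is an assumption; the official evaluation script may differ in these details.

```python
def average_precision_at_x(ranked_ids, relevant_ids, x):
    """Average Precision at cutoff x for one ranked list of retrieved images.

    ranked_ids   -- image ids ordered by decreasing retrieval score
    relevant_ids -- set of ids that are relevant according to the ground truth
    x            -- cutoff (e.g. 50, 100, 200, 300, 400, 500)
    """
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids[:x], start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

# Example: evaluate a run at all official cutoffs (run and ground_truth are placeholders)
# cutoffs = [50, 100, 200, 300, 400, 500]
# ap_scores = {x: average_precision_at_x(run, ground_truth, x) for x in cutoffs}
```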

FDSI Evaluation

Please note: Submissions with incorrect masks (e.g. different image shape, class labels not in {0, 1}, or unreadable by our evaluation script) will not be considered.
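As a small aid before uploading, a check of the following kind can catch the issues listed above, assuming the masks are loaded as NumPy arrays; the function name and the expected shape are placeholders, not part of the official evaluation script.

```python
import numpy as np

def check_mask(mask, expected_shape):
    """Basic sanity checks mirroring the submission requirements above."""
    assert mask.shape == expected_shape, "mask has a different image shape"
    assert set(np.unique(mask)).issubset({0, 1}), "class labels must be in {0, 1}"
```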

Evaluation metric. To assess the quality of the generated segmentation masks of flooded areas in the satellite image patches, we rely on the Jaccard Index, commonly known as the PASCAL VOC intersection-over-union (IoU) metric, as the official evaluation metric:

IoU = TP / (TP + FP + FN), 

where TP, FP, and FN denote the numbers of true positive, false positive, and false negative pixels, respectively, accumulated over the whole test set. The measure is pixel-based and can be seen as the accuracy of a pixel-wise classification.
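The following is a minimal sketch of this computation, assuming predicted and ground-truth masks are available as NumPy arrays with values in {0, 1}. It accumulates TP, FP, and FN over all test patches as described above; the actual evaluation script may differ in implementation details.

```python
import numpy as np

def iou_over_test_set(pred_masks, gt_masks):
    """Jaccard index (intersection-over-union) aggregated over all patches."""
    tp = fp = fn = 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred = pred.astype(bool)
        gt = gt.astype(bool)
        tp += np.logical_and(pred, gt).sum()    # flooded pixels correctly predicted
        fp += np.logical_and(pred, ~gt).sum()   # non-flooded pixels predicted as flooded
        fn += np.logical_and(~pred, gt).sum()   # flooded pixels missed by the prediction
    return tp / (tp + fp + fn)
```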

Test-Sets. Both metrics will be evaluated on two different test sets:

  • Evaluation on unseen patches extracted from the same region as in the dev-set
  • Evaluation on unseen patches extracted from a new region that is not present in the dev-set (this test set reveals how well the algorithms generalize to different locations)