Per-Image Metrics

We only need to calculate the per-image metrics once, when the install script is first run. The installer saves the results to intermediate files, which load much faster than recalculating the metrics.
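The compute-once, load-later pattern can be sketched as follows. This is a minimal illustration, not the installer's actual code: the helper name `load_or_compute`, the use of `pickle`, and the cache file layout are all assumptions.

```python
import os
import pickle


def load_or_compute(cache_path, compute):
    """Hypothetical helper: return cached metrics if present, else compute and cache them."""
    if os.path.exists(cache_path):
        # Fast path: the metrics were already calculated on a previous run.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: calculate the metrics once and save them for later runs.
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Only the first call pays the cost of `compute()`; later calls just read the file.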

Below we show a list of algorithms we tested to find duplicates.

One option is to compare images using per-image metrics from several image "similarity" algorithms, described below.

Unfortunately, no single one of these metrics, nor any combination of them, works particularly well across the entire dataset. They produce far too many false positives and false negatives to be useful.

Image Hashes

Use Python's hashlib package to calculate MD5 checksums. Perceptual image hash functions are available through the OpenCV contrib add-on package, beginning with OpenCV 3.3.0.
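A minimal sketch of both kinds of hash. The MD5 part uses `hashlib` and only finds byte-identical files. For a perceptual hash we swap in a pure-Python "average hash" (downsample, then one bit per cell brighter than the mean) rather than OpenCV's `img_hash` module; the function names and the list-of-rows greyscale image format are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's raw bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(paths):
    """Group file paths whose bytes hash to the same MD5 digest."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_checksum(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [group for group in groups.values() if len(group) > 1]


def average_hash(pixels, size=8):
    """Average hash of a greyscale image given as a list of rows of ints."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size")
    cells = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            # Block-average each cell to downsample to size x size.
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the mean.
        bits = (bits << 1) | (value > mean)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes; small means similar."""
    return bin(a ^ b).count("1")
```

Unlike MD5, the average hash is unchanged by a uniform brightness shift, so two images count as "similar" when their hashes differ in only a few bits.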

Image Entropy

  • Cross Entropy?
  • Shannon Entropy?
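Both entropy variants can be sketched over the grey-level histogram. This is an illustration under assumptions: images are lists of rows of integer grey levels in `[0, levels)`, and the small `eps` is our choice to avoid `log(0)`. For the Shannon case, scikit-image also offers `skimage.measure.shannon_entropy`.

```python
import math
from collections import Counter


def grey_histogram(pixels, levels=256):
    """Normalised histogram (probability per grey level) of a greyscale image."""
    counts = Counter(v for row in pixels for v in row)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(levels)]


def shannon_entropy(pixels, levels=256):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    p = grey_histogram(pixels, levels)
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(pixels_a, pixels_b, levels=256, eps=1e-12):
    """Cross entropy in bits: H(p, q) = -sum_i p_i * log2(q_i).

    eps avoids log(0) when image B lacks a grey level that image A has.
    """
    p = grey_histogram(pixels_a, levels)
    q = grey_histogram(pixels_b, levels)
    return -sum(pi * math.log2(qi + eps) for pi, qi in zip(p, q) if pi > 0)
```

A single-colour image has entropy 0, and `cross_entropy(a, b)` is never smaller than `shannon_entropy(a)` (up to `eps`), reaching it only when the histograms match. Matching histograms say nothing about pixel layout, which is one reason these metrics give false positives.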

Grey-Level Co-occurrence Matrix (GLCM; see Wikipedia)

  • energy
  • contrast
  • homogeneity
  • correlation

These metrics are available through scikit-image.

See also Harris Geospatial.
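To show what the four GLCM properties measure, here is a dependency-free sketch for a single pixel offset. It follows the textbook formulas (energy as the square root of the angular second moment, homogeneity with weights 1/(1 + (i - j)^2)); in practice you would call `skimage.feature.graycomatrix` and `graycoprops` instead (older releases spell them `greycomatrix`/`greycoprops`). Grey levels are assumed to be ints in `[0, levels)`, and returning 1.0 for a constant image's correlation is a convention we chose.

```python
import math


def glcm(pixels, dx=1, dy=0, levels=256, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one (dx, dy) offset."""
    h, w = len(pixels), len(pixels[0])
    P = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = pixels[y][x], pixels[y2][x2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1  # also count the reverse pair (j, i)
    total = sum(map(sum, P))
    if total == 0:
        raise ValueError("no pixel pairs for this offset")
    return [[v / total for v in row] for row in P]


def glcm_props(P):
    """Contrast, homogeneity, energy and correlation of a normalised GLCM."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n) if P[i][j] > 0]
    contrast = sum(p * (i - j) ** 2 for i, j, p in cells)
    homogeneity = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    energy = math.sqrt(sum(p * p for _, _, p in cells))
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    var_i = sum(p * (i - mu_i) ** 2 for i, _, p in cells)
    var_j = sum(p * (j - mu_j) ** 2 for _, j, p in cells)
    if var_i < 1e-15 or var_j < 1e-15:
        correlation = 1.0  # undefined for a constant image; 1.0 by convention
    else:
        cov = sum(p * (i - mu_i) * (j - mu_j) for i, j, p in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}
```

A flat image gives contrast 0 and homogeneity, energy and correlation of 1, while a 0/1 checkerboard with a horizontal offset gives contrast 1 and correlation -1.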

Other Useful Per-Image Metrics

  • Is solid?
  • Ship counts
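Both metrics are cheap to compute. The sketch below takes "is solid" to mean every pixel has the same value. The document does not say how ship counts were obtained, so `count_components` is a hypothetical stand-in that counts 4-connected regions in an assumed per-image binary ship mask (ships that touch would merge into one region).

```python
from collections import deque


def is_solid(pixels):
    """True if every pixel has the same value (e.g. a blank or single-colour tile)."""
    first = pixels[0][0]
    return all(v == first for row in pixels for v in row)


def count_components(mask):
    """Hypothetical ship count: number of 4-connected truthy regions in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1  # found a new region; flood-fill it with BFS
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Two images with different ship counts cannot be exact duplicates, so a count mismatch is a quick way to rule out a candidate pair.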

Overlap Metrics

  • Binary pixel difference
  • Absolute pixel difference
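The two overlap metrics can be sketched for a pair of equally sized greyscale crops. Our interpretation, which is an assumption, is that "binary" means the fraction of positions whose values differ at all and "absolute" means the mean absolute difference. `horizontal_overlap` is a hypothetical helper that extracts the strips that would coincide if image `b` sat `dx` columns to the right of image `a`.

```python
def _pixel_pairs(a, b):
    """Pair up corresponding pixels of two images; shapes must match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return [(u, v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)]


def binary_pixel_difference(a, b):
    """Fraction of pixel positions where the two images differ at all."""
    pairs = _pixel_pairs(a, b)
    return sum(u != v for u, v in pairs) / len(pairs)


def absolute_pixel_difference(a, b):
    """Mean absolute per-pixel difference between the two images."""
    pairs = _pixel_pairs(a, b)
    return sum(abs(u - v) for u, v in pairs) / len(pairs)


def horizontal_overlap(a, b, dx):
    """Hypothetical helper: overlapping strips when b is shifted dx columns right of a.

    Assumes both images have the same width and 0 < dx < width.
    """
    w = len(a[0])
    return [row[dx:] for row in a], [row[:w - dx] for row in b]
```

A score of 0 on the overlapping strips suggests the two tiles were cut from the same larger scene. If you port this to NumPy with `uint8` images, cast to a signed type before subtracting so the difference does not wrap around.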

Other Useful Sources

https://en.wikipedia.org/wiki/Relative_change_and_difference

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.455.8550&rep=rep1&type=pdf

http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html