nikvoronin/we-aint-same


We Ain't Same

Let's find duplicate images with perceptual hashing algorithms.

  • Calculate perceptual hashes using the ImageHash library.
  • Store and restore precalculated hashes.
  • Recursively search folders for image files.
  • Detect duplicates.
  • Organize images into groups of duplicates.
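The README does not say which perceptual hash algorithm the tool uses. To show the general idea, here is a minimal Python sketch of average hashing (aHash), one of the simplest perceptual hashes. It is not the project's code and does not call the ImageHash library; it assumes the image has already been shrunk to an 8x8 grayscale grid, a step real hashing libraries perform themselves.

```python
# Minimal average-hash (aHash) sketch, one simple kind of perceptual hash.
# Assumption: the image is already downscaled to 8x8 and converted to
# grayscale values 0-255; real libraries do that resizing step for you.


def average_hash(gray8x8: list[list[int]]) -> int:
    """Return a 64-bit hash: bit = 1 where a pixel is brighter than the mean."""
    pixels = [p for row in gray8x8 for p in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grayscale grid")
    mean = sum(pixels) / 64
    h = 0
    for p in pixels:
        # Append one bit per pixel, first pixel ends up as the most significant bit.
        h = (h << 1) | (1 if p > mean else 0)
    return h


if __name__ == "__main__":
    gradient = [[r * 8 + c for c in range(8)] for r in range(8)]  # values 0..63
    brighter = [[p + 100 for p in row] for row in gradient]       # same picture, brighter
    print(hex(average_hash(gradient)), hex(average_hash(brighter)))
```

Shifting every pixel by the same amount moves the mean by that amount too, so every bit, and therefore the hash, stays the same. That tolerance to small global edits is what lets near-identical images end up with equal or close hashes.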

Performance

1186 JPEG files in total, Release build configuration.

  • 17.2 sec to pre-compute the hashes.
  • 0.11 sec to search for duplicates among the pre-hashed images.
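Dividing the measurements out shows how lopsided the two phases are. The Python snippet below assumes, without confirmation from the README, that the search compares every pair of images once; if the tool uses a smarter index, the per-comparison figure does not apply.

```python
# Back-of-the-envelope numbers for the benchmark above.
# Assumption: the duplicate search compares every unordered pair of images
# once; the README does not say how the search is organised.

FILES = 1186
HASH_SECONDS = 17.2
SEARCH_SECONDS = 0.11

per_file_ms = HASH_SECONDS / FILES * 1000   # cost of hashing one image
pairs = FILES * (FILES - 1) // 2            # number of unordered image pairs
per_pair_ns = SEARCH_SECONDS / pairs * 1e9  # cost of one hash comparison

print(f"{per_file_ms:.1f} ms per image to hash")
print(f"{pairs} pairs, about {per_pair_ns:.0f} ns per comparison")
```

Under that assumption the search grows with the square of the collection size while hashing grows linearly, so the gap between the two phases would narrow for much larger collections.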

Example

[Image: samples002]

Output log:

+++ Computing hashes...
+ C:\Users\Pictures\Samples2\bing20221129.jpg
+ C:\Users\Pictures\Samples2\fireworks.jpg
+ C:\Users\Pictures\Samples2\mars.jpg
+ C:\Users\Pictures\Samples2\mount-copy.jpg
+ C:\Users\Pictures\Samples2\mount-rotated-2degree.jpg
+ C:\Users\Pictures\Samples2\mount-small.jpg
+ C:\Users\Pictures\Samples2\mountains.jpg

+++ Chasing duplicates...

+++ Similarity: max= 100% / min= 43,75%

+++ Duplicate Groups (1):
Group #0
        mount-copy.jpg
        mount-rotated-2degree.jpg
        mount-small.jpg
        mountains.jpg

+++ TOTAL: 00:00:00.8170144
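The log reports similarity as a percentage, but the README does not define it. One common definition for 64-bit hashes, assumed here, is the share of equal bits: similarity = (64 - d) / 64 × 100%, where d is the Hamming distance (the number of differing bits). Under that assumption the minimum of 43,75% in the log means 28 of 64 bits matched. The Python sketch below also groups images whose pairwise similarity reaches a threshold; THRESHOLD and the demo hashes are hypothetical, not values taken from the project.

```python
# Sketch: similarity between 64-bit hashes and grouping of near-duplicates.
# Assumptions (not from the README): similarity = share of equal bits, and
# THRESHOLD is a hypothetical cut-off chosen for illustration.

THRESHOLD = 90.0  # percent; hypothetical value


def similarity(a: int, b: int) -> float:
    """Percentage of the 64 bits that are equal in both hashes."""
    distance = bin(a ^ b).count("1")  # Hamming distance
    return (64 - distance) * 100 / 64


def group_duplicates(hashes: dict[str, int]) -> list[list[str]]:
    """Union-find over all pairs; returns only groups with 2+ members."""
    names = sorted(hashes)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(hashes[a], hashes[b]) >= THRESHOLD:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    demo = {  # hypothetical hashes
        "mountains.jpg": 0xF0F0F0F0F0F0F0F0,
        "mount-small.jpg": 0xF0F0F0F0F0F0F0F1,  # 1 bit differs -> 98.4375%
        "mars.jpg": 0x0F0F0F0F0F0F0F0F,         # every bit differs -> 0%
    }
    print(group_duplicates(demo))  # prints [['mount-small.jpg', 'mountains.jpg']]
```

Union-find makes the grouping transitive: if A matches B and B matches C, all three land in one group even when A and C alone fall below the threshold. Whether the project groups images this way is not stated.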

Precalculated hashes stored as a JSON file:

[
    {
        "Path": "C:\\Users\\Pictures\\Samples2\\bing20221129.jpg",
        "Hash": 11695141823225099355
    },
    {
        "Path": "C:\\Users\\Pictures\\Samples2\\fireworks.jpg",
        "Hash": 10721035060630703339
    },
    // ...
]
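Storing the hashes is presumably what lets later runs skip the slow hashing phase. A minimal Python sketch of saving and restoring the list-of-objects layout shown above, with the same "Path" and "Hash" keys, might look like this; the function names and the hashes.json file name are hypothetical. The `// ...` line in the listing only marks omitted entries, since JSON has no comment syntax.

```python
# Sketch: save and reload precomputed hashes as a JSON list of
# {"Path": ..., "Hash": ...} objects. Names here are hypothetical.
import json
import tempfile
from pathlib import Path


def save_hashes(hashes: dict[str, int], file: Path) -> None:
    """Write hashes sorted by path, one object per image."""
    records = [{"Path": p, "Hash": h} for p, h in sorted(hashes.items())]
    file.write_text(json.dumps(records, indent=4), encoding="utf-8")


def load_hashes(file: Path) -> dict[str, int]:
    """Read the list back into a path -> hash dictionary."""
    records = json.loads(file.read_text(encoding="utf-8"))
    return {r["Path"]: int(r["Hash"]) for r in records}


if __name__ == "__main__":
    demo = {r"C:\Users\Pictures\Samples2\bing20221129.jpg": 11695141823225099355}
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "hashes.json"
        save_hashes(demo, target)
        assert load_hashes(target) == demo  # round-trip keeps 64-bit values exact
```

Python's json module reads integer literals back as exact ints. Parsers that map every JSON number to a double, such as JavaScript's JSON.parse, would silently round values like 11695141823225099355, which are larger than 2^53.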

Links