Load and look up ProjectVIC photoDNA hashes #280

lfcnassif · 2020-10-09T20:36:53Z

Basic support was implemented in #246, without photoDNA loading. With the current hashset size, loading the photoDNA hashes on the heap would use about 2 GB. I thought about refactoring photoDNA indexing and lookup to be disk-based, but that would take some effort and would probably be slower. Loading on the heap is not a long-term solution, but it is feasible with the current hashset size.

Any thoughts @tc-wleite ?

wladimirleite · 2020-10-09T21:34:17Z

Well, it will definitely be slower (but hopefully still very fast) and will require extra implementation effort.
On the other hand, although 2 GB is not that much, thinking about other hash datasets that can be included in the future, using some disk-based solution seems unavoidable.

Just a couple of quick thoughts, not sure if they make sense at this point:
1. One option would be to implement a more general solution to deal with hash sets, outside the tasks and parsers(?) implementation, which would allow loading and querying datasets.
2. Having an option to load in memory could also be interesting (when there is a lot of memory available).
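
To make the two thoughts above concrete, here is a minimal sketch of a hash-set abstraction that lives outside any task and offers both an in-memory and a disk-backed backend. Everything in it is an assumption for illustration: the names `HashStore`, `InMemoryStore`, `SqliteStore` and `open_store` are hypothetical, SQLite is just one possible disk store, and it only covers exact-match lookups, not distance-based ones.

```python
import sqlite3
from typing import Iterable, Protocol


class HashStore(Protocol):
    """Hypothetical interface shared by every hash-set backend."""

    def add_all(self, hashes: Iterable[bytes]) -> None: ...

    def contains(self, digest: bytes) -> bool: ...


class InMemoryStore:
    """Keeps every hash on the heap: fastest lookups, highest memory use."""

    def __init__(self) -> None:
        self._hashes = set()

    def add_all(self, hashes: Iterable[bytes]) -> None:
        self._hashes.update(hashes)

    def contains(self, digest: bytes) -> bool:
        return digest in self._hashes


class SqliteStore:
    """Keeps hashes in an indexed SQLite table, so lookups go to disk."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes (digest BLOB PRIMARY KEY)"
        )

    def add_all(self, hashes: Iterable[bytes]) -> None:
        # One transaction for the whole load; duplicates are skipped.
        with self._conn:
            self._conn.executemany(
                "INSERT OR IGNORE INTO hashes (digest) VALUES (?)",
                ((h,) for h in hashes),
            )

    def contains(self, digest: bytes) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM hashes WHERE digest = ?", (digest,)
        ).fetchone()
        return row is not None


def open_store(path: str, in_memory: bool) -> HashStore:
    """Pick the backend from a configuration flag (hypothetical option)."""
    return InMemoryStore() if in_memory else SqliteStore(path)
```

With an interface like this, callers would only see `contains`, so the choice between heap and disk could become a configuration option rather than a code change.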

lfcnassif · 2020-10-09T22:04:13Z

1.1: Do you mean an external application/service to be queried? I thought about this in the past and made it possible for tasks to accumulate items and send bulk requests to external services, so network latency will not hurt too much. There is an initial implementation in the batchPythonTask branch (badly named);
1.2: The KFFTask needs to be refactored so that it can support different kinds of hashsets, not only NSRL. I am not sure whether vector-based hashes with similarity distances would fit easily into the same solution;
2: Yes, we currently have a lot of memory. I was just thinking of a straightforward implementation for now.
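
The accumulate-then-bulk-request idea in point 1.1 can be sketched roughly as follows. This is not the code from the batchPythonTask branch; `BatchAccumulator`, its `send` callback and the batch size are hypothetical names chosen for illustration.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchAccumulator(Generic[T]):
    """Collects items and hands them to `send` in groups of `batch_size`."""

    def __init__(self, batch_size: int, send: Callable[[List[T]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._batch_size = batch_size
        self._send = send  # e.g. one network request per batch
        self._pending: List[T] = []

    def add(self, item: T) -> None:
        self._pending.append(item)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending; call once more at the end of processing."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

The latency cost is then paid once per batch instead of once per item, at the price of a final `flush()` call to push the last partial batch.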

wladimirleite · 2020-10-09T22:29:56Z

1.1. That can be another option, but I was thinking about an internal implementation. I meant more in terms of code organization.
1.2. I guess distance would require a specific/more sophisticated solution.
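
To illustrate why distance-based lookup does not fit an exact-match hash set, here is a minimal sketch of a brute-force nearest-match search over fixed-length byte vectors. The squared Euclidean metric, the `max_distance` threshold and the function names are assumptions for illustration only; the thread does not specify how photoDNA distances are computed.

```python
from typing import Iterable, Optional, Tuple


def squared_distance(a: bytes, b: bytes) -> int:
    """Squared Euclidean distance between two equal-length byte vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_within(
    query: bytes, known: Iterable[bytes], max_distance: int
) -> Optional[Tuple[bytes, int]]:
    """Return the closest known vector within max_distance, or None."""
    best: Optional[Tuple[bytes, int]] = None
    for candidate in known:
        d = squared_distance(query, candidate)
        if d <= max_distance and (best is None or d < best[1]):
            best = (candidate, d)
    return best
```

An exact-match set answers a query with a single lookup, while this search has to visit every stored vector, which is why a shared solution would probably need a dedicated index for similarity queries.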

lfcnassif added the enhancement label Oct 9, 2020

lfcnassif self-assigned this Oct 20, 2020

lfcnassif mentioned this issue Oct 21, 2020

Load and look up ProjectVic photoDNA hashes #298

Merged

lfcnassif closed this as completed in 0f31eca Oct 21, 2020

lfcnassif closed this as completed in #298 Oct 21, 2020

lfcnassif changed the title ~~Load and check ProjectVIC photoDNA hashes~~ Load and look up ProjectVIC photoDNA hashes Nov 18, 2020

lfcnassif added a commit that referenced this issue Nov 27, 2020

closes #280: load and look up ProjectVic photoDNA hashes

994c608

This issue was closed.
