This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

Added nstar and nstar_intersection functions as well as tests. #18

Open
wants to merge 1 commit into
base: master
from


@adewes adewes commented Apr 1, 2015

I've implemented two functions nstar and nstar_intersection that estimate the number of elements in a Bloom filter as well as in an intersection of two Bloom filters (a separate function is required for the latter case since using nstar() on the intersection directly would yield a wrong result). This is very useful in many circumstances, e.g. when performing similarity searches on hierarchical data structures.

The implementation follows the Wikipedia article:

http://en.wikipedia.org/wiki/Bloom_filter

I've added tests and a short explanation as well, please let me know if you need anything else in order to merge this.
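For readers following along, here is a minimal sketch of the two estimates as the Wikipedia article states them: n* = -(m/k) ln(1 - X/m) for a single filter with m bits, k hash functions and X set bits, and n(A ∩ B) = n(A) + n(B) - n(A ∪ B) for an intersection, where the union is the bitwise OR of the two bit arrays. The function names, signatures, and the use of Python ints as bit arrays are assumptions made for illustration; this is not necessarily how the PR implements `nstar` and `nstar_intersection`.

```python
import math


def estimate_count(bits_set, num_bits, num_hashes):
    """Estimate how many elements were added, given the number of set bits.

    Hypothetical helper: applies n* = -(m/k) * ln(1 - X/m).
    """
    if bits_set >= num_bits:
        # A saturated filter carries no usable information about its size.
        return float("inf")
    return -(num_bits / num_hashes) * math.log(1.0 - bits_set / num_bits)


def estimate_intersection_count(bits_a, bits_b, num_bits, num_hashes):
    """Estimate the size of the intersection of two filters.

    bits_a and bits_b are Python ints used as bit arrays (an assumption of
    this sketch). Uses inclusion-exclusion on the three estimates instead
    of estimating from the AND of the bit arrays.
    """
    def popcount(value):
        return bin(value).count("1")

    n_a = estimate_count(popcount(bits_a), num_bits, num_hashes)
    n_b = estimate_count(popcount(bits_b), num_bits, num_hashes)
    n_union = estimate_count(popcount(bits_a | bits_b), num_bits, num_hashes)
    return n_a + n_b - n_union
```

Because each term is itself an estimate, the result for two filters with no shared elements can come out slightly below zero rather than exactly zero.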

@adewes
Author

adewes commented Apr 1, 2015

btw @jaybaird thanks for this project, it's really useful :)

@jaybaird
Owner

jaybaird commented Apr 1, 2015

Andreas,

Maybe I'm misunderstanding what this does, but why would you use this over using len? Or in the case of the intersection, doing the intersection and taking the len() there?

I do see now actually that we don't update the count when we union/intersect filters, which is a bug and might better be solved by this. Can you point me to the relevant section in the Wikipedia article you're referencing so I can catch up?

Thanks!

@jaybaird
Owner

jaybaird commented Apr 1, 2015

Never mind, I found it :)

@adewes
Author

adewes commented Apr 1, 2015

Hey @jaybaird, the purpose of nstar and nstar_intersection is to provide an estimate of the number of elements in the filter, which is useful if you reconstruct the Bloom filter from the bit array alone (and thus do not have access to the count attribute). Also, the count attribute cannot give you a count for an intersection or union of two Bloom filters, so we need to estimate it.
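To make the "bit array alone" case concrete, here is a small self-contained sketch. It does not use pybloom: the toy filter, the salted SHA-256 hashing, and the parameters `M` and `K` are all assumptions chosen for illustration. The filter keeps no element counter, so the sizes of a filter, a union, and an intersection are all recovered from set bits only.

```python
import hashlib
import math

M = 4096  # number of bits in the toy filter (assumed for illustration)
K = 4     # number of hash functions (assumed for illustration)


def positions(item):
    """Derive K bit positions for an item from salted SHA-256 digests."""
    for seed in range(K):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M


def build_bits(items):
    """Return the filter's bit array as a Python int; no count is stored."""
    bits = 0
    for item in items:
        for pos in positions(item):
            bits |= 1 << pos
    return bits


def estimate(bits):
    """Estimate the element count from set bits: -(m/k) * ln(1 - X/m)."""
    set_bits = bin(bits).count("1")
    return -(M / K) * math.log(1.0 - set_bits / M)


a = build_bits(range(0, 100))    # 100 elements
b = build_bits(range(50, 150))   # 100 elements, 50 of them shared with a

est_a = estimate(a)
est_union = estimate(a | b)                          # union is a bitwise OR
est_intersection = estimate(a) + estimate(b) - est_union
```

Here the true sizes are 100, 150, and 50, and the estimates should land close to those values. The intersection goes through inclusion-exclusion rather than `estimate(a & b)`, since bits that the non-shared elements happen to set in both filters would otherwise be counted as shared.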

@adewes
Author

adewes commented Apr 8, 2015

@jaybaird is there anything you need from my side to merge this PR or come to a decision on whether to integrate this into the code base?

@jaybaird
Owner

I plan on taking a look at this this weekend. I'm still not 100% sure of its utility, and there's a larger bug that I realized needs to be fixed, which this may be the solution for. I'll keep you posted.

@adewes
Author

adewes commented Aug 25, 2015

hey @jaybaird, still planning to merge this?


2 participants