jaccard metric prob #3521

Open
handsomeZhuang opened this issue Jun 17, 2024 · 10 comments
Comments

@handsomeZhuang

Hi, dear development team,
We recently used faiss.index_factory(dim, 'Flat', faiss.METRIC_Jaccard) and index.search() to build an index and run queries, and found that the results are not accurate. We also found that the Faiss implementation differs from the one in SciPy, whereas the SciPy implementation matches the original Jaccard method. We look forward to your reply!
Best wishes

@mdouze
Contributor

mdouze commented Jun 17, 2024

Thanks for the report, could you give a reproduction example ?

@handsomeZhuang
Author

Thanks for the report, could you give a reproduction example ?

Yes, for example:
import numpy as np
import faiss

# two database vectors
a = np.array([[70883900, 42568368, 16938844, 55760336, 21177010, 83098300, 46080616, 13810740, 63454444, 20485222],
              [20347602, 27256056, 23762382, 61982300, 37474148, 5487983, 7732985, 15258728, 68216584, 16599308],
              ]).astype(np.float32)
# one query vector
b = np.array([[20635302, 42568368, 16938844, 55760336, 65016728, 830983, 46080616, 13810740, 63454444, 2048522.]]).astype(np.float32)
index = faiss.index_factory(a.shape[1], 'Flat', faiss.METRIC_Jaccard)
index.train(a)
index.add(a)
dist, ids = index.search(b, 1)

The expected answer is a[0,:], not a[1,:], but the search returns a[1,:]. We look forward to your reply.

@handsomeZhuang
Author

Thanks for the report, could you give a reproduction example ?

Hi, has this issue been confirmed?

@handsomeZhuang
Author

handsomeZhuang commented Jun 19, 2024 via email

@handsomeZhuang
Author

Thanks for the report, could you give a reproduction example ?

Hi,
Has this problem been confirmed? We look forward to your reply and are waiting on this library to continue our work. We would appreciate it if you could confirm the problem. Thank you!

@mdouze
Contributor

mdouze commented Jun 24, 2024

Do you have a reference implementation of the Jaccard metric to compare with?

@handsomeZhuang
Author

Do you have a reference implementation of the Jaccard metric to compare with?

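Yes, we use the scipy.spatial.distance library to test it:
max_id = -1
max_score = -1
for i in range(a.shape[0]):
    # positions where the values differ and at least one of them is nonzero
    diff = np.bitwise_and((a[i,:] != b), np.bitwise_or(a[i,:] != 0, b != 0)).sum()
    # number of matching positions
    temp = b.shape[1] - diff
    # positions where at least one value is nonzero
    union = np.double(np.bitwise_or(a[i,:] != 0, b != 0).sum())
    score = float(temp / union)
    # keep the row with the highest score
    if max_score < score:
        max_score = score
        max_id = i
print(max_id, max_score)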
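
One possible explanation for the mismatch, offered as an assumption rather than a confirmed diagnosis: the loop above scores exact element matches, while a Jaccard similarity for real-valued vectors is often defined in its "weighted" form, sum(min(x_i, y_i)) / sum(max(x_i, y_i)). If Faiss's METRIC_Jaccard follows that weighted form (this should be verified against the Faiss source), the two definitions can rank the vectors from the reproduction example differently. The sketch below uses plain Python and does not call Faiss; the helper names `match_jaccard`, `weighted_jaccard`, and `best` are hypothetical.

```python
# Vectors from the reproduction example in this thread.
a = [
    [70883900, 42568368, 16938844, 55760336, 21177010,
     83098300, 46080616, 13810740, 63454444, 20485222],
    [20347602, 27256056, 23762382, 61982300, 37474148,
     5487983, 7732985, 15258728, 68216584, 16599308],
]
b = [20635302, 42568368, 16938844, 55760336, 65016728,
     830983, 46080616, 13810740, 63454444, 2048522]


def match_jaccard(x, y):
    """Match-based score mirroring the reference loop (hypothetical helper)."""
    # Positions where at least one value is nonzero.
    union = sum(1 for xi, yi in zip(x, y) if xi != 0 or yi != 0)
    # Positions that differ and where at least one value is nonzero.
    diff = sum(1 for xi, yi in zip(x, y) if xi != yi and (xi != 0 or yi != 0))
    return (len(x) - diff) / union


def weighted_jaccard(x, y):
    """Weighted Jaccard: sum of minima over sum of maxima (assumed form)."""
    num = sum(min(xi, yi) for xi, yi in zip(x, y))
    den = sum(max(xi, yi) for xi, yi in zip(x, y))
    return num / den


def best(score):
    """Index of the row of `a` with the highest score against `b`."""
    return max(range(len(a)), key=lambda i: score(a[i], b))


print("match-based best row:", best(match_jaccard))
print("weighted best row:   ", best(weighted_jaccard))
```

On this data the match-based score prefers row 0, while the weighted form prefers row 1. That would be consistent with the search result reported above, provided the assumption about the weighted form holds.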

@handsomeZhuang
Author

Do you have a reference implementation of the Jaccard metric to compare with?

Hi, may I ask whether this has been debugged yet?

@handsomeZhuang
Author

Do you have a reference implementation of the Jaccard metric to compare with?

Hi,
Has the problem been solved?

@weirdo2310

Do you have a reference implementation of the Jaccard metric to compare with?

AttributeError: module 'faiss' has no attribute 'METRIC_Jaccard'
Why does this happen? I am using faiss-gpu==1.7.2.
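
The AttributeError means the installed faiss module does not define that constant, most likely because the build predates it (an assumption: this thread does not state which release first shipped METRIC_Jaccard). A small check like the following can confirm this before building an index. It is only a sketch, and the helper name `metric_available` is hypothetical.

```python
import importlib


def metric_available(module_name="faiss", attr="METRIC_Jaccard"):
    """Return True/False if the module defines `attr`, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


status = metric_available()
if status is None:
    print("faiss is not installed in this environment")
elif status:
    print("METRIC_Jaccard is available")
else:
    print("METRIC_Jaccard is missing; a newer faiss build may be needed")
```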
