Support list query for explorer #1087

Merged

sooahleex
Contributor

@sooahleex sooahleex commented Jul 10, 2023

Summary

  • Same as title.
  • Support list queries whose entries are DatasetItem or string in the explorer.

How to test

  • I added some unit tests.

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@sooahleex sooahleex added the ENHANCE Enhancement of existing features label Jul 10, 2023
@sooahleex sooahleex marked this pull request as ready for review July 11, 2023 01:41
@sooahleex sooahleex requested review from a team as code owners July 11, 2023 01:41
@sooahleex sooahleex requested review from bonhunko and removed request for a team July 11, 2023 01:41
Contributor

@bonhunko bonhunko left a comment

I left a minor comment. LGTM!

Comment on lines 77 to 101
if isinstance(query, list):
    topk_for_query = int(topk // len(query)) * 2 if not len(query) == 1 else topk
    query_hash_key_list = []
    result_list = []
    for query_ in query:
        if isinstance(query_, DatasetItem):
            query_key = self._get_hash_key_from_item_query(query_)
            query_hash_key_list.append(query_key)
        elif isinstance(query_, str):
            query_key = self._get_hash_key_from_text_query(query_)
            query_hash_key_list.append(query_key)
        else:
            raise MediaTypeError(
                "Unexpected media type of query '%s'. "
                "Expected 'DatasetItem' or 'string', actual'%s'" % (query_, type(query_))
            )

    for query_key in query_hash_key_list:
        unpacked_key = np.unpackbits(query_key.hash_key, axis=-1)
        logits = calculate_hamming(unpacked_key, database_keys)
        ind = np.argsort(logits)

        item_list = np.array(self._item_list)[ind]
        result_list.extend(item_list[:topk_for_query].tolist())
    return np.random.choice(result_list, topk)
Contributor

Why is there a non-deterministic process here? Last time you said that the N-to-M top-k similarity search 1) finds the k most similar items for each of the N queries (N*k candidates) and 2) picks the top-k items from them. Is that right?

Contributor Author

I updated the process for this. First, build N*topk (or N*topk_for_query) candidates and their logits for the N queries, then sort the logits by value. Reorder the candidates by the sorted logit indices and pick the top-k items from them.
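
The updated process described above could look roughly like the sketch below. It is not the merged code: `hamming_distances` and `merged_topk` are hypothetical names, the hash keys are assumed to be packed `uint8` arrays, and skipping duplicate candidates is an extra assumption that the comment does not mention.

```python
import numpy as np


def hamming_distances(query_key, database_keys):
    """Hamming distance from one packed uint8 key to every database key."""
    query_bits = np.unpackbits(query_key, axis=-1)
    database_bits = np.unpackbits(database_keys, axis=-1)
    # Broadcasting compares the query bits against each database row.
    return np.count_nonzero(query_bits != database_bits, axis=-1)


def merged_topk(query_keys, database_keys, items, topk):
    """Gather per-query candidates, then keep the globally closest top-k."""
    candidates = []  # database indices, topk per query
    distances = []  # matching distances ("logits" in the PR)
    for query_key in query_keys:
        dist = hamming_distances(query_key, database_keys)
        nearest = np.argsort(dist, kind="stable")[:topk]
        candidates.extend(nearest.tolist())
        distances.extend(dist[nearest].tolist())

    # Sort all N*topk candidates by distance; a stable sort keeps ties
    # in a fixed order, so the result is deterministic.
    order = np.argsort(np.asarray(distances), kind="stable")

    selected = []
    for position in order:
        index = candidates[position]
        if index not in selected:  # assumption: drop duplicate candidates
            selected.append(index)
        if len(selected) == topk:
            break
    return [items[index] for index in selected]
```

Unlike `np.random.choice`, this returns the same items for the same inputs, and the stable sort makes the order of tied candidates predictable.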

@yunchu yunchu added this to the 1.4.0 milestone Jul 12, 2023
Contributor

@vinnamkim vinnamkim left a comment

LGTM.

@sooahleex sooahleex merged commit a47d943 into openvinotoolkit:releases/1.4.0 Jul 13, 2023
4 checks passed
Labels
ENHANCE Enhancement of existing features
4 participants