Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: do flat search if too many rows are filtered out #2583

Merged
merged 4 commits into from
Jul 11, 2024

Conversation

BubbleCal
Copy link
Contributor

@BubbleCal BubbleCal commented Jul 11, 2024

  • replace RoaringBitmap with BitVec
  • use pool to avoid allocating bitset for each query with prefilter
  • fall back to flat search if too many rows filtered out
  • prefetch with flat search

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@github-actions github-actions bot added the enhancement New feature or request label Jul 11, 2024
let results = self.search_basic(query.clone(), k, params.ef, bitmap, storage)?;
let keep_count = bitmap.as_ref().map(|b| b.len()).unwrap_or_default() as usize;
let results = if keep_count < self.len() * 80 / 100 {
log::info!("too many rows filtered, using flat search");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be debug

let keep_count = bitmap.as_ref().map(|b| b.len()).unwrap_or_default() as usize;
let results = if keep_count < self.len() * 80 / 100 {
log::info!("too many rows filtered, using flat search");
let bitmap = bitmap.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: handle error or use expect

let dist_calc = storage.dist_calculator(query);
let mut heap = BinaryHeap::<Reverse<OrderedNode>>::with_capacity(k);
for node_id in node_ids {
let dist = dist_calc.distance(node_id).into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: calculate distance and sort in two stages might be faster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will if there are not many rows, we can figure out a good threshold here. will do more benchmark here to get it.

…prefilter

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 27.86885% with 44 lines in your changes missing coverage. Please review.

Project coverage is 80.05%. Comparing base (ebf7c5d) to head (c3473de).
Report is 2 commits behind head on main.

Files Patch % Lines
rust/lance-index/src/vector/hnsw/builder.rs 28.07% 38 Missing and 3 partials ⚠️
rust/lance-index/src/vector/graph.rs 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2583      +/-   ##
==========================================
- Coverage   80.11%   80.05%   -0.06%     
==========================================
  Files         211      211              
  Lines       61085    61155      +70     
  Branches    61085    61155      +70     
==========================================
+ Hits        48937    48959      +22     
- Misses       9215     9256      +41     
- Partials     2933     2940       +7     
Flag Coverage Δ
unittests 80.05% <27.86%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BubbleCal BubbleCal merged commit 9ba6303 into lancedb:main Jul 11, 2024
22 checks passed
eddyxu pushed a commit that referenced this pull request Jul 11, 2024
- replace RoaringBitmap with BitVec
- use pool to avoid allocating bitset for each query with prefilter
- fall back to flat search if too many rows filtered out
- prefetch with flat search

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants