-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: do flat search if too many rows are filtered out #2583
Conversation
BubbleCal
commented
Jul 11, 2024
•
edited
Loading
edited
- replace RoaringBitmap with BitVec
- use pool to avoid allocating bitset for each query with prefilter
- fall back to flat search if too many rows filtered out
- prefetch with flat search
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
let results = self.search_basic(query.clone(), k, params.ef, bitmap, storage)?; | ||
let keep_count = bitmap.as_ref().map(|b| b.len()).unwrap_or_default() as usize; | ||
let results = if keep_count < self.len() * 80 / 100 { | ||
log::info!("too many rows filtered, using flat search"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be debug
let keep_count = bitmap.as_ref().map(|b| b.len()).unwrap_or_default() as usize; | ||
let results = if keep_count < self.len() * 80 / 100 { | ||
log::info!("too many rows filtered, using flat search"); | ||
let bitmap = bitmap.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: handle error or use expect
let dist_calc = storage.dist_calculator(query); | ||
let mut heap = BinaryHeap::<Reverse<OrderedNode>>::with_capacity(k); | ||
for node_id in node_ids { | ||
let dist = dist_calc.distance(node_id).into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: calculate distance and sort in two stages might be faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will if there are not many rows, we can figure out a good threshold here. will do more benchmark here to get it.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2583 +/- ##
==========================================
- Coverage 80.11% 80.05% -0.06%
==========================================
Files 211 211
Lines 61085 61155 +70
Branches 61085 61155 +70
==========================================
+ Hits 48937 48959 +22
- Misses 9215 9256 +41
- Partials 2933 2940 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
- replace RoaringBitmap with BitVec - use pool to avoid allocating bitset for each query with prefilter - fall back to flat search if too many rows filtered out - prefetch with flat search --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>