perf: split MaterializeIndex stream into batches #2770

Merged: wjones127 merged 3 commits into lancedb:main from memory-limited-bf-prefilter on Sep 4, 2024

wjones127
Contributor

Fixes #2768

After these changes, we can do the query using <2GB of RAM instead of 33GB! 🚀
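For readers who have not opened the diff, the general idea behind the title can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `batch_row_ids` and the batch size are hypothetical names chosen for the example. The point is that a consumer which handles one bounded slice of row ids at a time only has to hold the rows for that slice in memory, rather than the rows for every match at once.

```rust
/// Hypothetical helper (not part of Lance): split the row ids selected by an
/// index lookup into fixed-size batches, so that downstream work can process
/// one bounded batch at a time instead of all matching rows at once.
fn batch_row_ids(row_ids: &[u64], batch_size: usize) -> std::slice::Chunks<'_, u64> {
    // `chunks` panics on a zero size; fail early with a clearer message.
    assert!(batch_size > 0, "batch_size must be non-zero");
    // Yields slices of at most `batch_size` ids; the last one may be shorter.
    row_ids.chunks(batch_size)
}
```

Calling `batch_row_ids(&ids, 4)` on ten ids yields three slices of lengths 4, 4, and 2. Peak memory for the per-batch work then scales with the batch size rather than with the total number of matching rows.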

[Screenshot, 2024-08-21 3:41:10 PM]

@codecov-commenter

codecov-commenter commented Aug 21, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.94%. Comparing base (2838a87) to head (c9a7a3c).
Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/io/exec/scalar_index.rs | 83.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2770      +/-   ##
==========================================
- Coverage   78.47%   77.94%   -0.53%     
==========================================
  Files         228      229       +1     
  Lines       69341    70147     +806     
  Branches    69341    70147     +806     
==========================================
+ Hits        54417    54679     +262     
- Misses      11859    12389     +530     
- Partials     3065     3079      +14     
| Flag | Coverage Δ |
|---|---|
| unittests | 77.94% <83.33%> (-0.53%) ⬇️ |

@jacketsj
Contributor

Wow, nice! This doesn't just fix my memory issue; it also seems to make my job progress 2.5x faster!

@jacketsj
Contributor

Unless there's a chance this might break something else, could you un-draft this and save a better solution for the backlog? It's probably not the best solution, but I've been actively using this branch, since it works for brute-force queries. In addition to fixing filter queries, it also seems to fix unfiltered queries on billion-scale datasets.

@wjones127 wjones127 marked this pull request as ready for review August 29, 2024 16:34
Contributor

@westonpace westonpace left a comment

Awesome, thanks. Can you delete the TODO on line 413?

@wjones127 wjones127 merged commit 9c42903 into lancedb:main Sep 4, 2024
19 of 21 checks passed
@wjones127 wjones127 deleted the memory-limited-bf-prefilter branch September 4, 2024 20:00
Successfully merging this pull request may close these issues.

OOM during prefiltered brute-force vector search query
4 participants