perf: parallelize remapping to improve FTS compaction #2834

Merged
merged 3 commits into from
Sep 9, 2024

Conversation

Contributor

@BubbleCal BubbleCal commented Sep 5, 2024

Tested on MS MARCO: before this change, the remap had not finished after 35 minutes (I stopped waiting because it was taking too long); with this change it finishes in 3.15 minutes.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter

codecov-commenter commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.

Project coverage is 78.04%. Comparing base (d2f636c) to head (2d08d8e).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/inverted/builder.rs | 0.00% | 37 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2834      +/-   ##
==========================================
+ Coverage   77.99%   78.04%   +0.05%     
==========================================
  Files         229      229              
  Lines       70192    70314     +122     
  Branches    70192    70314     +122     
==========================================
+ Hits        54744    54878     +134     
+ Misses      12376    12338      -38     
- Partials     3072     3098      +26     
Flag Coverage Δ
unittests 78.04% <0.00%> (+0.05%) ⬆️


Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal marked this pull request as ready for review September 5, 2024 05:44
Contributor

@westonpace westonpace left a comment

This seems like a good improvement. Did you set up any kind of benchmark that led you to the conclusion that this needed to be sped up? It might be nice to have that benchmark checked in.

Comment on lines +395 to +396
self.inverted_list.par_iter_mut().for_each_init(
|| (Vec::new(), Vec::new(), Vec::new()),
Contributor

I'm not sure there is any particular benefit to using for_each_init here (versus for_each and declaring the new variables at the beginning of the closure), since you are consuming the Vecs at the end of the closure, but there is no harm in it either.

Contributor Author

Yeah, it doesn't benefit from this. I did it because I was planning to clear the vectors so that I could reuse them, but in the end I just took them...
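
For readers following this thread, here is a minimal sketch of the clear-and-reuse idea being discussed, under loudly stated assumptions. It uses `std::thread::scope` instead of rayon's `par_iter_mut().for_each_init(...)` so that it has no dependencies. `PostingList`, `remap_parallel`, the two-vector layout, and the mapping semantics (`Some` = row moved, `None` = row deleted, absent key = row unchanged) are all hypothetical and are not the code in `builder.rs`.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for one posting list (not the Lance type).
#[derive(Debug, Clone, PartialEq)]
struct PostingList {
    row_ids: Vec<u64>,
    frequencies: Vec<f32>,
}

/// Remap all posting lists in parallel.
/// Assumed semantics: `Some(new)` moves a row, `None` deletes it,
/// and a row id missing from `mapping` is kept unchanged.
fn remap_parallel(
    lists: &mut [PostingList],
    mapping: &HashMap<u64, Option<u64>>,
    num_workers: usize,
) {
    if lists.is_empty() {
        return;
    }
    let workers = num_workers.max(1);
    // Ceiling division; at least 1 because `lists` is non-empty.
    let chunk_size = (lists.len() + workers - 1) / workers;

    thread::scope(|scope| {
        for chunk in lists.chunks_mut(chunk_size) {
            scope.spawn(move || {
                // Scratch buffers are created once per worker, which is the
                // role the init closure plays in `for_each_init`.
                let mut new_ids: Vec<u64> = Vec::new();
                let mut new_freqs: Vec<f32> = Vec::new();

                for list in chunk.iter_mut() {
                    // Clearing keeps the allocated capacity for reuse.
                    new_ids.clear();
                    new_freqs.clear();

                    for (&id, &freq) in list.row_ids.iter().zip(&list.frequencies) {
                        match mapping.get(&id) {
                            Some(Some(new_id)) => {
                                new_ids.push(*new_id);
                                new_freqs.push(freq);
                            }
                            Some(None) => {} // row was deleted
                            None => {
                                new_ids.push(id);
                                new_freqs.push(freq);
                            }
                        }
                    }

                    // Swap instead of take: the list gets the remapped data and
                    // the scratch buffers inherit the old allocations.
                    std::mem::swap(&mut list.row_ids, &mut new_ids);
                    std::mem::swap(&mut list.frequencies, &mut new_freqs);
                }
            });
        }
    });
}
```

Calling `remap_parallel(&mut lists, &mapping, 4)` rewrites every list in place. If the closure instead consumed the buffers at the end of each iteration (for example with `std::mem::take`), the scratch vectors would start empty for every list and creating them once per worker would bring no benefit, which is the point raised in the review comment above.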

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal merged commit f1ae200 into lancedb:main Sep 9, 2024
23 of 25 checks passed
3 participants