perf: parallelize remapping to improve FTS compaction #2834

BubbleCal · 2024-09-05T04:35:24Z

Tested on MS MARCO: before this change the remap had not finished after 35 minutes (I stopped waiting because it was taking too long); with this change it finishes in 3.15 minutes.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

codecov-commenter · 2024-09-05T04:56:54Z

Codecov Report

Attention: Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.

Project coverage is 78.04%. Comparing base (d2f636c) to head (2d08d8e).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/builder.rs	0.00%	37 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2834      +/-   ##
==========================================
+ Coverage   77.99%   78.04%   +0.05%     
==========================================
  Files         229      229              
  Lines       70192    70314     +122     
  Branches    70192    70314     +122     
==========================================
+ Hits        54744    54878     +134     
+ Misses      12376    12338      -38     
- Partials     3072     3098      +26

Flag	Coverage Δ
unittests	`78.04% <0.00%> (+0.05%)`	⬆️


westonpace

This seems like a good improvement. Did you set up any kind of benchmark that led you to the conclusion this needed to be sped up? It might be nice to have that benchmark checked in.

westonpace · 2024-09-05T12:18:59Z

rust/lance-index/src/scalar/inverted/builder.rs

+        self.inverted_list.par_iter_mut().for_each_init(
+            || (Vec::new(), Vec::new(), Vec::new()),


I'm not sure there is any particular benefit to using for_each_init here (versus for_each and declaring the new variables at the beginning of the closure), since you are consuming the Vecs at the end of the closure, but there is no harm in it either.

BubbleCal · Sep 6, 2024

Yeah, it doesn't benefit from this. I did it because I was planning to clear the vectors and reuse them, but in the end I took them...
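For readers following the thread, here is a minimal sketch of the pattern being discussed: per-worker scratch state only pays off when the buffer is cleared and reused across items, rather than consumed at the end of each item. This is not the code from builder.rs and it does not use rayon; it uses only the standard library (`std::thread::scope`) as an analog of an init-per-worker parallel loop. `PostingList`, `remap_parallel`, the `mapping` shape, and the "missing id is kept unchanged" rule are all hypothetical and chosen for illustration.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for one posting list: the row ids containing a token.
#[derive(Debug, Default, PartialEq)]
struct PostingList {
    row_ids: Vec<u64>,
}

/// Remap the row ids of every posting list, splitting the lists across workers.
///
/// Assumed semantics (illustrative only):
/// - `Some(new_id)`: the row moved to `new_id`
/// - `None`: the row was deleted and is dropped
/// - id absent from `mapping`: the row is kept unchanged
fn remap_parallel(
    lists: &mut [PostingList],
    mapping: &HashMap<u64, Option<u64>>,
    workers: usize,
) {
    if lists.is_empty() {
        return;
    }
    let workers = workers.max(1);
    // Ceiling division so every list lands in exactly one chunk.
    let chunk_size = (lists.len() + workers - 1) / workers;

    thread::scope(|s| {
        for chunk in lists.chunks_mut(chunk_size) {
            s.spawn(move || {
                // Created once per worker, like the init closure of an
                // init-per-worker parallel loop.
                let mut scratch: Vec<u64> = Vec::new();
                for list in chunk {
                    // Reuse the allocation left over from the previous list.
                    scratch.clear();
                    scratch.extend(list.row_ids.iter().filter_map(|id| {
                        match mapping.get(id) {
                            Some(Some(new_id)) => Some(*new_id),
                            Some(None) => None,
                            None => Some(*id),
                        }
                    }));
                    // Swap instead of moving `scratch` into the list, so the
                    // worker keeps a buffer (the old `row_ids`) to reuse.
                    std::mem::swap(&mut list.row_ids, &mut scratch);
                }
            });
        }
    });
}
```

If the loop body instead moved `scratch` into the list at the end of each item, every item would start from a fresh empty `Vec` anyway, and creating the buffers once per worker would buy nothing over declaring them inside the closure, which is the point raised in the review comment above.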

perf: parallelize remapping to improve FTS compaction

45178c7

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

github-actions bot added the performance label Sep 5, 2024

BubbleCal requested review from eddyxu, westonpace and wjones127 September 5, 2024 05:37

fmt

38e2e08

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

BubbleCal marked this pull request as ready for review September 5, 2024 05:44

westonpace approved these changes Sep 5, 2024

read once

2d08d8e

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

BubbleCal merged commit f1ae200 into lancedb:main Sep 9, 2024
23 of 25 checks passed
