Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement top-k optimization #1960

Merged
merged 1 commit into from
Aug 27, 2023
Merged

Implement top-k optimization #1960

merged 1 commit into from
Aug 27, 2023

Conversation

acquamarin
Copy link
Collaborator

@acquamarin acquamarin commented Aug 25, 2023

This PR implements the optimization for top-k queries.
Sample top-k queries:
MATCH (comment:Comment) return comment.length,comment.creationDate ORDER BY comment.length, comment.creationDate LIMIT 5

  1. Plan without top-k optimizatioin: We firstly accumulate and order by all tuples, then we perform a limit operation on the result.
  2. Plan with top-k optimization:
    We merge the order by and limit operator together.
    Instead of accumulating all tuples, we do local sort and only keep the top-k tuples in each thread.

Performance number:
https://docs.google.com/spreadsheets/d/1K53Yz8KMuvrFfXbyoPyULhWt9nSBEUI6W4WdMWirODM/edit#gid=1649459535

@codecov
Copy link

codecov bot commented Aug 25, 2023

Codecov Report

Patch coverage: 85.81% and project coverage change: -0.28% ⚠️

Comparison is base (ee029b9) 89.75% compared to head (06b4a2a) 89.48%.

❗ Current head 06b4a2a differs from pull request most recent head 27da653. Consider uploading reports for the commit 27da653 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1960      +/-   ##
==========================================
- Coverage   89.75%   89.48%   -0.28%     
==========================================
  Files         868      881      +13     
  Lines       32032    32348     +316     
==========================================
+ Hits        28750    28946     +196     
- Misses       3282     3402     +120     
Files Changed Coverage Δ
src/include/processor/operator/physical_operator.h 100.00% <ø> (ø)
src/include/processor/result/factorized_table.h 96.55% <ø> (ø)
src/processor/operator/physical_operator.cpp 63.47% <0.00%> (-2.29%) ⬇️
src/processor/processor.cpp 100.00% <ø> (ø)
src/processor/result/factorized_table.cpp 95.68% <ø> (ø)
src/processor/operator/order_by/top_k.cpp 67.07% <67.07%> (ø)
...ocessor/operator/order_by/order_by_key_encoder.cpp 82.82% <84.61%> (-5.12%) ⬇️
...ude/processor/operator/order_by/key_block_merger.h 91.30% <100.00%> (+0.19%) ⬆️
src/include/processor/operator/order_by/order_by.h 100.00% <100.00%> (ø)
...processor/operator/order_by/order_by_key_encoder.h 100.00% <100.00%> (ø)
... and 15 more

... and 57 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fix the coverage for this PR before merging

src/include/processor/operator/order_by/sort_state.h Outdated Show resolved Hide resolved
src/include/processor/operator/order_by/key_block_merger.h Outdated Show resolved Hide resolved
src/include/processor/operator/order_by/key_block_merger.h Outdated Show resolved Hide resolved
src/include/processor/operator/order_by/top_k.h Outdated Show resolved Hide resolved
src/processor/operator/order_by/top_k.cpp Show resolved Hide resolved
using vector_select_comparison_func =
std::function<bool(common::ValueVector&, common::ValueVector&, common::SelectionVector&)>;

struct TopKScanState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a TODO for me to move this into mapper

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can move the initialization to mapper.
We need have access to the sortedKeyBlock. If the sorting hasn't started, we don't have access to the sortedKeyBlock.

@acquamarin acquamarin merged commit 8174d25 into master Aug 27, 2023
10 checks passed
@acquamarin acquamarin deleted the top-k branch August 27, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants