SuperSorter: direct merging, increased parallelism. #16775

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

gianm
Contributor

@gianm gianm commented Jul 22, 2024

Two performance enhancements:

  1. Direct merging of input frames to output channels, without any
    temporary files, if all input frames fit in memory.

  2. When doing multi-level merging (now called "external mode"),
    improve parallelism by increasing the number of mergers in the
    penultimate level.

To support direct merging, FrameChannelMerger now uses the output partition's min/max values to filter input frames. This is necessary because every direct merger reads all input frames but emits only the rows that belong to a single output partition.
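
To illustrate the filtering idea, here is a minimal sketch, not the actual FrameChannelMerger code. It assumes rows are plain integer keys, inputs are already-sorted in-memory lists, and each partition covers a half-open range `[minInclusive, maxExclusive)`. The class name, method name, and bound conventions are all hypothetical. Every call reads all inputs but emits only the keys inside its own partition range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names come from Druid.
public class DirectMergeSketch {
  /** Read position within one sorted input run. */
  private static final class Cursor {
    final List<Integer> run;
    int pos;

    Cursor(List<Integer> run) {
      this.run = run;
    }

    int current() {
      return run.get(pos);
    }
  }

  /**
   * K-way merge over all sorted runs that emits only keys in
   * [minInclusive, maxExclusive). The bound convention is an assumption.
   */
  public static List<Integer> mergePartition(
      List<List<Integer>> sortedRuns,
      int minInclusive,
      int maxExclusive
  ) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>(Comparator.comparingInt(Cursor::current));

    for (List<Integer> run : sortedRuns) {
      Cursor c = new Cursor(run);
      // Skip rows that sort before this partition's lower bound.
      while (c.pos < run.size() && c.current() < minInclusive) {
        c.pos++;
      }
      // Only enqueue the run if its next row is still inside the partition.
      if (c.pos < run.size() && c.current() < maxExclusive) {
        heap.add(c);
      }
    }

    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.current());
      c.pos++;
      // Runs are sorted, so once a row reaches the upper bound the run is done.
      if (c.pos < c.run.size() && c.current() < maxExclusive) {
        heap.add(c);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> runs =
        List.of(List.of(1, 4, 7, 10), List.of(2, 5, 8), List.of(3, 6, 9));
    // Two "direct mergers", one per output partition, over the same inputs.
    System.out.println(mergePartition(runs, 1, 5));
    System.out.println(mergePartition(runs, 5, 11));
  }
}
```

Because each partition's merger filters independently, the mergers can run in parallel over the same in-memory inputs without writing temporary files.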

Member

@clintropolis clintropolis left a comment

👍

    } else {
      return UNKNOWN_TOTAL;
    }
  } else if (level > 0 && level == totalMergingLevels - 2) {
Member

nit: i wonder if it might be a bit easier to follow if we moved some of these penultimate and ultimate level checks that do math stuff into named functions to be like level == penultimateLevel(totalMergingLevels) or isPenultimateLevel(level) or something since i see them repeated in places, or i guess could also maybe precompute?

Contributor Author

isPenultimateLevel might make sense; I think it'd have to return a tri-state since the answer might be unknown. I/we can think about it in a future patch; simplifying the logic of SuperSorter would be great.
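
For reference, a tri-state helper along those lines might look roughly like the sketch below. It is not code from the patch. The enum, the `UNKNOWN_LEVELS` sentinel, and its value are hypothetical. The only part taken from the diff is the condition `level > 0 && level == totalMergingLevels - 2`.

```java
// Hypothetical sketch of the helper discussed above; not part of the patch.
public class PenultimateLevelSketch {
  /** Three-valued answer, since the total number of levels may not be known yet. */
  public enum TriState { YES, NO, UNKNOWN }

  // Assumed sentinel meaning "total merging levels not yet known".
  public static final int UNKNOWN_LEVELS = -1;

  public static TriState isPenultimateLevel(int level, int totalMergingLevels) {
    if (totalMergingLevels == UNKNOWN_LEVELS) {
      return TriState.UNKNOWN;
    }
    // Same check as the branch in the reviewed diff.
    return (level > 0 && level == totalMergingLevels - 2)
           ? TriState.YES
           : TriState.NO;
  }

  public static void main(String[] args) {
    System.out.println(isPenultimateLevel(1, 3));
    System.out.println(isPenultimateLevel(2, 3));
    System.out.println(isPenultimateLevel(1, UNKNOWN_LEVELS));
  }
}
```

Callers would then need to handle `UNKNOWN` explicitly rather than treating it as `NO`, which is the main difference from a plain boolean helper.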

@gianm gianm merged commit eaa0993 into apache:master Aug 6, 2024
88 checks passed
@gianm gianm deleted the super-sorter-enhancements branch August 6, 2024 22:00
gianm added a commit to gianm/druid that referenced this pull request Aug 20, 2024
Without the call to readOnly, each output channel retains a 1 MB allocator,
leading to excessive memory use. Fixes regression from apache#16775.
gianm added a commit that referenced this pull request Aug 21, 2024
Without the call to readOnly, each output channel retains a 1 MB allocator,
leading to excessive memory use. Fixes regression from #16775.
hevansDev pushed a commit to hevansDev/druid that referenced this pull request Aug 29, 2024
Without the call to readOnly, each output channel retains a 1 MB allocator,
leading to excessive memory use. Fixes regression from apache#16775.
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
3 participants