New balancer strategy: sortingCost #13254
Closed
Add a new "sortingCost" Balancer Strategy which is faster than cachingCost while being identical to cost in all cases.
Description
#2972 proposes a cost function for Segment balancing.
While this helps with an optimal distribution of segments across servers, it can be slow on large clusters.
The cachingCost strategy was an attempt to make the same decisions as cost, but faster. However, there are a few discrepancies in the current implementation which lead to uneven distribution and slower convergence in the presence of segments with multiple granularities.
This PR introduces sortingCost, which is a simple optimization of the original cost. It produces the same cost function in the presence of multiple granularities while being just as fast as cachingCost, if not faster.
Add sortingCost strategy
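To give a rough feel for why sorting can speed up a cost computation without changing its result, here is a minimal sketch. It is not the code in this PR. It assumes, purely for illustration, a cost kernel of the form exp(-lambda * |x - t|) summed over a set of time points, and the class name SortedDecayCost and its methods are hypothetical. With the points sorted and prefix sums precomputed, each query takes a binary search instead of a full scan.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only; not the SortingCostComputer from this PR.
 * Assumes a kernel exp(-lambda * |x - t|) and small, normalized time values
 * (large values would overflow Math.exp and need rescaling).
 */
public class SortedDecayCost {
    private final double lambda;
    private final double[] sorted;     // points in ascending order
    private final double[] prefixUp;   // prefixUp[i]   = sum over j <  i of exp(+lambda * sorted[j])
    private final double[] suffixDown; // suffixDown[i] = sum over j >= i of exp(-lambda * sorted[j])

    public SortedDecayCost(double[] points, double lambda) {
        this.lambda = lambda;
        this.sorted = points.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        prefixUp = new double[n + 1];
        suffixDown = new double[n + 1];
        for (int i = 0; i < n; i++) {
            prefixUp[i + 1] = prefixUp[i] + Math.exp(lambda * sorted[i]);
        }
        for (int i = n - 1; i >= 0; i--) {
            suffixDown[i] = suffixDown[i + 1] + Math.exp(-lambda * sorted[i]);
        }
    }

    /** Sum over all points t of exp(-lambda * |x - t|), in O(log n). */
    public double cost(double x) {
        // Binary search for the index of the first point strictly greater than x.
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Points <= x contribute exp(-lambda*x) * exp(+lambda*t);
        // points >  x contribute exp(+lambda*x) * exp(-lambda*t).
        return Math.exp(-lambda * x) * prefixUp[lo] + Math.exp(lambda * x) * suffixDown[lo];
    }

    /** Reference O(n) computation of the same sum, for comparison. */
    public static double bruteForce(double[] points, double lambda, double x) {
        double total = 0.0;
        for (double t : points) {
            total += Math.exp(-lambda * Math.abs(x - t));
        }
        return total;
    }
}
```

Both methods compute the same sum up to floating-point rounding, so the sorted version trades a one-time O(n log n) setup for O(log n) queries. The actual strategy works on segment intervals rather than points, so treat this only as an analogy for the approach.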
The perf improvements can be checked by running SortingCostComputerTest#perfComparisonTest. With 100k segments, cost computation is 2000x faster. However, the overall coordinator cycle is unlikely to be affected as drastically.
Add simulation
Simulate with different cost strategies using SegmentLoadingTest#testLoadAndBalanceSeveral.
Here are the results of the simulation with about 30k segments of hourly, weekly and yearly granularity over 500 iterations.
Segments were loaded for 50 iterations among 3 historicals and then balanced for the rest after adding 2 more historicals.
cachingCost: 317590 ms
cost: 765574 ms
sortingCost: 266421 ms
This PR has: