
Avoid unnecessary cache building for cachingCost #12465

Conversation

@AmatyaAvadhanula (Contributor) commented Apr 20, 2022

Description

CachingCostBalancerStrategy can be inefficient when there are a large number of segments in the load/drop queue.

For each of the N segments being loaded, it builds a cost cache over the loading segments, which is an O(N^2) operation, making the overall work O(N^3).

This can be avoided by computing and summing the pairwise costs directly, which is O(N) per segment and O(N^2) in total.
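As a rough illustration of the proposed approach (hypothetical names, not the actual patch): the cost of placing a proposal segment can be obtained by summing its pairwise cost against each segment already queued for load, instead of first materializing a cache over all pairs of loading segments.

// Hypothetical sketch, not the actual patch. The generic type S and pairwiseCost
// are stand-ins for DataSegment and the joint segment cost used by the strategy.
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class LoadingCostSketch
{
  // O(N) per proposal segment: sum the pairwise costs directly against each segment
  // queued for load, instead of first building an O(N^2) cache over all pairs.
  static <S> double costOfLoadingSegments(
      S proposal,
      List<S> loadingSegments,
      ToDoubleBiFunction<S, S> pairwiseCost
  )
  {
    double cost = 0.0;
    for (S loading : loadingSegments) {
      cost += pairwiseCost.applyAsDouble(proposal, loading);
    }
    return cost;
  }
}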


Key changed/added classes in this PR
  • CachingCostBalancerStrategy

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz (Contributor) left a comment

Thanks for trying to solve this, @AmatyaAvadhanula!

I would suggest breaking this PR up into two parts.

  • First, for interned intervals. This would require a perf evaluation of how interning the interval affects both computation time and memory footprint (a benchmark sketch follows this list).
  • Second, for removing the cache-building step. This would require an analysis/proof that the cost computed by the new method is the same as that computed by the existing method using the cache chain, along with tests around this proof. Some perf evaluation would also be nice in this PR, since there should be a marked decrease in both the number of ephemeral objects created and the compute time.
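A minimal sketch of how the perf evaluation asked for in the first bullet could be set up, assuming a JMH harness; the benchmark class and parameters here are illustrative and not part of the PR, and the memory-footprint side would still need a separate heap comparison.

import java.util.concurrent.TimeUnit;

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import org.joda.time.Interval;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Illustrative JMH benchmark: compare constructing a fresh Interval per call
// against interning it through a weak interner.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class IntervalInterningBenchmark
{
  private final Interner<Interval> interner = Interners.newWeakInterner();
  private long start;
  private long end;

  @Setup
  public void setup()
  {
    start = System.currentTimeMillis();
    end = start + 3_600_000L;
  }

  @Benchmark
  public Interval freshInterval()
  {
    return new Interval(start, end);
  }

  @Benchmark
  public Interval internedInterval()
  {
    return interner.intern(new Interval(start, end));
  }
}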

@@ -80,6 +79,12 @@ public final class SegmentId implements Comparable<SegmentId>
*/
private static final Interner<String> STRING_INTERNER = Interners.newWeakInterner();

/**
* Store Intervals since creating them each time before returning is an expensive operation
Review comment (Contributor):

Thanks for adding this!

this.intervalStartMillis = interval.getStartMillis();
this.intervalEndMillis = interval.getEndMillis();
this.intervalChronology = interval.getChronology();
this.interval = INTERVAL_INTERNER.intern(interval);
Review comment (Contributor):

Can interval ever be null here?
If not, we can add an Objects.requireNonNull check, similar to the dataSource validation on the previous line.
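A minimal sketch of how this suggestion could look, assuming the interner is declared alongside the existing STRING_INTERNER; the class and field names below only mirror the diff, and the actual change may differ.

import java.util.Objects;

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import org.joda.time.Interval;

// Stand-alone illustration, not the real SegmentId: intern the interval and fail
// fast on null, mirroring how the dataSource field is validated.
final class IntervalFieldSketch
{
  private static final Interner<Interval> INTERVAL_INTERNER = Interners.newWeakInterner();

  private final Interval interval;
  private final long intervalStartMillis;
  private final long intervalEndMillis;

  IntervalFieldSketch(Interval interval)
  {
    // Explicit null check before interning, as suggested in the review comment.
    this.interval = INTERVAL_INTERNER.intern(Objects.requireNonNull(interval, "interval"));
    this.intervalStartMillis = interval.getStartMillis();
    this.intervalEndMillis = interval.getEndMillis();
  }
}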

@@ -70,10 +70,19 @@ protected double computeCost(DataSegment proposalSegment, ServerHolder server, b
return cost * (server.getMaxSize() / server.getAvailableSize());
}

private ClusterCostCache costCacheForLoadingSegments(ServerHolder server)
private double costCacheForLoadingSegments(ServerHolder server, DataSegment proposalSegment)
Review comment (Contributor):

Nit: Rename to computeCostForLoadingSegmentOnServer

@imply-cheddar (Contributor) commented:

Doing just the interval interning would make this mergeable; definitely do a separate PR for the caching change, as its correctness is less clear.

We should probably include in this PR the flamegraphs that led us to make this code change.

In terms of memory consumption, the fields stored on SegmentId are exactly the same as what an Interval stores. By interning and reusing the same reference, and given that the same interval tends to show up a lot, we should actually save memory rather than increase it, while also improving performance.
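A small, self-contained illustration of the reference-reuse argument (not code from the PR): once interned, equal intervals collapse to a single shared instance, so many SegmentIds covering the same interval hold one Interval object rather than one each.

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import org.joda.time.Interval;

public class InternReuseDemo
{
  public static void main(String[] args)
  {
    Interner<Interval> interner = Interners.newWeakInterner();
    Interval first = interner.intern(new Interval(0L, 3_600_000L));
    Interval second = interner.intern(new Interval(0L, 3_600_000L));
    // Prints "true": the interner hands back the same reference for equal intervals.
    System.out.println(first == second);
  }
}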

@AmatyaAvadhanula (Contributor, Author) commented:

Closing since #14484 deprecates cachingCost
