Optimize used segment fetching in Kill tasks #15107
@@ -225,7 +225,14 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
   // Fetch the load specs of all segments overlapping with the given interval
   final Set<Map<String, Object>> usedSegmentLoadSpecs = toolbox
       .getTaskActionClient()
-      .submit(new RetrieveUsedSegmentsAction(getDataSource(), getInterval(), null, Segments.INCLUDING_OVERSHADOWED))
+      .submit(new RetrieveUsedSegmentsAction(
Format: would be easier to read if the unused intervals were pre-constructed and the constructor was moved to a new line.
-      .submit(new RetrieveUsedSegmentsAction(
+      .submit(
+          new RetrieveUsedSegmentsAction(getDataSource(), null, unusedIntervals, Segments.INCLUDING_OVERSHADOWED)
+      )
Done
.filter(unusedSegment -> !usedSegmentLoadSpecs.contains(unusedSegment.getLoadSpec())
                         || unusedSegment.getLoadSpec() == null)
unusedSegment.getLoadSpec() == null should be checked first IMO.
Done
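For context on the ordering point above: with `||`, the right-hand operand is evaluated only when the left-hand one is false, so checking for null first means `contains` is never called with a null argument. That matters for set implementations that reject null lookups (for example, the immutable sets returned by `Set.of` throw a `NullPointerException` from `contains(null)`). The following is a minimal sketch of the reordered filter, not the code in this PR; `LoadSpecFilterSketch`, its nested `Segment` class, and `segmentsToKillInDeepStorage` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class LoadSpecFilterSketch {

    /** Hypothetical stand-in for a segment; the load spec may be null. */
    public static final class Segment {
        private final String id;
        private final Map<String, Object> loadSpec;

        public Segment(String id, Map<String, Object> loadSpec) {
            this.id = id;
            this.loadSpec = loadSpec;
        }

        public String getId() {
            return id;
        }

        public Map<String, Object> getLoadSpec() {
            return loadSpec;
        }
    }

    /**
     * Keeps the unused segments whose load spec is null or is not shared
     * with any used segment. The null check comes first, so contains()
     * is only ever called with a non-null load spec.
     */
    public static List<Segment> segmentsToKillInDeepStorage(
            List<Segment> unusedSegments,
            Set<Map<String, Object>> usedSegmentLoadSpecs) {
        return unusedSegments.stream()
            .filter(unusedSegment -> unusedSegment.getLoadSpec() == null
                                     || !usedSegmentLoadSpecs.contains(unusedSegment.getLoadSpec()))
            .collect(Collectors.toList());
    }
}
```

In this sketch, a segment with a null load spec passes the filter without touching the set at all, even when the set is one that would throw on a null lookup.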
private final int numBatchesProcessed;
private final Integer numSegmentsMarkedAsUnused;

@JsonCreator
public Stats(
    @JsonProperty("numSegmentsKilled") int numSegmentsKilled,
    @JsonProperty("numSegmentsKilledInDeepStorage") int numSegmentsKilledInDeepStorage,
Why add this property too? This is a user-facing property.
I added it to test the changes. I can remove it if it is unnecessary in the report.
Let's remove it, please. We will add it later if it's needed.
Done
* Optimize used segment fetching in Kill tasks
#14407 changed the behaviour of kill tasks so that they fetch the load specs of used segments, which prevents load specs that are still in use from being killed.
However, doing this for each batch can add significant overhead. This PR aims to reduce it.
The preliminary approach taken in this PR is to fetch only those used segments that belong to the intervals of the unused segments in a given batch.
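To make the idea concrete, here is a minimal, self-contained sketch of restricting the used-segment lookup to the intervals of one batch. It does not use Druid's classes or metadata queries: `KillBatchSketch`, its nested `Segment` class (a half-open `[start, end)` interval plus a string load spec), and `usedLoadSpecsForBatch` are all hypothetical, and the in-memory overlap scan stands in for whatever interval-restricted fetch the task actually performs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KillBatchSketch {

    /** Hypothetical stand-in for a segment: half-open interval [start, end) plus a load spec id. */
    public static final class Segment {
        private final long start;
        private final long end;
        private final String loadSpec;

        public Segment(long start, long end, String loadSpec) {
            this.start = start;
            this.end = end;
            this.loadSpec = loadSpec;
        }

        public String getLoadSpec() {
            return loadSpec;
        }

        /** True if the two half-open intervals share at least one instant. */
        public boolean overlaps(Segment other) {
            return start < other.end && other.start < end;
        }
    }

    /**
     * Collects the load specs of only those used segments that overlap the
     * interval of at least one unused segment in the current batch, instead
     * of every used segment in the task's full interval.
     */
    public static Set<String> usedLoadSpecsForBatch(List<Segment> usedSegments, List<Segment> unusedBatch) {
        Set<String> loadSpecs = new HashSet<>();
        for (Segment used : usedSegments) {
            for (Segment unused : unusedBatch) {
                if (used.overlaps(unused)) {
                    loadSpecs.add(used.getLoadSpec());
                    break; // one overlap is enough to include this used segment
                }
            }
        }
        return loadSpecs;
    }
}
```

With used segments covering `[0, 10)` and `[20, 30)` and a batch containing a single unused segment over `[5, 8)`, only the load spec of the first used segment is collected, so the comparison set grows with the batch rather than with the whole kill interval.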
This PR has: