Optimize used segment fetching in Kill tasks #15107

Conversation

@AmatyaAvadhanula AmatyaAvadhanula commented Oct 7, 2023

#14407 changed the behaviour of kill tasks so that they fetch the load specs of used segments, which prevents a load spec that is still referenced by a used segment from being killed.

However, doing this for each batch can add significant overhead. This PR aims to reduce it.

The preliminary approach taken in this PR is to only fetch those used segments belonging to the intervals corresponding to the unused segments in a given batch.
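
As a rough illustration of the approach described above, the following is a minimal sketch and not the code from this PR. `TimeRange` and `Segment` are hypothetical stand-ins for Druid's interval and segment types, and the in-memory list of used segments stands in for the metadata lookup that the kill task performs through a task action. It only shows the idea of narrowing the used-segment lookup to the intervals covered by the current batch of unused segments.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these types are simplified stand-ins, not Druid classes.
final class KillBatchSketch
{
  /** Half-open time range [startMillis, endMillis). */
  record TimeRange(long startMillis, long endMillis)
  {
    boolean overlaps(TimeRange other)
    {
      return startMillis < other.endMillis && other.startMillis < endMillis;
    }
  }

  record Segment(String id, TimeRange interval, Map<String, Object> loadSpec) {}

  /** Distinct intervals covered by the unused segments of one batch. */
  static Set<TimeRange> intervalsOf(List<Segment> unusedBatch)
  {
    Set<TimeRange> intervals = new LinkedHashSet<>();
    for (Segment segment : unusedBatch) {
      intervals.add(segment.interval());
    }
    return intervals;
  }

  /**
   * Load specs of only those used segments that overlap one of the batch intervals.
   * In a real system this filter would be pushed into the metadata query (assumption).
   */
  static Set<Map<String, Object>> usedLoadSpecsFor(List<Segment> usedSegments, Set<TimeRange> unusedIntervals)
  {
    Set<Map<String, Object>> loadSpecs = new HashSet<>();
    for (Segment used : usedSegments) {
      for (TimeRange interval : unusedIntervals) {
        if (used.interval().overlaps(interval)) {
          loadSpecs.add(used.loadSpec());
          break;
        }
      }
    }
    return loadSpecs;
  }
}
```

With this shape, a batch whose unused segments all fall in one interval only pulls in the used segments overlapping that interval, rather than every used segment in the task's full interval.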

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -225,7 +225,14 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
     // Fetch the load specs of all segments overlapping with the given interval
     final Set<Map<String, Object>> usedSegmentLoadSpecs = toolbox
         .getTaskActionClient()
-        .submit(new RetrieveUsedSegmentsAction(getDataSource(), getInterval(), null, Segments.INCLUDING_OVERSHADOWED))
+        .submit(new RetrieveUsedSegmentsAction(

Contributor

Formatting: this would be easier to read if the unused intervals were pre-constructed and the constructor call were moved to a new line.

Suggested change
-        .submit(new RetrieveUsedSegmentsAction(
+        .submit(
+            new RetrieveUsedSegmentsAction(getDataSource(), null, unusedIntervals, Segments.INCLUDING_OVERSHADOWED)
+        )

Contributor Author

Done

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review October 8, 2023 05:12
Comment on lines 248 to 249
.filter(unusedSegment -> !usedSegmentLoadSpecs.contains(unusedSegment.getLoadSpec())
|| unusedSegment.getLoadSpec() == null)

Contributor

unusedSegment.getLoadSpec() == null should be checked first IMO.

Contributor Author

Done
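
A minimal sketch of the reordered predicate discussed in this thread, assuming a hypothetical `Segment` record in place of Druid's segment class. Putting the null check first lets `||` short-circuit, so `contains` is never called with a null load spec. That matters for set implementations that reject null lookups: the immutable sets returned by `Set.of` throw `NullPointerException` on `contains(null)`, for example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: Segment is a simplified stand-in, not a Druid class.
final class LoadSpecFilterSketch
{
  record Segment(String id, Map<String, Object> loadSpec) {}

  /**
   * Keeps unused segments whose load spec is null or is not shared with any used segment.
   * The null check comes first so that contains() is only reached with a non-null argument.
   */
  static List<Segment> notSharedWithUsed(List<Segment> unusedSegments, Set<Map<String, Object>> usedSegmentLoadSpecs)
  {
    return unusedSegments.stream()
        .filter(unusedSegment -> unusedSegment.loadSpec() == null
                                 || !usedSegmentLoadSpecs.contains(unusedSegment.loadSpec()))
        .collect(Collectors.toList());
  }
}
```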

private final int numBatchesProcessed;
private final Integer numSegmentsMarkedAsUnused;

@JsonCreator
public Stats(
@JsonProperty("numSegmentsKilled") int numSegmentsKilled,
@JsonProperty("numSegmentsKilledInDeepStorage") int numSegmentsKilledInDeepStorage,

Contributor

Why add this property too? This is a user-facing property.

Contributor Author

I had added it to test the changes. I can remove it if it is unnecessary in the report.

Contributor

Let's remove it, please. We will add it later if it's needed.

Contributor Author

Done

@AmatyaAvadhanula AmatyaAvadhanula merged commit 40a6dc4 into apache:master Oct 9, 2023
81 checks passed
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
* Optimize used segment fetching in Kill tasks
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
* Optimize used segment fetching in Kill tasks
4 participants