Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve polling in segment allocation queue #15590

Merged
merged 4 commits into from
Jan 4, 2024

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Dec 18, 2023

Description

When batchAllocationWaitTime is set to 0, the segment allocation queue is polled continuously even when it is empty. This would take up cpu cycles unnecessarily.

Some existing race conditions would also become more frequent when the batchAllocationWaitTime is 0. This PR tries to better address those race conditions as well.

Changes

  • Do not reschedule a poll if queue is empty
  • When a new batch is added to queue, schedule a poll
  • Simplify keyToBatch map
  • Handle race conditions better
  • As soon as a batch starts getting processed, do not add any more requests to it

Alternate approach

It would be much simpler to do a long poll of the queue, but we cannot do that here since we first peek at the batch at head of the queue and pop it only if it is due. There doesn't seem to be an equivalent long peek 🙂.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Copy link
Contributor Author

kfaraz commented Dec 19, 2023

Update: Added a few more changes to handle race conditions better and make the code simpler.

@AmatyaAvadhanula
Copy link
Contributor

AmatyaAvadhanula commented Dec 19, 2023

Requests with lineage check cannot be batched. Would it help to not wait before polling for such requests?
Tasks such as MSQ that use lineage check may benefit from it even if the batchAllocationWaitTime is set to a high value.

[Edit]
Just polling immediately wouldn't work as it's a queue. Perhaps we could change it to a priority queue with the poll time as the key

@kfaraz
Copy link
Contributor Author

kfaraz commented Dec 20, 2023

Thanks for the suggestion, @AmatyaAvadhanula . To help me understand better, would a more precise statement instead of Requests with lineage check cannot be batched. be that Requests with lineage check are only sent to the overlord one by one as each subsequent request needs the previous segment id, so there is no scope of batching them.?

I ask this because the SegmentAllocationQueue code itself does not distinguish between lineage check or no lineage check.

Either way, my plan is to eventually get rid of batchAllocationWaitTime altogether, so that no request needs to wait unless some other batch is already being processed. That would benefit all types of requests. The change in this PR is just to stabilize the code so that we can try out batchAllocationWaitTime = 0 before removing the config or updating its default values.

Please let me know what you think.

@AmatyaAvadhanula
Copy link
Contributor

Yes, it would be a more precise statement.
My suggestion was that the code could be changed to distinguish such requests and process them without waiting for batchAllocationWaitTime to elapse.

Thank you for the explanation, @kfaraz. I understand the intent of this change better now.

@AmatyaAvadhanula
Copy link
Contributor

LGTM @kfaraz!

@AmatyaAvadhanula AmatyaAvadhanula merged commit c937068 into apache:master Jan 4, 2024
83 checks passed
@kfaraz kfaraz deleted the fix_segallocq_poll branch January 4, 2024 12:43
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants