Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase scheduler worker cancellation chan cap #741

Conversation

colega
Copy link
Contributor

@colega colega commented Jan 13, 2022

What this PR does:

With previous implementation, if worker was busy talking to scheduler, we didn't push the cancellation, keeping that query running.

When cancelling a query, all its subqueries are cancelled at the same time, so this was most likely happening all the time (first subquery scheduled on this worker was canceled, the rest were not because worker was busy cancelling the first one).

Also removed the <-ctx.Done() escape point when waiting for the enqueueing ACK and modified the enqueueing method to ensure that it always responds something.

Which issue(s) this PR fixes:

Fixes: #740
Inspired by: grafana/loki#5113

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

With previous implementation, if worker was busy talking to scheduler,
we didn't push the cancellation, keeping that query running.

When cancelling a query, all its subqueries are cancelled at the same
time, so this was most likely happening all the time (first subquery
scheduled on this worker was canceled, the rest were not because worker
was busy cancelling the first one).

Also removed the `<-ctx.Done()` escape point when waiting for the
enqueueing ACK and modified the enqueueing method to ensure that it
always responds something.

Fixes: #740
Inspired by: grafana/loki#5113

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
colega added a commit that referenced this pull request Jan 13, 2022
When scheduler worker is handling requests, if there's certain pressure
on the loop, we can keep enqueueing requests instead of handling the
cancellations. However, when there's that pressure on the loop, it's
more important than never to properly cancel the requests.

This checks the cancellations channel first, and then checks the
requests channel too.

Related to #741

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega mentioned this pull request Jan 13, 2022
3 tasks
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

pkg/frontend/v2/frontend.go Show resolved Hide resolved
pkg/frontend/v2/frontend_scheduler_worker.go Outdated Show resolved Hide resolved
pkg/frontend/v2/frontend_test.go Outdated Show resolved Hide resolved
Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>
colega and others added 3 commits January 13, 2022 18:00
Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my feedback!

@pstibrany pstibrany enabled auto-merge (squash) January 14, 2022 08:26
@cyriltovena
Copy link
Contributor

Thanks Oleg ! and well done @kavirajk for finding this issue.

@pstibrany pstibrany merged commit 9e7f5a2 into main Jan 14, 2022
@pstibrany pstibrany deleted the increase-frontend-scheduler-worker-cancellation-channel-capacity branch January 14, 2022 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix request cancellation when frontend worker is busy
3 participants