
Allow worker to prioritize tasks based on memory production/consumption #5251

Open: wants to merge 3 commits into base: main

mrocklin
Member

This adds a worker.py::TaskPrefix class that tracks consumption and
production of all computed tasks, grouped by task prefix.

Then we use these values when determining priorities: tasks that
consume five times more data than they produce are prioritized, and
tasks that produce five times more data than they consume are
deprioritized.

See #5250 for background
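
As a rough illustration of the idea described above (not the code in this PR), per-prefix tracking and the five-times rule could be sketched as follows. The names `PrefixStats`, `MemoryAwarePriorities`, `key_prefix`, `record`, `adjustment`, and `priority` are hypothetical, the prefix rule is simplified to "text before the first dash", and the sketch assumes that smaller priority tuples run first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PrefixStats:
    """Running byte totals for one task prefix (hypothetical stand-in for TaskPrefix)."""

    bytes_consumed: int = 0
    bytes_produced: int = 0


def key_prefix(key: str) -> str:
    """Simplified prefix rule (assumption): everything before the first '-'."""
    return key.split("-", 1)[0]


class MemoryAwarePriorities:
    # The factor of five comes from the PR description.
    RATIO = 5

    def __init__(self) -> None:
        self.prefixes = defaultdict(PrefixStats)

    def record(self, key: str, nbytes_in: int, nbytes_out: int) -> None:
        """Accumulate input and output sizes of a finished task under its prefix."""
        stats = self.prefixes[key_prefix(key)]
        stats.bytes_consumed += nbytes_in
        stats.bytes_produced += nbytes_out

    def adjustment(self, key: str) -> int:
        """Return -1 to run sooner, +1 to run later, 0 to leave unchanged."""
        stats = self.prefixes.get(key_prefix(key))
        if stats is None:
            return 0  # no history for this prefix yet
        if stats.bytes_consumed > self.RATIO * stats.bytes_produced:
            return -1  # net memory consumer: prioritize
        if stats.bytes_produced > self.RATIO * stats.bytes_consumed:
            return 1  # net memory producer: deprioritize
        return 0

    def priority(self, key: str, base_priority: tuple) -> tuple:
        """Prepend the adjustment so it dominates the existing ordering."""
        return (self.adjustment(key),) + base_priority
```

With this sketch, a prefix that has only been seen producing data (for example a loader) sorts after a prefix that mostly reduces data, regardless of their base priorities.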

@mrocklin
Member Author

OK, I've added pausing, although it's ugly and not very future-reader friendly. This will have to be cleaned up, but it is probably simple enough for people to take a look at if they're interested.

test_resources.py rightfully complained
@jrbourbeau
Member

Thanks for pushing this up @mrocklin. I plan to read through #5250 and review this tomorrow.

@TomNicholas

TomNicholas commented May 11, 2022

I'm coming up against workers over-eagerly consuming memory, and wondering what the status of this effort is?

Really I'm just looking for a way to get workers to deprioritise certain memory-consuming root tasks (xr.open_dataset tasks). I can open another issue if that's worthwhile.

@gjoseph92
Collaborator

@TomNicholas see #5223 and #5555 for discussion of the underlying problem. We're focused on core stability issues (deadlocks) right now, so changes to the scheduling algorithm to address this are not getting attention at the moment.

Opening a separate issue to discuss workarounds for root task overproduction might be useful. There may be tricks you can play using worker resources, though I haven't had much success with them personally.
