Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tag-based concurrency management into clients #14382

Merged
merged 26 commits into from
Jul 27, 2024

Conversation

abrookins
Copy link
Collaborator

@abrookins abrookins commented Jun 27, 2024

Move tag-based concurrency handling client-side, implemented with global concurrency limits. This fixes #14360 and forms part of our larger effort to move all elements of task orchestration client-side.

Limitations and future work:

  • This changes the behavior of task runs waiting for a concurrency slot. Runs transition to Running before they acquire a slot. As future work, we could make runs that use tag-based concurrency transition to a named Running state, such as Running["AcquiringSlot"], and then transition to the normal Running state after acquiring a slot.
  • Task run concurrency limits can report which task runs are using the limits, but global concurrency limits do not report the entity using a limit. In future work, users will be able to see which task runs and flow runs are using global concurrency limits.

Example

This PR changes tag-based task concurrency to use global concurrency limits. When a global concurrency limit exists whose name matches a tag in your task, we will apply that limit to the task when it runs. If you want to create a limit to match a tag on your task, you should create a global concurrency limit, not a task run concurrency limit. Future work will likely consolidate these concepts.

Checklist

  • This pull request includes a label categorizing the change e.g. maintenance, fix, feature, enhancement, docs.
  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@abrookins abrookins requested review from a team and zangell44 as code owners June 27, 2024 19:02
@abrookins abrookins changed the title POC of supporting tag-based concurrency with global concurrency limits Move tag-based concurrency management into clients Jun 27, 2024
src/prefect/client/orchestration.py Outdated Show resolved Hide resolved
src/prefect/client/orchestration.py Outdated Show resolved Hide resolved
src/prefect/task_engine.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@zangell44 zangell44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple concerns

  1. The concurrency v2 api does not track which object took a given slot. If the client crashes mid-run, we have no way of recovering a slot automatically.
  2. We should not concurrency limit ALL of a task run's tags. Currently we only apply them to tags with limits defined.
  3. 3.x and 2.x task run limits will not be compatible with one another
  4. It seems odd for task run code to be activating concurrency limits. What happens if I want to shut them off?

@zhen0
Copy link
Member

zhen0 commented Jul 8, 2024

@abrookins - doing a bit of maintenance as we have a lot of potentially stale PRs. Is this one that needs action? Or can it be closed?

@abrookins
Copy link
Collaborator Author

Still working on this one! 👍

@abrookins
Copy link
Collaborator Author

@zangell44 Good questions! We may need to expand the concurrency v2 API with more functionality. I'm thinking about this now. I didn't understand question #4.

Copy link

codspeed-hq bot commented Jul 12, 2024

CodSpeed Performance Report

Merging #14382 will not alter performance

Comparing global-concurrency-tags (69e8665) with main (0d23f58)

Summary

✅ 5 untouched benchmarks

@zangell44
Copy link
Collaborator

I think the create_if_missing kwarg + functionality resolves questions 2 and 4.
I do think 1 and 3 are still worthy of consideration.

1.) The concurrency v2 api does not track which object took a given slot. If the client crashes mid-run, we have no way of recovering a slot automatically.
3.) 3.x and 2.x task run limits will not be compatible with one another

3 may not have a solution outside of documenting the behavior.

@github-actions github-actions bot added the bug Something isn't working label Jul 22, 2024
Copy link
Member

@cicdw cicdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass review - didn't dig into the user agent stuff versioning logic yet

src/prefect/client/orchestration.py Outdated Show resolved Hide resolved
src/prefect/settings.py Outdated Show resolved Hide resolved
src/prefect/settings.py Outdated Show resolved Hide resolved
@abrookins
Copy link
Collaborator Author

abrookins commented Jul 23, 2024

@zangell44 For 1), I think global concurrency limits should be able to tell you who or what is using them. I plan to add an API endpoint in a follow-up PR that looks at limit acquired and limit released events within a time window to flow or task runs currently using the limit.

For 4), the story should be a little simpler now that client-side concurrency limits will ship with client-side orchestration. That allows us to dump the version-checking code server-side because clients will only be using this new concurrency approach when they use client-side orchestration.

However, the fact remains that if you use client-side orchestration with a task whose tags you had previously created limits for, you would currently need to recreate the limits as global concurrency limits. I haven't spent much time thinking through how to smooth this for users.

Copy link
Member

@cicdw cicdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! A few minor nitpicks, otherwise LGTM

src/prefect/concurrency/services.py Outdated Show resolved Hide resolved
src/prefect/concurrency/services.py Outdated Show resolved Hide resolved
src/prefect/concurrency/sync.py Outdated Show resolved Hide resolved
@abrookins abrookins added the enhancement An improvement of an existing feature label Jul 26, 2024
@abrookins abrookins merged commit 717dcff into main Jul 27, 2024
32 of 33 checks passed
@abrookins abrookins deleted the global-concurrency-tags branch July 27, 2024 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x bug Something isn't working enhancement An improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Async tasks that use tag-based concurrency limits deadlock when they hit the limit
5 participants