Finer Control Over Which Worker Does What #7463

Merged: 14 commits merged into master from finer-worker-control on Dec 11, 2023
Conversation

@fm3 (Member) commented Nov 28, 2023:

The Worker Postgres table now has new columns to control job scheduling:

maxParallelHighPriorityJobs INT NOT NULL DEFAULT 1,
maxParallelLowPriorityJobs INT NOT NULL DEFAULT 1,
supportedJobCommands VARCHAR(256)[] NOT NULL DEFAULT array[]::varchar(256)[]

This means that in a multi-worker setup, you can select which worker should run which jobs. You can also allow low-priority jobs to run with less parallelism than high-priority jobs.
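
For illustration, here is a minimal Scala sketch of how these columns could gate job assignment. This is an assumption for clarity, not the actual Worker.scala: the case class and the canAssign helper are hypothetical; only the field names mirror the new Postgres columns.

// Hypothetical sketch; field names mirror the new worker columns above.
case class Worker(
    maxParallelHighPriorityJobs: Int = 1,
    maxParallelLowPriorityJobs: Int = 1,
    supportedJobCommands: Set[String] = Set.empty
)

// A worker may pick up a job only if it supports the command and still has a free
// slot in the matching priority lane (counts of its currently running jobs are passed in).
def canAssign(worker: Worker,
              command: String,
              isHighPriority: Boolean,
              runningHighPriority: Int,
              runningLowPriority: Int): Boolean =
  worker.supportedJobCommands.contains(command) && {
    if (isHighPriority) runningHighPriority < worker.maxParallelHighPriorityJobs
    else runningLowPriority < worker.maxParallelLowPriorityJobs
  }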

Steps to test:

TODOs:

  • support separate parallelism limits for high- and low-priority jobs
  • support a list of supported job commands per worker
  • add a way for the frontend to query which job commands are available for a datastore
  • SQL migration

Follow-Up

Issues:


@fm3 fm3 self-assigned this Nov 28, 2023
@fm3 fm3 marked this pull request as ready for review December 7, 2023 10:56
@fm3 fm3 changed the title wip: finer control over which worker does what Finer Control Over Which Worker Does What Dec 7, 2023
@fm3 fm3 requested a review from frcroth December 7, 2023 11:00
app/models/job/Job.scala (outdated review thread, resolved)
Comment on lines +17 to +18
val highPriorityJobs: Set[Value] = Set(convert_to_wkw, export_tiff)
val lowPriorityJobs: Set[Value] = values.diff(highPriorityJobs)
Member:

This is general and should not be configurable?

fm3 (Member Author):

I’d say for now, yes.
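
For context, here is a minimal sketch of the assumed surrounding enumeration in app/models/job/Job.scala. The extra members listed (compute_mesh_file, infer_nuclei) are examples only; the real enumeration lists all job commands.

object JobCommand extends Enumeration {
  // example members; the actual enumeration defines every job command
  val convert_to_wkw, export_tiff, compute_mesh_file, infer_nuclei = Value

  // convert_to_wkw and export_tiff are high priority; everything else is
  // derived as low priority via the set difference
  val highPriorityJobs: Set[Value] = Set(convert_to_wkw, export_tiff)
  val lowPriorityJobs: Set[Value] = values.diff(highPriorityJobs)
}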

app/models/job/Worker.scala (outdated review thread, resolved)
@fm3 fm3 requested a review from frcroth December 11, 2023 09:19
@fm3 fm3 merged commit c7fb582 into master Dec 11, 2023
2 checks passed
@fm3 fm3 deleted the finer-worker-control branch December 11, 2023 09:49
@hotzenklotz (Member) commented:
@fm3 If I see this correctly, the PR only adds the jobsSupportedByAvailableWorkers info to the endpoint /api/datasets/<organization>/<dataset_name>.

What about the /api/features route? Some worker jobs can only be triggered in admin menus, e.g. dataset upload/conversion or finding the max segment ID.

@fm3 (Member Author) commented Jan 29, 2024:

Good point!

I have now pushed a commit to your branch of PR #7591.

In it, I moved both fields, jobsEnabled and jobsSupportedByAvailableWorkers, from the dataset JSON to the dataStore JSON.

I cannot change the /api/features route content, since that comes directly from application.conf, whereas this information is worker-specific (and thus datastore-specific).

Note that the dataset JSON does contain a full copy of the datastore JSON (dataset.dataStore), so all places that previously accessed dataset.jobsEnabled can now, as far as I understand, instead access dataset.dataStore.jobsEnabled.

In dataset upload, you can use the info from the selected datastore.

When finding the largestSegmentId, I would have expected that you can use the dataset-specific route, as this concerns a specific dataset.

Hope this helps!
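
As an illustration of the resulting structure, here is a minimal sketch assuming Play JSON. The writer function, its parameters, and the "name" field are hypothetical; only jobsEnabled and jobsSupportedByAvailableWorkers come from the discussion above. With a dataStore JSON like this embedded in the dataset JSON, the frontend can read dataset.dataStore.jobsEnabled and dataset.dataStore.jobsSupportedByAvailableWorkers.

import play.api.libs.json.{JsObject, Json}

def dataStorePublicWrites(name: String,
                          jobsEnabled: Boolean,
                          jobsSupportedByAvailableWorkers: List[String]): JsObject =
  Json.obj(
    "name" -> name,
    "jobsEnabled" -> jobsEnabled,
    "jobsSupportedByAvailableWorkers" -> jobsSupportedByAvailableWorkers
  )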

Successfully merging this pull request may close these issues.

Finer-grained maxparalleljobs
3 participants