Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.x] Add autoScalingStrategy option #1254

Merged

Conversation

cosmastech
Copy link
Contributor

@cosmastech cosmastech commented Mar 4, 2023

This PR adds an option to configure how a supervisor auto-scales queues. It allows for runtime or size, set on the supervisor level under the autoScalingBehavior key.

runtime is the current behavior (portion the number of available processes based on time to complete a queue / total time to complete all queues), where as size portions processes based on total number of jobs in queue / total number of jobs in all supervised queues.

For some context:

We have a Horizon supervisor set up like this:

'supervisor-medium' => [
    'connection' => 'redis_medium',
    'queue' => [
        'ai',
        'notifications',
        'import',
        'other-queue',
    ],
    'balance' => 'auto',
    'minProcesses' => 1,
    'maxProcesses' => 10,
    'memory' => 64,
    'tries' => 3,
    'timeout' => 300,
],

How should the queue supervisor balance when there are 2700 jobs in the ai queue and 5000 jobs in the import queue?

We are seeing that, though the supervisor spins up more processes, they are all given to the ai queue and not allocated to the import queue. So, we would see that ai queue processes went from 1 to 6 processes, but import stayed at 1.

To solve this problem, we extended the AutoScaler class to add this behavior. I thought it might be a nice feature for Horizon to have.

Also, I think it's worth noting in the Horizon docs about the time-to-complete behavior. As they currently read:

The auto strategy adjusts the number of worker processes per queue based on the current workload of the queue. For example, if your notifications queue has 1,000 pending jobs while your render queue is empty, Horizon will allocate more workers to your notifications queue until the queue is empty.

I think this is easy to misunderstand as it reads like it's balanced on job count, not time to complete. It required digging into the AutoScaler code to understand the behavior we were seeing.

Additionally, I think it might be nice to:

  1. allow for configuring a custom strategy per supervisor, and/or
  2. make the process allocation based on a combination of size + runtime

For the second point, I couldn't think of a good way to do that. I was thinking maybe that jobs should be a factor in the calculation, weighted lower than the time to complete. But I didn't spend much time trying things out.

@cosmastech cosmastech force-pushed the feature/auto-scaling-method-option branch from de7529a to ceaaa58 Compare March 4, 2023 12:08
@taylorotwell
Copy link
Member

Do we really need a new balancing strategy? Could you just define a second supervisor configuration for the other queues you want to ensure have more processes?

@taylorotwell
Copy link
Member

You also state you think it would be nice to allow a custom strategy per supervisor... but isn't that what you have done in this PR? Am I missing something there?

Also, can you share an example of a configuration using you new option?

@cosmastech
Copy link
Contributor Author

cosmastech commented Mar 5, 2023

Do we really need a new balancing strategy? Could you just define a second supervisor configuration for the other queues you want to ensure have more processes?

This was an issue we noticed on our production app.

We our running our Laravel monolith on K8s. We scale our worker pods automatically. We have 12 different queues handled by Horizon, running 3 separate supervisors.

Using the current Horizon strategy, the ai queue with 2700 jobs (but a longer time to complete each job) starves the import queue that has 5000 jobs (but a much lower time to complete), putting out of mind how it affects the other queues shared by the supervisor.

I do think that time to clear should play a factor in the balancing strategy, but it shouldn’t impact it this significantly. With a supervisor responsible for 6 queues & having maxProcesses of 24, it isn't desirable to allocate 19 processes to one queue, and 1 for each of the rest.

Ideally I think it would be best if the process allocation was based on a combination of time to clear + size of queue, but I'm not sure what that would look like.

Because of the current Horizon auto-balance strategy, we see an increase in customer support tickets. It feels like we would be playing whack-a-mole to guess which combinations of queues should share a supervisor.

Is it envisioned that production applications have a supervisor for each queue?

You also state you think it would be nice to allow a custom strategy per supervisor... but isn't that what you have done in this PR? Am I missing something there?

I could've added more clarity to that one.

Yes, that is sort of what I have done with this PR, but that's still only 2 strategies. Was thinking it might be beneficial for Horizon to allow a consumer application to modify or add their own auto-balancing strategy. (I'm not entirely should what that would look like, but just figured I would include it as a thought that I had).

Also, can you share an example of a configuration using you new option?

Here is an example of the configuration so it relies on the new strategy included in the PR:

'supervisor-medium' => [
    'connection' => 'redis_medium',
    'queue' => [
        'ai',
        'notifications',
        'import',
        'other-queue',
    ],
    'balance' => 'auto',
    'minProcesses' => 1,
    'maxProcesses' => 10,
    'memory' => 64,
    'tries' => 3,
    'timeout' => 300,
    'autoScalingStrategy' => 'size', // now this supervisor will partition # of processes to the queue with the most jobs, not the estimated time to complete
],

On Friday, we launched our app to use the size auto-scaling strategy for all supervisors and have noticed a significant improvement in balance of processes.

If a consumer doesn't set an autoScalingStrategy in the config, it will still default to 'runtime' as it works now. I believe this PR only adds new functionality without breaking backwards compatibility.

@driesvints driesvints changed the title adds autoScalingStrategy option [5.x] Add autoScalingStrategy option Mar 6, 2023
@taylorotwell taylorotwell merged commit a447e39 into laravel:5.x Mar 6, 2023
@taylorotwell
Copy link
Member

Thanks!

@cosmastech
Copy link
Contributor Author

Thanks!

No no, thank you @taylorotwell

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants