rqscheduler can constantly attempt to register itself. #62

ScottSturdivant
Contributor

Allow rqscheduler to keep attempting to register itself periodically.

My use case is that I have N identical hosts all running rqworker and rqscheduler processes under the watchful eye of supervisor. As supervisord doesn't support an 'unlimited' value for the 'startretries' program configuration option, the rqscheduler processes that fail to register will eventually be moved into supervisor's failed state. Thus, if the host that was successfully running rqscheduler goes down, none of the other existing hosts can automatically take its place.

Similarly, when performing rolling updates, the new hosts come online and try to launch rqscheduler. This fails because an old host is already executing it. If the rollout is slow, it's possible that the new hosts will have put rqscheduler into a failed state before the old host is taken out of service. This will result in a new deployment where rqscheduler is not running.

This patch will allow rqscheduler itself to keep retrying its registration process. Backwards compatibility is preserved: by default, rqscheduler still aborts when registration fails.
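To illustrate the intended behavior, here is a minimal sketch of a retry loop around registration. It is not the code from this patch: `register_with_retry`, `AlreadyRegistered`, and the `retry` / `interval` parameters are hypothetical names chosen for the example, and `register` stands in for whatever call performs the actual registration.

```python
import time
from typing import Callable


class AlreadyRegistered(Exception):
    """Hypothetical error: another scheduler currently holds the registration."""


def register_with_retry(
    register: Callable[[], None],
    retry: bool = False,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call register() until it succeeds and return the number of attempts.

    With retry=False (the default) the first failure propagates to the
    caller, which mirrors the existing abort-on-failure behavior.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            register()
            return attempts
        except AlreadyRegistered:
            if not retry:
                raise
            # Wait before trying again; the active scheduler may have gone away.
            sleep(interval)
```

With `retry=True` the process never exits on a registration conflict, so supervisor never counts a failed start against 'startretries'.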

@selwin
Collaborator

selwin commented Nov 23, 2014

Would introducing a --burst option be better for this use case?

Similar to RQ worker, running rqscheduler --burst would schedule all jobs that need to be scheduled and quit on completion.

This means you can have N hosts run rqscheduler --burst every minute via cron, which effectively retries indefinitely.

@lost-theory
Contributor

@selwin What would the behavior be with --burst when two schedulers run at the same time? Seems like you'd have the same problem: one process would throw an error. And it would introduce a dependency on cron (one of the reasons people use a system like rq & rq-scheduler is to get away from cron 😄).

FWIW resque-scheduler (the analog of rq-scheduler in the ruby world) allows you to run multiple schedulers and it handles failover automatically:

https://github.com/resque/resque-scheduler#redundancy-and-fail-over

You may want to have resque-scheduler running on multiple machines for redundancy. Electing a master and failover is built in and default. Simply run resque-scheduler on as many machines as you want pointing to the same redis instance and schedule. The scheduler processes will use redis to elect a master process and detect failover when the master dies. Precautions are taken to prevent jobs from potentially being queued twice during failover even when the clocks of the scheduler machines are slightly out of sync (or load affects scheduled job firing time). If you want the gory details, look at Resque::Scheduler::Locking.

I think this is a good approach, and is very similar to @SirScott's patch (all processes continually try to acquire a 'master' lock until one succeeds, and TTLs allow failover to happen when a scheduler process dies unexpectedly).
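As a rough sketch of that idea (not resque-scheduler's or rq-scheduler's actual implementation), the election can be modeled as a single key with an expiry that every process tries to acquire or renew. `FakeLockStore` is a hypothetical in-memory stand-in so the example runs without a server; against a real Redis the acquire step would typically be a `SET` with the `NX` and `EX` options, and the key name and TTL below are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeLockStore:
    """In-memory stand-in for a Redis-like store (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (owner, expires_at)
        self._data: Dict[str, Tuple[str, float]] = {}

    def _live_owner(self, key: str) -> Optional[str]:
        """Return the current owner, or None if the key is unset or expired."""
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]

    def acquire_or_renew(self, key: str, owner: str, ttl: float) -> bool:
        """Take the lock if it is free or expired, or extend it if we hold it."""
        current = self._live_owner(key)
        if current is None or current == owner:
            self._data[key] = (owner, self._clock() + ttl)
            return True
        return False
```

Each scheduler process would call `acquire_or_renew` on every polling interval and only enqueue jobs while it returns `True`. If the master dies, it stops renewing, the TTL lapses, and the next process to poll takes over.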

@selwin
Collaborator

selwin commented Feb 18, 2015

Sorry, I forgot to reply to this issue.

To be honest, I don't think of rq-scheduler as a direct replacement for cron, as I think cron is one of the most battle-tested utilities out there. rq-scheduler is meant to be something that lets you schedule jobs programmatically.

Yes, two schedulers running at the same time would still create an error. But what I like about the --burst approach is that you won't get two active scheduler processes at the same time (the one that errors out would just die).

However, I've also been thinking about the approach @SirScott suggested and am not opposed to it. Can we have a more descriptive name than --retry though?

@jmmills

jmmills commented Feb 20, 2015

Could you just enqueue jobs that do the schedule poll? That way the loop would be shared across the rqworker cluster.

@selwin
Collaborator

selwin commented Aug 19, 2015

@SirScott thanks for writing this PR, please see my comment here: #70 (comment)

@selwin selwin closed this Aug 19, 2015
sandlerben added a commit to hack4impact-upenn/idle-free-philly that referenced this pull request Jan 13, 2016
sandlerben added a commit to hack4impact-upenn/idle-free-philly that referenced this pull request Jan 21, 2016
sandlerben added a commit to hack4impact/flask-base that referenced this pull request Jan 29, 2016
sandlerben added a commit to hack4impact-upenn/maps4all that referenced this pull request Nov 14, 2016
rrelaxx pushed a commit to rrelaxx/xrm-ui that referenced this pull request Aug 4, 2022