Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing Beanstalk from teuthology #1650

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

amathuria
Copy link
Contributor

The following changes support the removal of Beanstalk from Teuthology.
In place of Beanstalk, we will now be using Paddles for queue management in Teuthology.
This PR has the corresponding changes for the paddles PR: https://github.com/ceph/paddles/pull/94/files.

The changes include:

  1. Removing all beanstalk related code
  2. Teuthology scheduler and dispatcher using Paddles queue for scheduling and dispatching jobs
  3. Adding support for Paddles queue management
  4. Additional functionality of being able to change the priority of Teuthology jobs in the queued state in the teuthology-queue command

Signed-off-by: Aishwarya Mathuria amathuri@redhat.com

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from 946dc36 to 0ef5b15 Compare May 27, 2021 13:59
@jdurgin jdurgin requested a review from kshtsk May 27, 2021 16:39
Copy link
Member

@jdurgin jdurgin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, the retry comments are due to some issues we've had with paddles in the past, particularly recently with db conflicts causing stale sqlalchemy handles that may cause a subsequent request to run into an error.

Working around this with client-side (aka teuthology) retries has mitigated these issues.

teuthology/paddles_queue.py Outdated Show resolved Hide resolved
teuthology/paddles_queue.py Outdated Show resolved Hide resolved
teuthology/paddles_queue.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
teuthology/report.py Outdated Show resolved Hide resolved
@kshtsk
Copy link
Contributor

kshtsk commented Jun 7, 2021

First of all, the tests are failing.

@kshtsk
Copy link
Contributor

kshtsk commented Jun 7, 2021

Second, is it possible to support both scheduling mechanisms for a while?

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from 4f83a98 to 4d2ae67 Compare June 7, 2021 16:29
@jdurgin
Copy link
Member

jdurgin commented Jun 7, 2021

Second, is it possible to support both scheduling mechanisms for a while?

It seems like a lot of effort - on the teuthology side one can always use the commit before this merges.

In terms of the sepia lab, I figure we'll announce this widely before it merges and when it is being merged. Folks will notice pretty fast when beanstalkd is no longer running.

@amathuria is planning to test out the paddles side further in sepia before merging the teuthology piece as well.

@kshtsk
Copy link
Contributor

kshtsk commented Jul 15, 2021

Second, is it possible to support both scheduling mechanisms for a while?

It seems like a lot of effort - on the teuthology side one can always use the commit before this merges.

I don't think it is a lot of effort for the moment to support just another scheduling on teuthology client side. I've added --backend option to handle this.

In terms of the sepia lab, I figure we'll announce this widely before it merges and when it is being merged. Folks will notice pretty fast when beanstalkd is no longer running.

It is not only about sepia lab we still use downstream beanstalk with two different teuthology setups and it will be tough to switch everything at once.

I would vote for keeping the beanstalk support in teuthology for awhile so it can run with older paddles and pulpito.

teuthology/schedule.py Outdated Show resolved Hide resolved
teuthology/schedule.py Outdated Show resolved Hide resolved
@jdurgin
Copy link
Member

jdurgin commented Jul 19, 2021

replace beanstalk queue

kamoltat added a commit that referenced this pull request Aug 4, 2021
Developers can now use start.sh to build the images
and set up postgresql, paddles, pulpito and teuthology
for development.

This PR is also pending for:
#1650
ceph/paddles#94

to be merged, as currently we use these branches
as images for paddles and pulpito.

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
kamoltat added a commit that referenced this pull request Sep 16, 2021
Developers can now use start.sh to build the images
and set up postgresql, paddles, pulpito and teuthology
for development.

This PR is also pending for:
#1650
ceph/paddles#94

to be merged, as currently we use these branches
as images for paddles and pulpito.

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
scripts/paddles_queue.py Outdated Show resolved Hide resolved
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from 5c0d202 to 61a399c Compare October 6, 2021 14:19
jdurgin
jdurgin previously approved these changes Oct 18, 2021
@amathuria amathuria requested a review from zmc October 25, 2021 15:01
@jdurgin
Copy link
Member

jdurgin commented Nov 30, 2021

@zmc want to take a look at the teuthology side of this change?

Copy link
Member

@zmc zmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. I left some comments in the diff section, but some of them will probably be made irrelevant by the below.

I think we should do some consolidation for simplicity's sake. Rather than add teuthology/paddles_queue.py alongside beanstalk.py, let's create a teuthology/queue subdir so that we can have modules named teuthology.queue.beanstalk and teuthology.queue.paddles. A __init__.py in the queue subdir should contain the code that the two implementations have in common, so that we're not duplicating things like JobProcessor and friends.

We can add a new config setting that controls which queueing mechanism to use (config.queue_backend ?). A __init__.py in the queue subdir can look at that setting and load the appropriate backend. Scripts that have a --queue-backend arg should be able to use that setting's value as their default; then using the actual arg is almost never necessary.

Then, rather than have separate teuthology-beanstalk-queue and teuthology-paddles-queue scripts, which will depend on the user to know which to call, we can leave the existing scripts/queue.py mostly alone.

Last, this PR should probably not change the default backend to paddles just yet, to avoid causing problems for other users.

scripts/worker.py Outdated Show resolved Hide resolved
scripts/beanstalk_queue.py Outdated Show resolved Hide resolved
teuthology/dispatcher/__init__.py Show resolved Hide resolved
teuthology/dispatcher/supervisor.py Outdated Show resolved Hide resolved
teuthology/dispatcher/supervisor.py Outdated Show resolved Hide resolved
kamoltat added a commit that referenced this pull request Jan 14, 2022
Developers can now use start.sh to build the images
and set up postgresql, paddles, pulpito and teuthology
for development.

This PR is also pending for:
#1650
ceph/paddles#94

to be merged, as currently we use these branches
as images for paddles and pulpito.

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 3 times, most recently from 604c452 to fe3d0c9 Compare May 4, 2022 14:27
@djgalloway
Copy link

The testnode container image

Ah, that's probably why. I think on CentOS9 and/or newer kernels, there was some issue with the ansible systemd module not working. ceph-cm-ansible hasn't been adapted to work inside containers yet though so that's why.

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 4 times, most recently from 74d1650 to 5a3edb1 Compare June 15, 2022 14:01
Copy link
Member

@zmc zmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in the very early stages of testing this with ceph/paddles#94 via ceph-devstack, but I have run into an issue: it looks like teuthology-schedule isn't noticing that I have backend: paddles in my .teuthology.yaml, so I get RuntimeError: Beanstalk queue information not found in None.

I also see several log entries with POST to http://paddles:8080/queue/stats/ failed with status 400: {"message": "specified queue does not exist"} - so I think there's an issue with creating queues as well. This comment belongs on the paddles side, but I think it'd make a lot of sense to just create queues transparently like beanstalkd does.

Comment on lines 594 to 601
response = self.session.post(uri, data=queue_json, headers=headers)

if response.status_code == 200:
return response.json()
else:
msg = response.text
self.log.error(
"POST to {uri} failed with status {status}: {msg}".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the above be using a GET request?

report_status = False
else:
report_status = True

name = args['--name']
if not name or name.isdigit():
raise ValueError("Please use a more descriptive value for --name")
job_config = build_config(args)
backend = args['--queue-backend']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be reading this from the config object - the CLI flag can take precedence over that value, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@amathuria
Copy link
Contributor Author

I'm in the very early stages of testing this with ceph/paddles#94 via ceph-devstack, but I have run into an issue: it looks like teuthology-schedule isn't noticing that I have backend: paddles in my .teuthology.yaml, so I get RuntimeError: Beanstalk queue information not found in None.

I also see several log entries with POST to http://paddles:8080/queue/stats/ failed with status 400: {"message": "specified queue does not exist"} - so I think there's an issue with creating queues as well. This comment belongs on the paddles side, but I think it'd make a lot of sense to just create queues transparently like beanstalkd does.

Thanks for trying this out! I'll look into these issues

@amathuria
Copy link
Contributor Author

I'm in the very early stages of testing this with ceph/paddles#94 via ceph-devstack, but I have run into an issue: it looks like teuthology-schedule isn't noticing that I have backend: paddles in my .teuthology.yaml, so I get RuntimeError: Beanstalk queue information not found in None.

I also see several log entries with POST to http://paddles:8080/queue/stats/ failed with status 400: {"message": "specified queue does not exist"} - so I think there's an issue with creating queues as well. This comment belongs on the paddles side, but I think it'd make a lot of sense to just create queues transparently like beanstalkd does.

The queue does not exist issue popped up because I wasn't reading backend from teuthology.yaml. Once that is fixed we shouldn't see this error anymore.

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 2 times, most recently from 33a4ffd to a6c7093 Compare September 7, 2022 13:38
Copy link
Member

@zmc zmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple bugs; looks like a rebase is needed as well

--supervisor run dispatcher in job supervisor mode
--bin-path BIN_PATH teuthology bin path
--job-config CONFIG file descriptor of job's config file
--exit-on-empty-queue if the queue is empty, exit
--queue-backend BACKEND which backend will be used for the queue
[default: beanstalk]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the default of beanstalk here, we get a traceback if we've got backend: paddles in /etc/teuthology.yaml and omit the flag:

Traceback (most recent call last):
  File "/home/teuthworker/teuthology/virtualenv/bin/teuthology-dispatcher", line 8, in <module>
    sys.exit(main())
  File "/home/teuthworker/teuthology/scripts/dispatcher.py", line 37, in main
    sys.exit(teuthology.dispatcher.main(args))
  File "/home/teuthworker/teuthology/teuthology/dispatcher/__init__.py", line 96, in main
    connection = beanstalk.connect()
  File "/home/teuthworker/teuthology/teuthology/queue/beanstalk.py", line 16, in connect
    raise RuntimeError(
RuntimeError: Beanstalk queue information not found in None

Perhaps we should drop the flag and only read from the config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense, I'll make the change

scripts/dispatcher.py Show resolved Hide resolved
@@ -143,6 +143,7 @@ class TeuthologyConfig(YamlConfig):
'archive_upload_key': None,
'archive_upload_url': None,
'automated_scheduling': False,
'backend': 'beanstalk',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'backend': 'beanstalk',
'queue_backend': 'beanstalk',

teuthology/dispatcher/__init__.py Show resolved Hide resolved
try:
report.try_delete_jobs(job_config['name'], job_config['job_id'])
except Exception:
log.exception("Unable to delete job %s", job_config['job_id'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably log the name here as well

teuthology/kill.py Show resolved Hide resolved
backend = args['--queue-backend']
backend = config.backend
if args['--queue-backend']:
backend = args['--queue-backend']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar concern as -dispatcher re: the flag vs. config file

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 3 times, most recently from 065b7a3 to 4ae5f62 Compare August 23, 2023 11:19
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from 4ae5f62 to dfb91e8 Compare August 31, 2023 14:49
@amathuria amathuria added the DNM label Aug 31, 2023
@amathuria
Copy link
Contributor Author

Adding DNM as the Paddles PR will have to be merged first

@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 2 times, most recently from 570ac00 to a0b7570 Compare October 13, 2023 03:24
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from a0b7570 to fc8245c Compare November 3, 2023 13:22
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch 2 times, most recently from 9150d49 to 56179ba Compare November 22, 2023 05:43
The following changes support the removal of Beanstalk from Teuthology.
In place of Beanstalk, we will now be using Paddles for queue management in Teuthology.
This PR has the corresponding changes for the paddles PR: https://github.com/ceph/paddles/pull/94/files.

The changes include:
1. Removing all beanstalk related code
2. Teuthology scheduler and dispatcher using Paddles queue for scheduling and dispatching jobs
3. Adding support for Paddles queue management
4. Additional functionality of being able to change the priority of Teuthology jobs in the queued state in the teuthology-queue command

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Previously the teuthology worker unit test used beanstalk, the test will
be using Paddles now.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
1. Add retry loop for the paddles calls.
2. Add run name as a parameter for updating priority of jobs in paddles.
3. Modify the pause queue command to run on server side with an optional pause duration parameter.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
…ith Paddles

With the use of the --queue-backend argument the user can specify which backend(paddles/beanstalk) they would like to use for maintaining the teuthology Jobs queue.
In order to avoid overlapping Job IDs, when a job is being scheduled in beanstalk it is also written to paddles which returns a unique ID.
This is the ID teuthology will treat as the Job ID throughout the run of the job.

To differentiate between the 2 queue backends, the teuthology-queue command has been split into teuthology-paddles-queue command and teuthology-beanstalk-queue command.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
…alk queue

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Makes the same teuthology-queue commands work regardless of the queue backend, Paddles or Beanstalk.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
@amathuria amathuria force-pushed the wip-amathuria-replace-beanstalkd-paddles branch from 56179ba to e520395 Compare November 29, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants