Execute Queries in RQ #4413

Merged

rauchy merged 28 commits into master from execute-query-in-rq on Dec 30, 2019

Conversation

@rauchy (Contributor) commented Nov 28, 2019

What type of PR is this? (check all applicable)

  • Refactor

Description

Moves query executions from Celery to RQ.

  • Fix tests
  • Remove all Celery monitoring stuff
  • Try to stick to the term "job" wherever possible (remove remaining uses of "task")
  • Test on different data sources
  • Perform load tests

I was really tempted to clean up the code as I went along with this refactoring, but I decided to move things over as they are, get them working on RQ, and only then simplify.

Related Tickets & Documents

#4307

'scheduled': scheduled_query_id is not None,
'query_id': metadata.get('Query ID'),
'user_id': user_id
})

Member:

👋 bye, bye.

Member:

Btw, does the RQ API let us segment tasks by the same dimensions (org_id, data_source_id, query_id, user_id)?
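
(For reference, a hedged sketch: RQ has no first-class notion of these dimensions, but every job carries a free-form meta dict, so one way to keep the segmentation would be to attach the fields at enqueue time. execute_query and the literal IDs below are placeholders, not Redash code.)

```python
from redis import Redis
from rq import Queue


def execute_query(query_text):
    return query_text  # stand-in for the real query-execution task


queue = Queue(connection=Redis())

# Attach the segmentation dimensions to the job's free-form meta dict.
job = queue.enqueue(
    execute_query,
    "SELECT 1",
    meta={"org_id": 1, "data_source_id": 2, "query_id": 3, "user_id": 4},
)
print(job.meta)  # a monitoring script could group jobs by these fields
```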

else:
    self._async_result = AsyncResult(job_id, app=celery)

def __init__(self, job):
    self._job = job if isinstance(job, Job) else CancellableJob.fetch(job, connection=rq_redis_connection)

Member:

Maybe add class methods to create them, to avoid confusion and weird issues? (from_job_id and from_job)
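
A minimal sketch of that suggestion, assuming a wrapper shaped like the one in the diff (JobWrapper is a stand-in name; the real class wraps CancellableJob):

```python
from rq.job import Job


class JobWrapper:
    """Stand-in for the wrapper class shown in the diff above."""

    def __init__(self, job):
        # the constructor now always receives an actual Job instance
        self._job = job

    @classmethod
    def from_job(cls, job):
        # caller already holds a Job object
        return cls(job)

    @classmethod
    def from_job_id(cls, job_id, connection):
        # caller only has an id; fetch the Job from Redis first
        return cls(Job.fetch(job_id, connection=connection))
```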


if isinstance(result, (TimeLimitExceeded, SoftTimeLimitExceeded)):
result = self._job.result
if isinstance(result, JobTimeoutException):

Member:

I'm not sure we still return this one.

@@ -111,10 +109,10 @@ def enqueue_query(query, data_source, user_id, is_api_key=False, scheduled_query
if job_id:

@rauchy (Contributor, Author):

I know I said I'd port enqueue_query from Celery to RQ as is and deal with refactoring later, but I'm really tempted to use RQ's ability to assign custom IDs to jobs and use that to get rid of the whole lock mechanism (i.e. job IDs would be rq:job:<ds.id>:<query_hash>). WDYT @arikfr?
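
A sketch of what that could look like using RQ's documented job_id keyword (values below are placeholders; RQ itself prefixes every stored job with rq:job:, so only the <ds.id>:<query_hash> part is passed in):

```python
from redis import Redis
from rq import Queue


def execute_query(query_text):
    return query_text  # stand-in for the real task


queue = Queue(connection=Redis())

data_source_id, query_hash = 5, "abc123"  # placeholder values

# Stored by RQ under the Redis key rq:job:5:abc123.
job = queue.enqueue(
    execute_query,
    "SELECT 1",
    job_id="{}:{}".format(data_source_id, query_hash),
)
```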

Member:

This is indeed tempting! But:

  1. We lose the ability to track down a specific invocation (in logs and such), because every invocation has the same id.
  2. The current hash implementation needs fixing (Case insensitive parameters in Redash query result cache #2137).
  3. Are we sure job IDs were meant to be used this way?

@rauchy (Contributor, Author):

  1. That could easily be solved by a custom description or a meta attribute holding an invocation counter (see the sketch after this list).
  2. That needs fixing regardless, but does it affect whether we should drop locks?
  3. It's not a hack, if that's what you mean. RQ docs specify that you can use a predetermined ID instead of the auto-generated one.
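
For point 1, a minimal sketch of the meta-attribute idea (the invocation field name is hypothetical; save_meta persists the dict back to Redis):

```python
from redis import Redis
from rq import Queue


def execute_query(query_text):
    return query_text  # stand-in for the real task


queue = Queue(connection=Redis())
job = queue.enqueue(execute_query, "SELECT 1", job_id="5:abc123")

# Tag this run so logs can tell invocations apart even with a shared job id.
job.meta["invocation"] = job.meta.get("invocation", 0) + 1
job.save_meta()
```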

Member:

> RQ docs specify that you can use a predetermined ID instead of the auto-generated one.

But do they expect the ID to be non-unique/reusable? An execution's id feels like something that's supposed to be unique; I'm just concerned about the can of worms we'd be opening if we assign a non-unique one.

@rauchy (Contributor, Author):

All jobs scheduled from rq-scheduler reuse the same identifier :)

Member:

You might want to check out: rq/rq#793.

Member:

TL;DR: as sirex said, the job will be executed twice (if the job is still in Redis).

@rauchy (Contributor, Author) commented Dec 2, 2019:

I fail to see the issue. We still check for the job's existence in Redis. We get the same effect of a lock by checking if the job is queued. We just don't have to worry about maintaining another key and expiring it.
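
A sketch of the check being described, assuming the deterministic job ids discussed above (already_enqueued is a hypothetical helper, not Redash code):

```python
from redis import Redis
from rq.exceptions import NoSuchJobError
from rq.job import Job

redis_conn = Redis()


def already_enqueued(job_id):
    # The job's own existence and status act as the lock: if it is
    # still queued or running, don't enqueue a second copy.
    try:
        job = Job.fetch(job_id, connection=redis_conn)
    except NoSuchJobError:
        return False
    return job.get_status() in ("queued", "started")
```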

Member:

I thought you were suggesting reusing job ids to avoid having to implement our own lock.

While we wouldn't need to worry about another key, we would need to worry about lots of other things and fully understand how RQ works. Considering that the current implementation works and we have reasonable processes around it, I don't think we'd gain much at this stage, at least until we're more familiar with RQ. The risk vs. gain is just not worth it.

@@ -26,75 +27,6 @@ def _unlock(query_hash, data_source_id):
redis_connection.delete(_job_lock_id(query_hash, data_source_id))


class QueryTask(object):

Member:

👋 Goodbye, old friend. This class has survived since the very first iterations of Redash (it was actually named Job back then), when we had my half-assed background job implementation and Tornado as the web server.

@rauchy (Contributor, Author) commented Dec 8, 2019

> Also, can't we just call push_connection when starting the app or something?

After too much frolicking, I realized that setting this at the start of the app doesn't play nice with Flask's threads and RQ's LocalStack. I'll keep the pre-request & post-request hooks for now unless you see any fundamental issues with that.
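
For reference, the pre-request / post-request pairing being described looks roughly like this (a sketch under the assumption that RQ's push_connection/pop_connection manage its connection LocalStack, not the exact Redash hooks):

```python
from flask import Flask
from redis import Redis
from rq import pop_connection, push_connection

app = Flask(__name__)
redis_conn = Redis()


@app.before_request
def push_rq_connection():
    # Push onto RQ's connection stack for the duration of this request.
    push_connection(redis_conn)


@app.teardown_request
def pop_rq_connection(exception=None):
    pop_connection()
```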

@arikfr (Member) commented Dec 8, 2019

> I'll keep the pre-request & post-request hooks for now unless you see any fundamental issues with that.

Do we really push a request context when starting the app and/or executing a job?

@rauchy (Contributor, Author) commented Dec 8, 2019

@arikfr which query runners do you think I should try this on?

@rauchy changed the base branch from master to python-3 on December 9, 2019 08:01
@rauchy changed the base branch from python-3 to master on December 9, 2019 08:01
@rauchy marked this pull request as ready for review on December 11, 2019 21:34
@rauchy mentioned this pull request on Dec 26, 2019
@rauchy merged commit 329e859 into master on Dec 30, 2019
@rauchy mentioned this pull request on Jan 5, 2020
@guidopetri deleted the execute-query-in-rq branch on November 4, 2023 15:58