Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support to show running queries and cancel query by id #9171

Merged
merged 2 commits into from
Aug 17, 2022

Conversation

klsince
Copy link
Contributor

@klsince klsince commented Aug 5, 2022

This PR adds the support to list running queries and cancel a query as needed.

As context, Pinot is typically used for low latency + high QPS scenario, so query cancellation hasn't been really needed. But as more and more data gets kept in Pinot, users might start to run exploratory queries over long time window, which may take minutes to finish. Query cancel support makes life easier to Iterate those kinda queries. Timeout is less handy here, as setting it too low aborts the query too soon; setting it too high might let query run off, if the query is not written properly (which often happens while exploring the data).

The key change is the extension in QueryScheduler to track map<queryId, queryFuture>. Essentially, the running query is cancelled by calling queryFuture.cancel(), which firstly interrupts task running on pqr thread, which then interrupts tasks running on pqw threads.

The other changes are for adding GET /queries and DELETE /query/<queryId> rest APIs on broker, which calls DELETE /query/<queryId> on servers via the MultiHttpMethod util to cancel the query that may be running on multiple servers.

In this PR, each broker tracks the running queries submitted to it, and not knowing the queries tracked by other brokers. So as next step, may also add the two rest APIs on controller, so that users can have cluster wide view of the running queries and cancel them from controller.

Release Notes

A new config to turn on and off the query cancellation feature on server and broker side

  1. pinot.query.scheduler.enable.query.cancellation, false by default
  2. pinot.broker.enable.query.cancellation, false by default

@klsince klsince force-pushed the support_query_cancel branch 3 times, most recently from d3dc5b2 to 3192574 Compare August 5, 2022 21:18
@codecov-commenter
Copy link

codecov-commenter commented Aug 5, 2022

Codecov Report

Merging #9171 (9279568) into master (2aa2165) will decrease coverage by 1.29%.
The diff coverage is 43.38%.

❗ Current head 9279568 differs from pull request most recent head 637ad89. Consider uploading reports for the commit 637ad89 to get more accurate results

@@             Coverage Diff              @@
##             master    #9171      +/-   ##
============================================
- Coverage     70.01%   68.72%   -1.30%     
- Complexity     4762     4999     +237     
============================================
  Files          1856     1857       +1     
  Lines         98923    99079     +156     
  Branches      15042    15066      +24     
============================================
- Hits          69263    68089    -1174     
- Misses        24754    26116    +1362     
+ Partials       4906     4874      -32     
Flag Coverage Δ
integration1 26.32% <22.75%> (+0.11%) ⬆️
integration2 ?
unittests1 67.08% <53.16%> (-0.02%) ⬇️
unittests2 15.39% <21.96%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pinot/broker/api/resources/PinotClientRequest.java 43.28% <0.00%> (-15.91%) ⬇️
...ot/broker/requesthandler/BrokerRequestHandler.java 100.00% <ø> (ø)
...r/requesthandler/BrokerRequestHandlerDelegate.java 52.50% <0.00%> (-2.77%) ⬇️
...ata/manager/realtime/RealtimeTableDataManager.java 44.06% <0.00%> (-25.08%) ⬇️
...e/pinot/core/query/request/ServerQueryRequest.java 45.94% <0.00%> (-48.50%) ⬇️
...t/ConcurrentMapPartitionUpsertMetadataManager.java 66.66% <0.00%> (-1.00%) ⬇️
...psert/ConcurrentMapTableUpsertMetadataManager.java 50.00% <0.00%> (-50.00%) ⬇️
...ache/pinot/server/api/resources/QueryResource.java 0.00% <0.00%> (ø)
...rg/apache/pinot/server/starter/ServerInstance.java 74.43% <0.00%> (-12.69%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 27.69% <ø> (ø)
... and 154 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@npawar npawar added the feature label Aug 9, 2022
}
boolean done = future.isDone();
if (!done) {
future.cancel(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a query that is long running, this cancel may not have much effect? The check for isInterrupted only happens in the base class when we go to a new operator. So if we're already past that check, and doing a long running op, we'll still have to wait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think operators either finish their compute quickly or wait for sth (like the blockingQueue.poll). The waits can be interrupted.

A compute operator might miss the isInterrupted flag at the beginning but should end itself quickly. Or better refactor it to check the flag more often during the compute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, we plan to use the same cancellation mechanism for cpu time/memory usage based query pre-emption. Do you think the current frequency of checking on if (Thread.interrupted()) at each nextBlock() is insufficient to preempt a query in timely manner @npawar ?

@klsince klsince force-pushed the support_query_cancel branch 3 times, most recently from 36bc407 to 5bb9111 Compare August 10, 2022 20:18
@jasperjiaguo
Copy link
Contributor

jasperjiaguo commented Aug 11, 2022

LGTM. Don't see any obvious conflict with the in progress query killing work.

@klsince klsince force-pushed the support_query_cancel branch 2 times, most recently from a606273 to 4d660eb Compare August 11, 2022 21:55
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check the code path on the server side when an interrupt is sent. I believe the query can be cancelled, but the exception message will be quite confusing because we didn't expect getting interruption. This can be done with a separate PR.

@klsince klsince force-pushed the support_query_cancel branch 3 times, most recently from acd3e33 to de4c666 Compare August 17, 2022 16:34
@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Aug 17, 2022
@Jackie-Jiang Jackie-Jiang merged commit 3a655d2 into apache:master Aug 17, 2022
@Jackie-Jiang
Copy link
Contributor

Let's add this new feature into the documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants