Join query on broker becomes uncancellable #17163
Comments
Did this query override the timeout by any chance?
As far as I could ascertain, no. I examined the corresponding |
@Zeyu-Chen-SFDC Thanks for the detailed report!
We haven't seen that. All these episodes began when the broker was lightly loaded, serving at most 1 other "normal" query concurrently. And the "normal" queries completed successfully. The impact of the stuck joins is as if they simply reduced the jetty threadpool capacity by a constant.
The pattern of activities from the logs is as follows:
I think #17099 should have fixed this issue.
No. This issue can still happen because it occurs on the broker, while #17099 fixes it on historicals. The problem fixed in #17099 was that the thread doesn't stop even after being interrupted. The problem here is that the broker HTTP thread is not being interrupted at all because the SQL HTTP thread is continuously busy in the |
Long running join query threads on brokers cannot be cancelled or interrupted
Affected Version
28.0.1
Description
Poorly written join queries are seen busy-looping in PostJoinCursor.advanceToMatch() on the broker's Jetty threads. These queries have been running for days. While we have separate efforts to address the queries, we want to release all resources held up on the broker by these joins. When query cancellation is attempted with
curl -XDELETE 127.0.0.1:8088/druid/v2/sql/<QID>
on the broker, a 404 response is returned, and the query thread on the broker continues as before.
Here are some examined internal states of the broker:
- The SqlLifecycleManager object contains the query ID being cancelled.
- The QueryScheduler.queryFutures object does not contain any future under the subject query ID.
- druid.server.http.defaultQueryTimeout=600000
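To illustrate the two conditions described above, here is a minimal Java sketch. It is not Druid's code. The class `CancellableLoopSketch`, the `futures` map, and the `cancel` and `spinUntilInterrupted` methods are all hypothetical names invented for this example. It assumes a cancel path that can only interrupt a query whose future is registered under its ID, and a tight loop that can only stop if it polls the thread's interrupt flag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancellableLoopSketch {
    // Hypothetical registry of query ID -> future; a stand-in for illustration only.
    static final Map<String, Future<?>> futures = new ConcurrentHashMap<>();

    // Returns false when nothing is registered under the ID (analogous to the 404 above).
    static boolean cancel(String queryId) {
        Future<?> f = futures.remove(queryId);
        return f != null && f.cancel(true); // cancel(true) interrupts the running thread
    }

    // A tight loop that polls the interrupt flag on every iteration.
    // Without this check, an interrupt would never be observed and the loop would spin forever.
    static long spinUntilInterrupted(CountDownLatch started) {
        long iterations = 0;
        started.countDown();
        while (!Thread.currentThread().isInterrupted()) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);

        futures.put("q1", pool.submit(() -> {
            try {
                spinUntilInterrupted(started);
            } finally {
                finished.countDown();
            }
        }));
        started.await(); // make sure the loop is actually running

        System.out.println(cancel("unregistered-id"));              // false: no future to cancel
        System.out.println(cancel("q1"));                           // true: future found and interrupted
        System.out.println(finished.await(5, TimeUnit.SECONDS));    // true: loop observed the interrupt
        pool.shutdown();
    }
}
```

In this sketch a query can only be stopped when both halves are in place: the future must be discoverable by ID, and the loop must check for interruption. Whether either applies to the actual broker code path is for the maintainers to confirm.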