Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker Query Timeout Metric #11892

Merged
merged 5 commits into from
Nov 3, 2023

Conversation

suddendust
Copy link
Contributor

@suddendust suddendust commented Oct 27, 2023

Exposing a new metric to trace broker query timeouts.

Screenshot 2023-10-28 at 12 21 39 PM

@suddendust suddendust closed this Oct 27, 2023
@suddendust suddendust reopened this Oct 27, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2023

Codecov Report

Merging #11892 (0240d34) into master (5d58102) will decrease coverage by 0.05%.
Report is 25 commits behind head on master.
The diff coverage is 18.18%.

@@             Coverage Diff              @@
##             master   #11892      +/-   ##
============================================
- Coverage     61.45%   61.41%   -0.05%     
+ Complexity     1147      207     -940     
============================================
  Files          2375     2378       +3     
  Lines        128500   128875     +375     
  Branches      19846    19929      +83     
============================================
+ Hits          78974    79144     +170     
- Misses        43815    44005     +190     
- Partials       5711     5726      +15     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.38% <18.18%> (-0.02%) ⬇️
java-21 61.25% <18.18%> (-0.06%) ⬇️
skip-bytebuffers-false 61.40% <18.18%> (-0.01%) ⬇️
skip-bytebuffers-true 27.63% <9.09%> (-33.66%) ⬇️
temurin 61.41% <18.18%> (-0.05%) ⬇️
unittests 61.40% <18.18%> (-0.05%) ⬇️
unittests1 46.62% <100.00%> (-0.03%) ⬇️
unittests2 27.64% <9.09%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...a/org/apache/pinot/common/metrics/BrokerMeter.java 94.00% <100.00%> (+0.12%) ⬆️
...thandler/SingleConnectionBrokerRequestHandler.java 17.72% <0.00%> (-0.47%) ⬇️
...requesthandler/MultiStageBrokerRequestHandler.java 17.80% <12.50%> (-0.90%) ⬇️

... and 64 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@@ -116,6 +117,9 @@ protected BrokerResponseNative processBrokerRequest(long requestId, BrokerReques
realtimeBrokerRequest, realtimeRoutingTable, timeoutMs);
_failureDetector.notifyQuerySubmitted(asyncQueryResponse);
Map<ServerRoutingInstance, ServerResponse> finalResponses = asyncQueryResponse.getFinalResponses();
if (asyncQueryResponse.getStatus() == QueryResponse.Status.TIMED_OUT) {
_brokerMetrics.addMeteredTableValue(rawTableName, BrokerMeter.BROKER_RESPONSES_WITH_TIMEOUTS, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use actual table name (with type) instead of raw table name

@@ -191,6 +192,11 @@ protected BrokerResponse handleRequest(long requestId, String query, @Nullable S
try {
queryResults = _queryDispatcher.submitAndReduce(requestContext, dispatchableSubPlan, queryTimeoutMs, queryOptions,
stageIdStatsMap);
} catch (TimeoutException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify if it can throw TimeoutException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes actually, the query dispatcher throws a timeout exception.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied a commit to fix the broker response for timeout case

In the previous comment, I was referring to using the table name (with or without type suffix), but seems all other metrics are tracked under raw table name, I changed it back (the table name in this stage is already the actual table name).

@Jackie-Jiang Jackie-Jiang merged commit bbeb1da into apache:master Nov 3, 2023
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants