Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return exception when unavailable segments on empty broker response #7823

Merged
merged 7 commits into from
Nov 24, 2021
Merged

return exception when unavailable segments on empty broker response #7823

merged 7 commits into from
Nov 24, 2021

Conversation

jadami10
Copy link
Contributor

Description

This fixes #7811. In #7264 we added exceptions when not all segments are available, but this was one case that was missed. Specifically if all servers are unavailable, both realtime and offline requests become null, but there are still unavailable segments. This adds the exception to the broker response.

There's no unit tests here which makes it quite difficult for me to setup something. But I tested locally by running the realtime quickstart

  • i ran the query and saw results
  • i "dropped" the server and saw no results
  • i then applied this fix, reran the server drop, and saw the exception
    image

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

This will get encompassed whenever the new error code documentation is done.

@codecov-commenter
Copy link

codecov-commenter commented Nov 24, 2021

Codecov Report

Merging #7823 (6ede774) into master (819cba0) will decrease coverage by 1.16%.
The diff coverage is 28.57%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7823      +/-   ##
============================================
- Coverage     71.70%   70.53%   -1.17%     
- Complexity     4089     4091       +2     
============================================
  Files          1579     1579              
  Lines         80832    80841       +9     
  Branches      12010    12010              
============================================
- Hits          57957    57025     -932     
- Misses        18978    19933     +955     
+ Partials       3897     3883      -14     
Flag Coverage Δ
integration1 ?
integration2 27.99% <28.57%> (+0.02%) ⬆️
unittests1 68.73% <0.00%> (+0.04%) ⬆️
unittests2 14.59% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...roker/requesthandler/BaseBrokerRequestHandler.java 70.13% <16.66%> (-1.17%) ⬇️
...t/common/response/broker/BrokerResponseNative.java 93.33% <100.00%> (+0.10%) ⬆️
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
...nverttorawindex/ConvertToRawIndexTaskExecutor.java 0.00% <0.00%> (-100.00%) ⬇️
...e/pinot/common/minion/MergeRollupTaskMetadata.java 0.00% <0.00%> (-94.74%) ⬇️
...plugin/segmentuploader/SegmentUploaderDefault.java 0.00% <0.00%> (-87.10%) ⬇️
.../transform/function/MapValueTransformFunction.java 0.00% <0.00%> (-85.30%) ⬇️
...ot/common/messages/RoutingTableRebuildMessage.java 0.00% <0.00%> (-81.82%) ⬇️
...ache/pinot/common/lineage/SegmentLineageUtils.java 22.22% <0.00%> (-77.78%) ⬇️
...ore/startree/executor/StarTreeGroupByExecutor.java 0.00% <0.00%> (-77.78%) ⬇️
... and 99 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 819cba0...6ede774. Read the comment docs.

@@ -459,7 +459,13 @@ private BrokerResponseNative handleSQLRequest(long requestId, String query, Json
if (offlineBrokerRequest == null && realtimeBrokerRequest == null) {
LOGGER.info("No server found for request {}: {}", requestId, query);
_brokerMetrics.addMeteredTableValue(rawTableName, BrokerMeter.NO_SERVER_FOUND_EXCEPTIONS, 1);
return BrokerResponseNative.EMPTY_RESULT;
BrokerResponseNative brokerResponse = BrokerResponseNative.EMPTY_RESULT;
if (numUnavailableSegments > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intention of the PR #7397 was to always add this exception to BrokerResponse if numUnavailableSegments is > 0 and this case was missed ?

So may be we just do this once by moving the code at line 512 to proper place that covers all scenarios unconditionally ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little tough to do with the way the code is structured. We can't return early when there's segments missing because many use cases still want partial results if possible. And there are several places in this function that short-circuit and return their own response before adding previous exceptions.

But I've moved the code up to create a list of exceptions we can add to. And I've added the missing segments exception to the timeout response as well.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment on lines 459 to 463
List<QueryProcessingException> exceptions = new ArrayList<>();
if (numUnavailableSegments > 0) {
exceptions.add(new QueryProcessingException(QueryException.BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE,
String.format("%d segments %s unavailable", numUnavailableSegments, unavailableSegments)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can store a list of ProcessingException which is easier to manage

Suggested change
List<QueryProcessingException> exceptions = new ArrayList<>();
if (numUnavailableSegments > 0) {
exceptions.add(new QueryProcessingException(QueryException.BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE,
String.format("%d segments %s unavailable", numUnavailableSegments, unavailableSegments)));
}
List<ProcessingException> exceptions = new ArrayList<>();
if (numUnavailableSegments > 0) {
exceptions.add(QueryException.getException(QueryException.BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE,
String.format("%d segments %s unavailable", numUnavailableSegments, unavailableSegments)));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside of this is under the hood, getException does

String errorType = processingException.getMessage();
    ProcessingException copiedProcessingException = processingException.deepCopy();
    copiedProcessingException.setMessage(errorType + ":\n" + errorMessage);
    return copiedProcessingException;

which is setting the error message to "message": "null:\n1 segments [meetupRsvp_REALTIME_1637786040816_0__0__1637786041067] unavailable". The null:\n looks weird. I could make that code not do that on nulls, but I have no idea what else changes at that point.

Comment on lines 468 to 470
BrokerResponseNative brokerResponse = BrokerResponseNative.EMPTY_RESULT;
brokerResponse.addToExceptions(exceptions);
return brokerResponse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(major) We cannot modify the constant broker response. Here we need to create a new one. If we apply the suggested change above, we can directly use the exception constructor

Suggested change
BrokerResponseNative brokerResponse = BrokerResponseNative.EMPTY_RESULT;
brokerResponse.addToExceptions(exceptions);
return brokerResponse;
return new BrokerResponseNative(exceptions);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, good catch.

Comment on lines 497 to 500
BrokerResponseNative brokerResponse =
new BrokerResponseNative(QueryException.getException(QueryException.BROKER_TIMEOUT_ERROR, errorMessage));
brokerResponse.addToExceptions(exceptions);
return brokerResponse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be simplified:

Suggested change
BrokerResponseNative brokerResponse =
new BrokerResponseNative(QueryException.getException(QueryException.BROKER_TIMEOUT_ERROR, errorMessage));
brokerResponse.addToExceptions(exceptions);
return brokerResponse;
exceptions.add(QueryException.getException(QueryException.BROKER_TIMEOUT_ERROR, errorMessage));
return new BrokerResponseNative(exceptions);

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/query/sql silently returns when all servers are down
5 participants