Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle leaking TP exception in handleAssignmentChange #845

Merged

Conversation

surajkn
Copy link
Collaborator

@surajkn surajkn commented Jul 29, 2021

During handleAssignmentChange, while initializing task we were leaking a possible
exception while creating TransportProvider object. This patch fixes it by issuing
a retry on failure.

Important: DO NOT REPORT SECURITY ISSUES DIRECTLY ON GITHUB.
For reporting security issues and contributing security fixes,
please, email security@linkedin.com instead, as described in
the contribution guidelines.

Please, take a minute to review the contribution guidelines at:
https://github.com/linkedin/Brooklin/blob/master/CONTRIBUTING.md

@surajkn surajkn requested a review from vmaheshw July 29, 2021 05:39
@vmaheshw vmaheshw marked this pull request as draft July 29, 2021 17:16
@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch from c9c58ab to e9006d2 Compare August 2, 2021 16:43
@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch from e9006d2 to b862840 Compare August 3, 2021 22:32
@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch from b862840 to 77a0c31 Compare August 5, 2021 18:05
Copy link
Collaborator

@vmaheshw vmaheshw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some code refactoring comments. One thought related to load on zookeeper server after this change.

}

/**
* Constructor for DummyTransportProviderAdminFactory which can optionally throw exception on every send call
* @param throwOnSend whether or not to throw an exception on send calls
*/
public DummyTransportProviderAdminFactory(boolean throwOnSend) {
public DummyTransportProviderAdminFactory(boolean throwOnSend, boolean failTransportProvider) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. change failTransportProvider to failTransportProviderOnce
b. update javadocs.

@@ -583,6 +584,8 @@ private void handleAssignmentChange(boolean isDatastreamUpdate) throws TimeoutEx
}
});

int totalTasks = currentAssignment.values().stream().mapToInt(List::size).sum();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can extract method out of this logic.

if ((totalTasks - submittedTasks) > 0) {
_log.warn("Failed to submit {} tasks from currentAssignment. Queueing onAssignmentChange event again", totalTasks - submittedTasks);
// Update the metric and queue the event only once for all the tasks that failed to be initialized
if (isDatastreamUpdate) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qn: Why do you need a different code logic for this v/s TimeoutException, when the expectation is the same in both the cases.

        _eventQueue.put(CoordinatorEvent.createHandleDatastreamChangeEvent());
      } else {
        _eventQueue.put(CoordinatorEvent.createHandleAssignmentChangeEvent());
      }
      throw e;
``` There is no direct exception known in this path, but you can create DatastreamRuntimeException(). If you extract the highlighted code into a method and call it, it may work for you.

Logically it is the same code, but different way of achieving the same thing.

@vmaheshw vmaheshw marked this pull request as ready for review August 6, 2021 06:17
@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch 3 times, most recently from 77b03cf to 30e908a Compare August 7, 2021 06:00
Copy link
Collaborator

@vmaheshw vmaheshw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cosmetic changes and make the retry field configurable.

@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch 3 times, most recently from 9322769 to 5e5e49b Compare August 16, 2021 18:47
During handleAssignmentChange, while initializing task we were leaking a possible exception while
creating TransportProvider object. To address it we catch the exception for the failed task and mark
that task failed and prevent it from being added to assigned tasks and also queue another handle
assignment change event. Also introducing a maximum retry count for onAssignementChange in case of
errors.
@surajkn surajkn force-pushed the handle_assignment_change_tp_failure branch from 5e5e49b to 8724e19 Compare August 17, 2021 19:03
@surajkn surajkn requested a review from vmaheshw August 17, 2021 19:04
@surajkn surajkn dismissed vmaheshw’s stale review August 19, 2021 18:50

Comments addressed

@surajkn surajkn merged commit 527cce0 into linkedin:master Aug 20, 2021
@surajkn surajkn deleted the handle_assignment_change_tp_failure branch August 20, 2021 17:54
vmaheshw pushed a commit to vmaheshw/brooklin that referenced this pull request Mar 1, 2022
During handleAssignmentChange, while initializing task we were leaking a possible exception while
creating TransportProvider object. To address it we catch the exception for the failed task and mark
that task failed and prevent it from being added to assigned tasks and also queue another handle
assignment change event. Also introducing a maximum retry count for onAssignementChange in case of
errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants