
Add master task throttling #553

Closed
wants to merge 4 commits

Conversation

@dhwanilpatel (Contributor) commented Apr 14, 2021

Signed-off-by: Dhwanil Patel dhwanip@amazon.com

Description

This change adds throttling of tasks on master nodes. The master rejects tasks from data nodes based on the throttling config, and data nodes retry the rejected tasks with exponential backoff before resubmitting them to the master.

Throttling can be enabled/disabled with a dynamic setting, and the throttling config for each task type can be added through dynamic settings as well.

Command for enabling/disabling:
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "master.throttling.enable": "true"
  }
}'

Commands for adding throttling config:
For adding a throttling config for any task type, we need to find the class path of the underlying request object and provide that path in the config.

For example, to add a limit for the put-mapping task type: PutMappingClusterStateUpdateRequest is the underlying request object sent to the master for "put-mapping" tasks.

curl -XPUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "master.throttling.thresholds": {
      "org/opensearch/action/admin/indices/mapping/put/PutMappingClusterStateUpdateRequest": {
        "value": 10
      }
    }
  }
}'
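The retry behavior described above (data nodes backing off and resubmitting throttled tasks) can be pictured roughly as follows; this is a hedged sketch with hypothetical names, not the PR's actual MasterThrottlingRetryListener:

import java.util.Iterator;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: resubmit a throttled master task after an exponentially growing delay.
final class ThrottledTaskRetrier {
    private final ScheduledExecutorService scheduler;

    ThrottledTaskRetrier(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    void submitWithRetry(Runnable submitToMaster, Iterator<Long> backoffDelaysMillis) {
        try {
            submitToMaster.run();
        } catch (RuntimeException throttled) {             // stand-in for a master throttling exception
            if (!backoffDelaysMillis.hasNext()) {
                throw throttled;                           // retries exhausted, surface the failure
            }
            long delayMillis = backoffDelaysMillis.next(); // backoff policy supplies growing delays
            scheduler.schedule(() -> submitWithRetry(submitToMaster, backoffDelaysMillis),
                delayMillis, TimeUnit.MILLISECONDS);
        }
    }
}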


Older PR for previous conversations

#484

Issues Resolved

closes #479

Check List

  • New functionality includes testing.
    • All tests pass
    • Newly introduced tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@odfe-release-bot

✅   Gradle Wrapper Validation success c9e6126

@odfe-release-bot

✅   DCO Check Passed c9e6126

@odfe-release-bot

✅   Gradle Precommit success c9e6126

@nknize (Collaborator) commented Apr 15, 2021

start gradle check

@odfe-release-bot

❌   Gradle Check failure c9e6126
Log 79

Reports 79

@peterzhuamazon (Member)

start dco check
start wrapper validation
start gradle precommit

@odfe-release-bot

✅   DCO Check Passed c9e6126

@odfe-release-bot

✅   Gradle Wrapper Validation success c9e6126

@odfe-release-bot

✅   Gradle Precommit success c9e6126

Comment on lines +33 to +36
release(diff);
} else if (diff < 0) {
// remove permits
reducePermits(Math.negateExact(diff));
@dhwanilpatel (Contributor Author) commented Apr 20, 2021

Old PR Comment:

what happens if those permits are already acquired? does it fail or wait for those permits to be released?

@dhwanilpatel (Contributor Author):

The update call will pass even if all permits are acquired. A new acquire will fail; it will pass once enough permits have been released that permits are available under the new limit.

For example: the limit is 8 and all 8 permits are acquired. The limit is then changed to 5 and 3 permits are released. Acquiring 1 more permit still won't be allowed, because 5 permits are already held, which is the new limit.
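For reference, a minimal standalone sketch of the adjustable-semaphore behavior described above; it assumes a subclass of java.util.concurrent.Semaphore (since reducePermits is protected there) and is not the PR's exact Throttler code:

import java.util.concurrent.Semaphore;

// Minimal sketch of a semaphore whose permit limit can be changed at runtime.
// release(n) adds permits; reducePermits(n) removes them without blocking,
// even if that leaves more permits acquired than the new limit allows.
class AdjustableSemaphore extends Semaphore {
    private int maxPermits;

    AdjustableSemaphore(int maxPermits) {
        super(maxPermits);
        this.maxPermits = maxPermits;
    }

    synchronized void setMaxPermits(int newMax) {
        int diff = newMax - maxPermits;
        if (diff > 0) {
            release(diff);                         // add permits
        } else if (diff < 0) {
            reducePermits(Math.negateExact(diff)); // remove permits
        }
        maxPermits = newMax;
    }
}

With this sketch, setMaxPermits(5) while 8 permits are held returns immediately, and tryAcquire() keeps failing until enough permits have been released to get under the new limit.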

Comment on lines 578 to 580
IndexingPressure.MAX_INDEXING_BYTES,
MasterTaskThrottler.ENABLE_MASTER_THROTTLING,
MasterTaskThrottler.THRESHOLD_SETTINGS)));
@dhwanilpatel (Contributor Author):

Old PR Comment:

We will need to document these new settings and also label the PR according to the minor version it will be available in.

@dhwanilpatel (Contributor Author):

Yes, once we finalize the documentation strategy I will update the documentation. I have recorded this in SIM to keep track of the documentation work, and will keep that issue open until the documentation is added.

I don't have permission to add a label to the PR.

Comment on lines 39 to 42
* We can also provide the class path of request, which we want to throttle. If new task type has introduced and we
* haven't added that in configuration then we can use the class path of request to enable throttling on that type.
* e.g use "org/opensearch/action/admin/indices/create/CreateIndexClusterStateUpdateRequest" as key in setting
* throttling limit for Create index tasks.
@dhwanilpatel (Contributor Author):

Old PR Comment

How will this setting key look in cluster state? What happens if, as part of some refactoring, we move classes around? Then, all of a sudden, these settings would start failing? Or are we going to migrate these settings automatically?

Also it feels like a backdoor: what if the user keeps providing classes here that aren't applicable for throttling? How do you control that?

@dhwanilpatel (Contributor Author):

How will this setting key look in cluster state?

The key will look the same in the cluster setting, e.g. "org/opensearch/action/admin/indices/create/CreateIndexClusterStateUpdateRequest".

What happens if, as part of some refactoring, we move classes around? Then, all of a sudden, these settings would start failing?

Yes, if classes are refactored these settings would start failing. Ideally the new node will not come up, since prepareCommit itself fails to apply the setting.
For refactoring cases, we would need to remove the old setting, move the cluster to the new release with the refactored class, and then reapply the setting with the new class path.
Do you have any suggestions to improve this behavior? I will be happy to improve it if there is scope.

Also it feels like a backdoor: what if the user keeps providing classes here that aren't applicable for throttling? How do you control that?

Nice suggestion!
I didn't have any verification for it, but I have added it now. While applying the setting we check that the class is a MasterNodeRequest or a ClusterStateUpdateRequest; through this validation we can control it. Please let me know if I have missed any other request type in the check. Thanks.
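A hedged sketch of that validation (the class names come from the discussion above; the package paths, method shape, and error messages are assumptions, not the PR's exact code):

import org.opensearch.action.support.master.MasterNodeRequest; // assumed package (OpenSearch 1.x layout)
import org.opensearch.cluster.ack.ClusterStateUpdateRequest;   // assumed package

final class ThrottlingSettingValidation {
    // Reject setting keys whose class is neither a master-node request nor a cluster-state update request.
    static void validateThrottlingClass(String settingKey) {
        String className = settingKey.replace('/', '.');
        try {
            Class<?> clazz = Class.forName(className);
            if (!MasterNodeRequest.class.isAssignableFrom(clazz)
                && !ClusterStateUpdateRequest.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException("Request type is not eligible for throttling: " + className);
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown request class in throttling setting: " + className, e);
        }
    }
}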

Member:

@dhwanilpatel: I am wondering why we even need this. Any time a new class is added, it should be caught in review and a decision taken on whether it should be added to the throttler or not.

@dhwanilpatel (Contributor Author) commented May 4, 2021

Your suggestion is good; I think it will be a cleaner way as well. The operator won't have to worry about class paths. I will evaluate all the task types and add the ones that may need throttling to this mapping.

Just one concern: is there any way to make sure we catch new task types in PRs and evaluate whether they need throttling?

@dhwanilpatel (Contributor Author):

I found one issue while making the above change: most task submissions use anonymous classes, so we can't rely on the class name of the submitted task for those.

I have changed the logic a little. We now throttle based on the executor type. Each task has its own executor implementation, so the relationship is 1:1. I have added a method to ClusterStateTaskExecutor that returns the throttling key. Executors for tasks where we want to configure throttling override it and provide the key.

I have analyzed the various task types and added keys for most of the tasks that are generated by customers, such as create index, put mapping, etc. Please check whether I have covered enough task types.
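A rough sketch of the executor-based approach described above (the interface and key names are illustrative, not necessarily the PR's exact signatures on ClusterStateTaskExecutor):

// Illustrative only: a default throttling key on an executor-like interface,
// overridden by executors whose tasks should be throttled.
interface ThrottlingKeyProvider {
    // Default: no throttling for this executor's tasks.
    default String getMasterThrottlingKey() {
        return null;
    }
}

// Example executor for put-mapping tasks opting in to throttling.
class PutMappingTaskExecutor implements ThrottlingKeyProvider {
    @Override
    public String getMasterThrottlingKey() {
        return "put-mapping";
    }
}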

Comment on lines 127 to 130
public long throttledTaskCount() {
return throttledTasksCount.count();
}

@dhwanilpatel (Contributor Author):

Old PR Comment:

could we maintain throttledTasksCount at the task level? As I understand this is a cumulative count, wondering should it get reset during disabling of throttling?

@dhwanilpatel (Contributor Author):

Yes, I have captured throttledTasksCount at the task level. We are also planning to expose these stats in the stats API, so this will help there as well.

As I understand this is a cumulative count, wondering should it get reset during disabling of throttling?

I don't see any reason to reset it to zero once throttling is disabled. IMO we should keep a record of how many tasks were throttled in the past.
Do you have a specific case in mind that would require resetting it to zero?
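A minimal sketch of per-task-type cumulative counters along these lines (illustrative only; the PR's MasterThrottlingStats may be structured differently):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Cumulative throttled-task counts keyed by task type; never reset when throttling is disabled.
class ThrottlingStatsSketch {
    private final Map<String, LongAdder> throttledCounts = new ConcurrentHashMap<>();

    void incrementThrottledCount(String taskType) {
        throttledCounts.computeIfAbsent(taskType, k -> new LongAdder()).increment();
    }

    long throttledTaskCount(String taskType) {
        LongAdder adder = throttledCounts.get(taskType);
        return adder == null ? 0L : adder.sum();
    }
}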

@dhwanilpatel (Contributor Author):

Backward compatibility:
In this change we need to take care of backward compatibility, since the new changes must be present on both master and data nodes for this functionality to work. We have also introduced new settings, so we need to check backward compatibility for those as well.

Discussion regarding this in the previous PR: (Discussion)

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@odfe-release-bot

✅   Gradle Wrapper Validation success 537d818

@odfe-release-bot

✅   DCO Check Passed 537d818

@odfe-release-bot

✅   Gradle Precommit success 537d818

put("delete_snapshot", DeleteSnapshotRequest.class);
}
};
private final int DEFAULT_THRESHOLD_VALUE = -1; // Disabled throttling
Member:

Ideally we should have reasonable defaults for these values so that the cluster is protected even if the user has not configured them explicitly.
But in this case it could be a breaking change for customers, since they would not have exception handling for the new exception on end-customer calls or on requests from old data nodes during an upgrade. I am thinking we can keep the default as -1 and switch to a reasonable default in the next major version of OpenSearch.

Scenarios impacted during an upgrade once we change the default from -1:

  1. Old master / old data nodes - no issue.
  2. Old master / new data nodes - no issue.
  3. New master / new data nodes - no issue.
  4. New master / old data nodes - issue. Ideally data nodes are expected to upgrade first, then master nodes, and then the active master is switched, since data nodes on a lower version may not be able to connect to a master on a higher version. So it is OK if this is called out as a breaking change in the next OpenSearch major version.

@dhwanilpatel (Contributor Author):

For minor version upgrades, the same scenario you mentioned (new master / old data nodes) can cause an issue if the customer changes the threshold limit during the upgrade and the master starts throwing the new exception. In that case it would be a breaking change.

Also, we are introducing new settings; would that also be a breaking change, since nodes on an older minor version won't be able to understand the new setting if the customer sets it during the upgrade?

Ideally data nodes are expected to upgrade first, then master nodes, and then the active master is switched, since data nodes on a lower version may not be able to connect to a master on a higher version.

Does that hold for minor version upgrades as well?
If yes, then I think the above two cases are covered for minor version upgrades, provided we expect the customer to follow this upgrade process.

Member:

For minor version upgrades, the same scenario you mentioned (new master / old data nodes) can cause an issue if the customer changes the threshold limit during the upgrade and the master starts throwing the new exception. In that case it would be a breaking change.

We can recommend that customers not enable this setting during an upgrade; otherwise their requests could be throttled without any automatic retries. They should enable this setting once the upgrade has finished.

Also, we are introducing new settings; would that also be a breaking change, since nodes on an older minor version won't be able to understand the new setting if the customer sets it during the upgrade?

Adding new settings shouldn't be a breaking change, even on old nodes; can you try it?

Does that hold for minor version upgrades as well?

Ideally, it applies to all version upgrades, minor or major.

@dhwanilpatel (Contributor Author):

We can recommend that customers not enable this setting during an upgrade; otherwise their requests could be throttled without any automatic retries. They should enable this setting once the upgrade has finished.

For that, I have added validation of the minimum node version in the cluster while updating the setting. I added that validation to the enable/disable setting; I will move it to the threshold setting, since that is the one we rely on.

Adding new settings shouldn't be a breaking change, even on old nodes; can you try it?

Actually, I did try it. If the new setting is applied and we call the _cluster/settings API from an older-version node, it fails to parse the new setting. We are recommending not to enable this setting during the upgrade, so I think that covers this case as well.
What do you think?

Ideally, it applies to all version upgrades, minor or major.

Okay, yes, it makes sense to follow this for all upgrades.
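A hedged sketch of such a minimum-version check, assuming the setting validator can see the cluster's DiscoveryNodes (the version constant and error message are placeholders, not the PR's exact code):

import org.opensearch.Version;
import org.opensearch.cluster.node.DiscoveryNodes;

final class ThrottlingSettingVersionCheck {
    // Refuse to apply the throttling setting while any node in the cluster is older than the feature's minimum version.
    static void ensureAllNodesSupportThrottling(DiscoveryNodes nodes, Version minimumSupportedVersion) {
        if (nodes.getMinNodeVersion().before(minimumSupportedVersion)) {
            throw new IllegalArgumentException("Cannot configure master task throttling until all nodes are on version "
                + minimumSupportedVersion + " or later");
        }
    }
}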

org.opensearch.cluster.service.MasterTaskThrottlingException.class,
org.opensearch.cluster.service.MasterTaskThrottlingException::new,
161,
Version.V_7_10_3);
Member:

Please add a TODO, as the version will change per OpenSearch versioning and will also depend on which version this PR is tagged/merged into.

@dhwanilpatel (Contributor Author):

Added a TODO; I will also keep this comment unresolved to keep track of it.

@AmiStrn (Contributor) commented May 6, 2021

I am wondering if it's possible to split this PR into phases so that it is not so big all at once.
I say this because this PR is affecting many files and adding about 1.7K lines to the project.
Regardless of the fact that it is an important feature, IMO the PR is massive and may take a long time to be approved at its current state.
@dhwanilpatel what do you think? (it's only a suggestion of course)

…er driven task types

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success cf3f16d

@opensearch-ci-bot
Collaborator

✅   DCO Check Passed cf3f16d

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success cf3f16d

@dhwanilpatel (Contributor Author) commented May 6, 2021

I am wondering if it's possible to split this PR into phases so that it is not so big all at once.
I say this because this PR is affecting many files and adding about 1.7K lines to the project.
Regardless of the fact that it is an important feature, IMO the PR is massive and may take a long time to be approved at its current state.
@dhwanilpatel what do you think? (it's only a suggestion of course)

@AmiStrn Initially I hadn't thought it would become this large. Apologies for not planning it correctly.

My only worry is that I already had to discard one PR (#484) which had very good discussion in review.
This one also has discussion covering many important topics, like backward compatibility and the recommended way to use this feature.

It would be a little inconvenient for reviewers and other members to go through the discarded PRs to find those discussions.

Giving a little context on the main code files, which might help in review:

  • Throttler: Core throttling logic based on semaphores. Reusable for any throttling use case.

  • MasterTaskThrottler: Contains the throttling settings and extends the Throttler class for the task-throttling use case.

  • BackoffPolicy: Added a new exponential backoff policy with equal jitter (see the sketch after this list).

  • MasterThrottlingRetryListener: Retry logic for throttled tasks. Added this listener in TransportMasterNodeAction to retry throttled master tasks.

  • MasterService: Plugs the MasterTaskThrottler into the submission of new tasks.

The classes above contain the major functionality of the feature; the rest of the change is mostly UTs/ITs.
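A small sketch of an exponential backoff delay with equal jitter, as mentioned for BackoffPolicy above; this is the generic equal-jitter formula, not necessarily the exact implementation in this PR:

import java.util.concurrent.ThreadLocalRandom;

final class EqualJitterBackoffSketch {
    // Equal jitter: half of the capped exponential delay, plus a random amount up to the other half.
    static long delayMillis(long baseMillis, int attempt, long capMillis) {
        long exponential = Math.min(capMillis, baseMillis * (1L << Math.min(attempt, 30)));
        long half = exponential / 2;
        return half + ThreadLocalRandom.current().nextLong(half + 1);
    }
}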

Still, if you think it would help to have separate PRs to review, I will be happy to split it into multiple PRs.

@dhwanilpatel mentioned this pull request May 6, 2021
@AmiStrn (Contributor) commented May 6, 2021

I am not a maintainer of the project, so it really is up to them at this point. My preference would be to see it broken down; however, I appreciate the effort in your comment and the issue description to make things easier. I will try to understand the PR in its current state :)

@itiyamas (Contributor) left a comment

Thanks for the changes, Dhwanil. We are almost there :)
We can't push this change without a clear plan and integration tests for BWC. Let us add the tests.

I will review the tests in detail once you make all the changes.

In the future, please raise small PRs. If you can find a way to break up the PR, just do so even while the combined change is still small, as these features have a tendency to become huge.


protected MasterNodeRequest() {
}

protected MasterNodeRequest(StreamInput in) throws IOException {
super(in);
masterNodeTimeout = in.readTimeValue();
remoteRequest = in.readOptionalBoolean();
Contributor:

Add version check.

}

@Override
public void writeTo(StreamOutput out) throws IOException {
super.writeTo(out);
out.writeTimeValue(masterNodeTimeout);
out.writeOptionalBoolean(remoteRequest);
Contributor:

Version check?

@@ -71,6 +74,11 @@ public final Request masterNodeTimeout(TimeValue timeout) {
return (Request) this;
}

public final Request setRemoteRequest(boolean remoteRequest) {
Contributor:

Public setter- why not add to the constructor?

long delay = backoffDelay.next().getMillis();
if(totalDelay + delay >= request.masterNodeTimeout.getMillis()) {
delay = request.masterNodeTimeout.getMillis() - totalDelay;
scheduler.schedule(new Runnable() {
Contributor:

Why is there a need to delay the exception? Directly call listener? Better to throw exception early?

@@ -77,6 +77,10 @@ public String describeTasks(List<ClusterStateUpdateTask> tasks) {
*/
public abstract void onFailure(String source, Exception e);

// public String getMasterThrottlingKey() {
Contributor:

remove

"cluster-reroute-api"
)));

private final int DEFAULT_THRESHOLD_VALUE = -1; // Disabled throttling
Contributor:

Why disabled by default?

throw new IllegalArgumentException("Master task throttling is not configured for given task type: " + key);
}
int threshold = groups.get(key).getAsInt("value", DEFAULT_THRESHOLD_VALUE);
if(threshold < DEFAULT_THRESHOLD_VALUE) {
Contributor:

Why default threshold value? It may change in future? If not, rename default to min?

* Contains stats of Master Task Throttling.
* It stores the total cumulative count of throttled tasks per task type.
*/
public class MasterThrottlingStats {
Contributor:

Should this be moved to Throttler?

* Contains stats of Master Task Throttling.
* It stores the total cumulative count of throttled tasks per task type.
*/
public class MasterThrottlingStats {
Contributor:

A useful stat would be how long ago a task was throttled but not yet released; it would be helpful in debugging, basically the oldest throttled task. What do you think?
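Purely as an illustration of this suggestion (not part of this PR), tracking the oldest still-throttled task could look roughly like this:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Track submission times of currently-throttled tasks so the oldest one can be reported.
class OldestThrottledTaskTracker {
    private final ConcurrentMap<Long, Long> throttledSinceMillis = new ConcurrentHashMap<>();

    void onTaskThrottled(long taskId, long nowMillis) {
        throttledSinceMillis.putIfAbsent(taskId, nowMillis);
    }

    void onTaskReleased(long taskId) {
        throttledSinceMillis.remove(taskId);
    }

    // Age in millis of the oldest task that is still throttled, or 0 if none.
    long oldestThrottledTaskAgeMillis(long nowMillis) {
        return throttledSinceMillis.values().stream()
            .mapToLong(since -> nowMillis - since)
            .max()
            .orElse(0L);
    }
}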

}

@Override
public void onFailure(Exception e) {
Contributor:

Please add BWC tests for the following cases:

  1. Old data node, new master node: here, the master may throttle the request and the data node will propagate the throttling error to the client, which may not be handled gracefully upstream. One way around this is to make the master node aware of which node the request came from and handle requests accordingly, e.g. by adding a field to MasterNodeRequest for this. Do you plan to handle it in a different way?
  2. Old master node, new data node: this will work, I think. I think the setting update itself will fail.

@nknize (Collaborator) commented Jun 24, 2021

I am wondering if it's possible to split this PR into phases so that it is not so big all at once.

+1 to split into smaller incremental PRs. It's difficult to track whether this should be merged in the next major release (2.0.0), and the PR has already stalled for over two months.

@nknize added the discuss, feature, stalled, untriaged, and WIP labels Jun 24, 2021
@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success cf3f16d

@opensearch-ci-bot
Collaborator

✅   DCO Check Passed cf3f16d

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success cf3f16d

@minalsha removed the untriaged label Sep 7, 2021
@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@CEHENKLE (Member):

@dhwanilpatel Hey, it's been a while since this was updated....Are you planning to break this into smaller PRs?

Thanks!
/C

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success cf3f16d

@opensearch-ci-bot
Collaborator

✅   DCO Check Passed cf3f16d

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success cf3f16d

@dhwanilpatel (Contributor Author):

Apologies for the long delay. Yes, the next step is to break this PR into smaller PRs. I will plan the work for it.

@dblock (Member) commented Mar 3, 2022

@dhwanilpatel we still would like this feature!

@dblock (Member) commented Mar 7, 2022

Looks like this won't get worked on for a while, closing. See #479 if anyone wants to pick this up again.

@dblock closed this Mar 7, 2022
@dhwanilpatel (Contributor Author):

Yes @dblock, this will be a good feature; it will help make the master more resilient against floods of pending tasks. Unfortunately, I won't have time to work on it in the near future. I will update here whenever I am back on this.

gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request May 17, 2022
Rebased with master

Previous work : opensearch-project#553
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request May 26, 2022
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
ritty27 pushed a commit to ritty27/OpenSearch that referenced this pull request May 12, 2024
…OS 2.2.0 (opensearch-project#555)

* Fixes opensearch-project#553: fix highlight max_analyzer_offset field name to match with the one introduced in OpenSearch 2.2.0

Signed-off-by: Joao Schmitt <schmittjoaopedro@gmail.com>

* fix checkstyle errors

Signed-off-by: Joao Schmitt <schmittjoaopedro@gmail.com>

---------

Signed-off-by: Joao Schmitt <schmittjoaopedro@gmail.com>
Labels
discuss, feature, stalled, WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster Manager Task Throttling