
Update default values in CoordinatorDynamicConfig #14269

Merged 4 commits into apache:master on May 30, 2023

Conversation

@kfaraz kfaraz commented May 12, 2023

Changes

This PR updates the defaults of the following config values in the CoordinatorDynamicConfig.

1. maxSegmentsInNodeLoadingQueue = 500 (previous = 100)

Rationale: With round-robin segment assignment now being the default assignment strategy, the Coordinator can assign a large number of under-replicated/unavailable segments very quickly. Before round-robin assignment, a large queue size would cause the Coordinator to get stuck in the RunRules duty due to very slow strategy-based cost computations.

2. replicationThrottleLimit = 500 (previous = 10)

Rationale: In addition to the reasoning given for maxSegmentsInNodeLoadingQueue, a very low replicationThrottleLimit can make clusters very slow to reach full replication, even when there are loading threads sitting idle.

Note: It is okay to keep this value equal to maxSegmentsInNodeLoadingQueue. Even with equal values, load queues will not fill up with replicas alone, and segments that are completely unavailable will still get a fair chance to load. This is because maxSegmentsInNodeLoadingQueue applies to a single server, whereas replicationThrottleLimit applies to an entire tier.
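The interaction above can be sketched with some simple arithmetic. This is an illustrative Python sketch, not Druid code; the constant names mirror the configs, but the helper functions are hypothetical:

```python
# maxSegmentsInNodeLoadingQueue caps the queue of each individual server,
# while replicationThrottleLimit caps queued *replicas* across a whole tier.

MAX_SEGMENTS_IN_NODE_LOADING_QUEUE = 500  # applies per server
REPLICATION_THROTTLE_LIMIT = 500          # applies per tier

def tier_queue_capacity(num_servers: int) -> int:
    """Total number of segments that can be queued across a tier."""
    return num_servers * MAX_SEGMENTS_IN_NODE_LOADING_QUEUE

def min_capacity_for_unavailable(num_servers: int) -> int:
    """Queue capacity that replicas can never consume, since at most
    REPLICATION_THROTTLE_LIMIT replicas may be queued in the tier."""
    return tier_queue_capacity(num_servers) - REPLICATION_THROTTLE_LIMIT

# A 10-server tier can queue 10 * 500 = 5000 segments in total, but
# replicas can occupy at most 500 of those slots, leaving 4500 slots
# free for completely unavailable segments.
print(min_capacity_for_unavailable(10))  # 4500
```

So even with the two limits set equal, replicas can saturate a tier's queues only in the degenerate single-server case.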

3. maxSegmentsToMove = 100 (previous = 5)

Rationale: A very low value of this config (say 5) turns out to be very ineffective at balancing, especially if there are a large number of segments in the cluster or a large skew between the usage levels of two historical servers.
On the other hand, a very large value can cause excessive moves every minute, which has the following disadvantages:

  • Load of moving segments competing with load of unavailable/under-replicated segments
  • Unnecessary network costs due to constant download and delete of segments

These defaults will be revisited after #13197 is merged.
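For reference, operators who want these values before upgrading can set them at runtime through the Coordinator's dynamic configuration endpoint (`/druid/coordinator/v1/config`). A sketch of the payload, matching the new defaults:

```json
{
  "maxSegmentsInNodeLoadingQueue": 500,
  "replicationThrottleLimit": 500,
  "maxSegmentsToMove": 100
}
```

This could be posted with, for example, `curl -X POST -H 'Content-Type: application/json' -d @config.json http://<coordinator-host>:8081/druid/coordinator/v1/config` (host and port are placeholders for your deployment).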

Testing

These values have been tried on production clusters of different sizes and have been found to give satisfactory results.

Release note

Update default values of the following coordinator dynamic configs:

  • maxSegmentsInNodeLoadingQueue = 500
  • maxSegmentsToMove = 100
  • replicationThrottleLimit = 500

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula

Changes may be needed in coordinator-dynamic-config.tsx as well.

|Property|Description|Default Value|
|--------|-----------|-------------|
|`balancerComputeThreads`|Thread pool size for computing moving cost of segments in segment balancing. Consider increasing this if you have a lot of segments and moving segments starts to get stuck.|1|
|`emitBalancingStats`|Boolean flag for whether or not we should emit balancing stats. This is an expensive operation.|false|
|`killDataSourceWhitelist`|List of specific data sources for which kill tasks are sent if property `druid.coordinator.kill.on` is true. This can be a list of comma-separated data source names or a JSON array.|none|
|`killPendingSegmentsSkipList`|List of data sources for which pendingSegments are _NOT_ cleaned up if property `druid.coordinator.kill.pendingSegments.on` is true. This can be a list of comma-separated data sources or a JSON array.|none|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 100. |100|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. |500|
Suggested change
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. |500|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments allowed in a loading queue for any given server. Use this parameter to load the segments faster—for example, if the cluster contains slow-loading nodes, or if there are too many segments to be replicated to a particular node (when faster loading is preferred to better segments distribution). Desired value depends on the loading speed of segments, acceptable replication time, and number of nodes. Value 1000 is a good starting point for a big cluster. |500|


@ektravel ektravel left a comment


Added some suggestions to improve readability.

kfaraz commented May 19, 2023

Thanks for the review, @ektravel ! I have incorporated your feedback.


@ektravel ektravel left a comment


Looks good from the docs standpoint.

@kfaraz kfaraz merged commit 8091c6a into apache:master May 30, 2023
@kfaraz kfaraz deleted the update_default_dynamic_config_values branch May 30, 2023 03:21
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023