[BUG] Stats transport actions based on TransportNodeActions sends large payload of Discovery Nodes to all nodes #14713

Closed
Pranshu-S opened this issue Jul 11, 2024 · 0 comments · Fixed by #14749
Labels
bug Something isn't working Cluster Manager v2.16.0 Issues and PRs related to version 2.16.0

Comments

@Pranshu-S
Contributor

Describe the bug

In the current implementation, every transport action extending TransportNodesAction includes all discovery nodes in the transport request sent to each node in the cluster. This approach leads to performance bottlenecks in large clusters due to redundant data transmission. Specifically:

  1. Increased Network Traffic: The same list of n discovery nodes is written once per target node, so n^2 node entries are serialized per request (where n is the number of nodes), causing unnecessary network traffic and increased IO.
  2. Write/Read Latency: The excessive data transmission contributes to higher overall latency for both write and read operations.
  3. NIO Buffer Bottleneck: When a plugin such as Netty handles inter-node communication, the buffer fills up with redundant discovery node information. Each request is larger, so fewer requests fit in the Netty buffer.

Related component

Other

To Reproduce

If node IDs are passed in a TransportNodesAction request, we resolve them into DiscoveryNodes. This request is then cloned into the individual requests that go to each node, which ends up writing the discoveryNodes object with every one of them.

Essentially, for a 200-node cluster we write 200 discoveryNode objects for each per-node request, which means we write about 200 x 200 = 40,000 objects over the entire send path. This grows quadratically with the number of nodes.
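To make the arithmetic concrete, here is a minimal sketch that counts how many discovery node entries get serialized when one nodes-level request is fanned out to every node. The class and method names (`FanOutCost`, `entriesWritten`) are hypothetical and are not part of OpenSearch; the model simply assumes each per-node request carries the full node list.

```java
public class FanOutCost {
    // Hypothetical helper, not an OpenSearch API: counts discovery node
    // entries serialized when a request carrying all resolved nodes is
    // sent to every node in the cluster.
    static long entriesWritten(int nodeCount) {
        long total = 0;
        for (int target = 0; target < nodeCount; target++) {
            // Assumption: each per-node request carries the full node list.
            total += nodeCount;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {50, 100, 200, 400}) {
            System.out.println(n + " nodes -> " + entriesWritten(n) + " entries written");
        }
    }
}
```

Doubling the cluster size quadruples the count, which is the quadratic growth described above.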

Expected behavior

The request path should send only the information that is required on the receive path.
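One possible shape for this, shown as a rough sketch rather than the actual fix: skip the resolved node list when serializing the per-node copy of the request. Everything here is an assumption made for illustration. `SlimRequestSketch`, `serialize`, and `fakeNodes` are hypothetical names, plain strings stand in for DiscoveryNode objects, and `java.io.DataOutputStream` stands in for the transport stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SlimRequestSketch {
    // Hypothetical serializer, not an OpenSearch API. The node list is
    // written only when the caller asks for it.
    static byte[] serialize(String action, List<String> resolvedNodes, boolean includeNodes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(action);
            out.writeBoolean(includeNodes); // tells the receiver whether a list follows
            if (includeNodes) {
                out.writeInt(resolvedNodes.size());
                for (String node : resolvedNodes) {
                    out.writeUTF(node);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Builds placeholder node descriptions; real DiscoveryNode objects are larger.
    static List<String> fakeNodes(int count) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            nodes.add("node-" + i + "@10.0.0." + i);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<String> nodes = fakeNodes(200);
        int full = serialize("stats", nodes, true).length;
        int slim = serialize("stats", nodes, false).length;
        System.out.println("with node list: " + full + " bytes per request");
        System.out.println("without node list: " + slim + " bytes per request");
    }
}
```

In this sketch the slim payload size does not depend on cluster size, so the total bytes written across the fan-out grow linearly with the number of nodes instead of quadratically.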
