Optimising node to node communication by serializing node attribute in DiscoveryNode only in scenarioes where it is required #15341

Merged
1 commit merged on Sep 3, 2024

@RS146BIJAY RS146BIJAY commented Aug 21, 2024

Description

A significant amount of compute and memory goes into serialization and deserialization (ser/de) of DiscoveryNode during node-to-node communication. DiscoveryNode carries a number of node properties and attributes that are largely static and do not need to be passed around for most node-to-node communication. Further, in scenarios such as a NodeStats call or FollowerChecker requests, a single master thread needs to broadcast this DiscoveryNode object, with all of these attributes, to every node in the cluster. In a very large cluster this becomes a major bottleneck for the master transport thread (which also handles other critical operations such as ClusterStateUpdate and IndexCreate), because the thread remains blocked until the DiscoveryNode object is written.

In this PR we propose to optimise node-to-node communication by serializing node attributes in DiscoveryNode only in scenarios where they are required.

We serialize attributes in the following scenarios:

  1. Cluster state publication
  2. JoinRequest
  3. Cluster Allocation Explanation request
  4. Start Recovery
  5. HandShake Request
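
The idea described above can be illustrated with a minimal sketch. This is not the code from this PR and does not use OpenSearch classes: `NodeSketch`, `writeToWithAttributes`, and `serializedSize` are hypothetical names, and `java.io.DataOutputStream` stands in for the transport stream. It only shows the general pattern of a slim default write path plus an opt-in path for callers that need attributes, such as the scenarios listed above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for DiscoveryNode; not the OpenSearch class.
final class NodeSketch {
    private final String id;
    private final String address;
    private final Map<String, String> attributes;

    NodeSketch(String id, String address, Map<String, String> attributes) {
        this.id = id;
        this.address = address;
        this.attributes = new LinkedHashMap<>(attributes);
    }

    // Default path: identity fields only, with an empty attribute map on the wire.
    void writeTo(DataOutputStream out) throws IOException {
        write(out, false);
    }

    // Opt-in path for callers that really need the attributes.
    void writeToWithAttributes(DataOutputStream out) throws IOException {
        write(out, true);
    }

    private void write(DataOutputStream out, boolean includeAttributes) throws IOException {
        out.writeUTF(id);
        out.writeUTF(address);
        if (includeAttributes) {
            out.writeInt(attributes.size());
            for (Map.Entry<String, String> entry : attributes.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } else {
            // Assumption of this sketch: still write a count so the wire layout
            // stays the same and a reader simply sees zero attributes.
            out.writeInt(0);
        }
    }

    // Helper that reports how many bytes either path produces.
    static int serializedSize(NodeSketch node, boolean includeAttributes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (includeAttributes) {
                node.writeToWithAttributes(out);
            } else {
                node.writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("zone", "us-east-1a");
        attrs.put("rack", "r42");
        NodeSketch node = new NodeSketch("node-1", "10.0.0.1:9300", attrs);
        System.out.println("without attributes: " + serializedSize(node, false) + " bytes");
        System.out.println("with attributes:    " + serializedSize(node, true) + " bytes");
    }
}
```

In this sketch the saving per message grows with the number and size of attributes, which is why a broadcast from a single thread to many nodes would benefit most. Whether the real change keeps an empty map on the wire or uses a different layout is not stated in the description above.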

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for b2d6402: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 56ce8d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for f7ea283: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 157d6dc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 0603f45: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@RS146BIJAY RS146BIJAY changed the title from "Serializing node attribute in discoveryNode only in scenarioes where it is required" to "Optimising node to node communication by serializing node attribute in discoveryNode only in scenarioes where it is required" on Aug 25, 2024
@RS146BIJAY RS146BIJAY changed the title from "Optimising node to node communication by serializing node attribute in discoveryNode only in scenarioes where it is required" to "Optimising node to node communication by serializing node attribute in DiscoveryNode only in scenarioes where it is required" on Aug 25, 2024

❌ Gradle check result for 13054e6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for b7e9ae8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for c0b85f4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@RS146BIJAY RS146BIJAY force-pushed the attrib-test branch 2 times, most recently from 74d3660 to 6c0e2c4, on August 29, 2024 06:41

❌ Gradle check result for 74d3660: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 6c0e2c4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@RS146BIJAY RS146BIJAY force-pushed the attrib-test branch 2 times, most recently from 6b0ec1a to b4c4a57, on August 30, 2024 06:11

❌ Gradle check result for 6b0ec1a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for b4c4a57: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Sep 1, 2024

❕ Gradle check result for c2a85c3: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@Bukhtawar Bukhtawar left a comment

How can we verify that REST API call responses don't break?

github-actions bot commented Sep 3, 2024

❌ Gradle check result for f87e338: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Sep 3, 2024

❌ Gradle check result for da7f5a5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…it is required

Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>

github-actions bot commented Sep 3, 2024

❕ Gradle check result for ca25f58: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@Bukhtawar Bukhtawar merged commit 4516065 into opensearch-project:main Sep 3, 2024
35 checks passed
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Sep 3, 2024
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15341-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 451606535752a73be80d5203ae417e7d57fc5cef
# Push it to GitHub
git push --set-upstream origin backport/backport-15341-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15341-to-2.x.
