Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Step Metadata Update on Index Rollover Timeout #1174

Merged

Conversation

harshitakaushik-dev
Copy link
Contributor

@harshitakaushik-dev harshitakaushik-dev commented May 28, 2024

Issue #1132 :

Issue Description:

This pull request addresses a bug in the Index Management plugin where the step metadata fields within IndexActionMetadata are not updated when a rollover action times out. Currently, only the info.message field is updated to indicate a timeout, which can be misleading if the previous run did not meet the rollover conditions.

Expected Behaviour:

Upon a timeout during the rollover action, the step.status and step.name fields within IndexActionMetadata should be updated to reflect the timeout event. This provides a clearer picture of the action's execution and avoids confusion regarding successful completion.

CheckList:

  • Commits are signed per the DCO using --signoff

Proposed Changes:

  • The code has been modified to handle timeouts appropriately.
    • The logic checks if an action has timed out using the hasTimedOut function.
    • If a timeout occurs, a new IndexActionMetadata object is created with the following updates:
      • step.status is set to Step.StepStatus.TIMED_OUT.
      • step.name retains its original value (e.g., "attempt_rollover").
      • info.message is set to "Action timed out".
  • The updated metadata is then persisted using the updateManagedIndexMetaData function.

Testing:

  • Unit tests have been updated in the ActionTimeoutIT class to verify that the step metadata is correctly populated with the TIMED_OUT status upon a timeout.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kaushik <harycash@amazon.com>
@harshitakaushik-dev harshitakaushik-dev changed the title eifjcbgbrrjgcgdjtfejkdifkbgibrrergikvvckfbveStep Metadata Update on Index Rollover Timeout Step Metadata Update on Index Rollover Timeout May 28, 2024
.copy(actionMetaData = currentActionMetaData?.copy(failed = true), info = info),
)

val updatedMetaData = managedIndexMetaData.copy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can rename this to updatedIndexMetaData

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've made the change and updated the PR.

@sarthakaggarwal97
Copy link
Contributor

thanks @harshitakaushik-dev for raising this!

@@ -56,6 +56,7 @@ abstract class Step(val name: String, val isSafeToDisableOn: Boolean = true) {
CONDITION_NOT_MET("condition_not_met"),
FAILED("failed"),
COMPLETED("completed"),
TIMED_OUT("timed_out"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a new field could cause problem when cluster having old and new version of code during upgrade. Maybe just use the "failed" state is good enough.

Copy link
Contributor

@sarthakaggarwal97 sarthakaggarwal97 May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @bowenlan-amzn for taking a look. Not sure if I fully understand this. What could be that scenario where adding a new step state can fail. I'm asking in case we need to add a new step state in the future.

PS: I'm okay with failed as well for this case. And in the failure message, we would have timed out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my thinking: this new field of Step status is saved in the metadata document of ISM system index. When user do a explain API using the old node with old software, it cannot understand this new field value so probably fail.

So the impact is not big, only during the upgrading when cluster is mixed with old and new software.

I just found this potential problem is already safened, new software won't be used if old software exists in cluster

if (skipExecFlag.flag) {
logger.info("Cluster still has nodes running old version ISM plugin, skip execution on new nodes until all nodes upgraded")
return@launch
}

So no problem now 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome @bowenlan-amzn
thanks for checking

Copy link

codecov bot commented May 28, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 66.29%. Comparing base (e5a13ff) to head (ce33ce5).
Report is 4 commits behind head on main.

Current head ce33ce5 differs from pull request most recent head a8d7379

Please upload reports for the commit a8d7379 to get more accurate results.

Files Patch % Lines
...agement/indexstatemanagement/ManagedIndexRunner.kt 0.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1174      +/-   ##
============================================
- Coverage     75.34%   66.29%   -9.05%     
+ Complexity     2812     2406     -406     
============================================
  Files           367      367              
  Lines         17038    17039       +1     
  Branches       2370     2372       +2     
============================================
- Hits          12837    11296    -1541     
- Misses         2901     4596    +1695     
+ Partials       1300     1147     -153     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: harycash <harycash@amazon.com>
@sarthakaggarwal97
Copy link
Contributor

@bowenlan-amzn would you please do the final reviews so we can merge it. Thank you!

@bowenlan-amzn
Copy link
Member

Thanks for the contribution! @harshitakaushik-dev

@bowenlan-amzn bowenlan-amzn merged commit d4ee795 into opensearch-project:main May 30, 2024
29 of 30 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/index-management/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/index-management/backport-2.x
# Create a new branch
git switch --create backport/backport-1174-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d4ee795e22f4490b78662f171f62d566a81c1abc
# Push it to GitHub
git push --set-upstream origin backport/backport-1174-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/index-management/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1174-to-2.x.

@bowenlan-amzn
Copy link
Member

@harshitakaushik-dev the backport failed, could you backport manually to 2.x

harshitakaushik-dev added a commit to harshitakaushik-dev/index-management that referenced this pull request May 30, 2024
Co-authored-by: Harshita Kaushik <harycash@amazon.com>
harshitakaushik-dev added a commit to harshitakaushik-dev/index-management that referenced this pull request May 30, 2024
Co-authored-by: Harshita Kaushik <harycash@amazon.com>
Signed-off-by: harycash <harycash@amazon.com>
bowenlan-amzn pushed a commit that referenced this pull request May 30, 2024
…#1177)

Co-authored-by: Harshita Kaushik <harycash@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants