Cherry-pick #19102 to 7.x: Retryable downloads of beats #19156

Merged
merged 2 commits into from
Jun 12, 2020

Conversation

michalpristas

Cherry-pick of PR #19102 to 7.x branch. Original message:

What does this PR do?

Background:
When the agent downloads an artifact and the checksum does not match, the operation fails. When the download is attempted again later (because of a new config or for any other reason), it might be skipped, either because the earlier download was reported as successful or because the packed artifacts are invalid.
The agent cleans up a downloaded artifact only when the download itself returns an error. If the download returns no error but the artifact is corrupted, we might end up in a loop: the agent verifies the artifact, finds that it is incorrect, fails, and then does the same thing again on the next attempt.

This PR changes this behavior a bit.

If Verify fails, it cleans up the downloaded artifacts (artifact + hash).
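
The cleanup-on-verify-failure idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `verifyOrCleanup`, `runDemo`, the `.sha512` file name, and the `<digest>  <filename>` hash file layout are all assumptions made for the example.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// verifyOrCleanup is a hypothetical helper (not the agent's real API).
// It compares the SHA-512 of the artifact with the digest stored in the
// hash file. On mismatch it removes both files, so that a later attempt
// cannot skip the download and re-verify the same corrupted artifact.
func verifyOrCleanup(artifactPath, hashPath string) error {
	data, err := os.ReadFile(artifactPath)
	if err != nil {
		return fmt.Errorf("reading artifact: %w", err)
	}
	raw, err := os.ReadFile(hashPath)
	if err != nil {
		return fmt.Errorf("reading hash file: %w", err)
	}
	// Assumed layout: "<hex digest>  <file name>".
	fields := strings.Fields(string(raw))
	sum := sha512.Sum512(data)
	got := hex.EncodeToString(sum[:])
	if len(fields) == 0 || !strings.EqualFold(fields[0], got) {
		// Best-effort cleanup of artifact + hash.
		_ = os.Remove(artifactPath)
		_ = os.Remove(hashPath)
		return fmt.Errorf("checksum mismatch for %s", filepath.Base(artifactPath))
	}
	return nil
}

// runDemo writes an artifact and a hash file into a temporary directory,
// optionally with a wrong digest, and reports whether the artifact is
// still on disk after verification.
func runDemo(corrupt bool) (kept bool, err error) {
	dir, err := os.MkdirTemp("", "verify-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	artifact := filepath.Join(dir, "beat.tar.gz")
	hashFile := artifact + ".sha512"
	content := []byte("artifact bytes")
	sum := sha512.Sum512(content)
	digest := hex.EncodeToString(sum[:])
	if corrupt {
		digest = strings.Repeat("0", len(digest))
	}
	if err := os.WriteFile(artifact, content, 0o644); err != nil {
		return false, err
	}
	if err := os.WriteFile(hashFile, []byte(digest+"  beat.tar.gz\n"), 0o644); err != nil {
		return false, err
	}

	err = verifyOrCleanup(artifact, hashFile)
	_, statErr := os.Stat(artifact)
	return statErr == nil, err
}

func main() {
	kept, err := runDemo(true)
	fmt.Println("corrupted: kept =", kept, "err =", err)
	kept, err = runDemo(false)
	fmt.Println("valid:     kept =", kept, "err =", err)
}
```

Removing the hash file together with the artifact matters in this sketch because a stale hash left next to a fresh download would fail verification again.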

It also introduces a retryable block within the operation flow.
In this case we know that download+verify might be error prone, so we can retry them if a failure happens (only if retry.enabled == true).

What this means for the agent is that when it tries to install from a corrupted artifact, it removes the artifact during Verify and then downloads it again.
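
A retryable block around download+verify might look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation in this PR: `retryConfig`, its `Attempts` and `Delay` fields, `runRetryable`, and `simulate` are hypothetical names, and only the `retry.enabled` switch comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryConfig is a hypothetical stand-in for the agent's retry settings.
// Only the Enabled switch is taken from the PR text; Attempts and Delay
// are assumptions for this sketch.
type retryConfig struct {
	Enabled  bool
	Attempts int
	Delay    time.Duration
}

// step is one operation in the flow, such as download or verify.
type step func() error

// runRetryable runs the steps in order. If any step fails and retries are
// enabled, the whole group is started again after Delay, up to Attempts times.
func runRetryable(cfg retryConfig, steps ...step) error {
	attempts := 1
	if cfg.Enabled && cfg.Attempts > 1 {
		attempts = cfg.Attempts
	}
	var err error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(cfg.Delay)
		}
		if err = runAll(steps); err == nil {
			return nil
		}
	}
	return fmt.Errorf("giving up after %d attempt(s): %w", attempts, err)
}

// runAll stops at the first failing step.
func runAll(steps []step) error {
	for _, s := range steps {
		if err := s(); err != nil {
			return err
		}
	}
	return nil
}

// simulate models a corrupted packed artifact: the first verify fails and
// (conceptually) removes the artifact, so the second download is fresh.
func simulate(cfg retryConfig) (downloads int, err error) {
	corrupted := true
	download := func() error {
		downloads++
		return nil
	}
	verify := func() error {
		if corrupted {
			corrupted = false // the bad artifact was cleaned up
			return errors.New("checksum mismatch")
		}
		return nil
	}
	err = runRetryable(cfg, download, verify)
	return downloads, err
}

func main() {
	n, err := simulate(retryConfig{Enabled: true, Attempts: 3, Delay: time.Millisecond})
	fmt.Println("retry enabled: downloads =", n, "err =", err)
	n, err = simulate(retryConfig{Enabled: false})
	fmt.Println("retry disabled: downloads =", n, "err =", err)
}
```

Retrying the group as a whole, rather than verify alone, is what lets the second pass download a fresh artifact after the failed verify has removed the corrupted one.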

Why is it important?

It makes the download scenario more robust and the repair loop faster.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test

  • Build a snapshot package
  • Modify one of the sha files
  • Enable retry by setting retry.enabled: true
  • Run the agent

Observe that it fails with the packed artifact, waits 30s, and then downloads the artifact from the web.

[Ingest Manager] Retryable downloads of beats (elastic#19102)
@elasticmachine

Pinging @elastic/ingest-management (Team:Ingest Management)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 12, 2020

elasticmachine commented Jun 12, 2020

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #19156 updated]

  • Start Time: 2020-06-12T13:32:33.341+0000

  • Duration: 33 min 31 sec

@blakerouse blakerouse left a comment

Looks good.

This should actually pass lint now that the GRPC switch has been backported to 7.x.

@michalpristas michalpristas merged commit eabb944 into elastic:7.x Jun 12, 2020
Labels
backport bug Ingest Management:beta1 Group issues for ingest management beta1 review