Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix k8s task runner failure reporting #15311

Merged
merged 4 commits into from
Nov 4, 2023

Conversation

georgew5656
Copy link
Contributor

Fixes a couple bugs with some of the AbstractTask.run implementation that was added to support mmless ingestion..

Description

If a task in mmless ingestion fails during setup(), then cleanup is called with taskStatus=TaskStatus.running(getId()) because of a error in the logic. This can cause issues because then the overlord will think the task is running when in reality it has exited.

Separately, as a part of dc0b163, mmless ingestion will normally report the status to the overlord in the UpdateStatusAction api call, with the deep storage status.json being just a backup option. When this status is reported, the error message is dropped because UpdateStateAction only takes in a string instead of a TaskStatus object. To fix this, I added a TaskStatus field to the UpdateStatusAction (i left the string value in for backwards compatibility).

Release note

Fix some error reporting with mmless ingestion

Key changed/added classes in this PR
  • AbstractTask
  • UpdateStatusAction

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@arunramani arunramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks great to me, just had that little comment.

@@ -186,7 +187,7 @@ public final TaskStatus run(TaskToolbox taskToolbox) throws Exception
public abstract TaskStatus runTask(TaskToolbox taskToolbox) throws Exception;

@Override
public void cleanUp(TaskToolbox toolbox, TaskStatus taskStatus) throws Exception
public void cleanUp(TaskToolbox toolbox, @Nullable TaskStatus taskStatus) throws Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although no harm to annotate nullable here, I think taskStatus won't be possible null as it will be set to a value in all the code path.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - feel free to merge. I think the small change in toString will help in debugging so we don't see something like status=statusFull={...}

The rest of the comments are around making it easier to remove the status field from the UpdateStatusAction in the future - up to you if you think it's worth addressing.

You can also merge this patch, and address anything else in a future patch so you don't need to wait on CI again.

)
{
this.status = status;
this.statusFull = statusFull;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If status is deprecated, and callers are expected to use statusFull can we add validation in the constructor to make sure only 1 field is set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather not add the extra validation since its principally a json constructor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine.

My thinking was that in the perform method either status or statusFull is used. So if a caller calls this method with both set, it's likely a mistake, and they won't know about it unless they closely inspect the results of the perform method.

@Test
public void testEquals()
{
UpdateStatusAction one = new UpdateStatusAction("", TaskStatus.failure(ID, "error"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EqualsVerifier.forClass(UpdateStatusAction.class).verify() will verify all combinations of the underlying fields in the class and point out subtle issues in the equals / hash code logic - see other tests like testEquals or testEqualsAndHashCode for examples.

@georgew5656 georgew5656 merged commit a8906b6 into apache:master Nov 4, 2023
82 checks passed
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
* Fix k8s task runner failure reporting

* Fix reference

* add jsonignore

* PR changes
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants