[Response Ops][Task Manager] Emitting error metric when task update fails #191307

Merged
ymao1 merged 3 commits into elastic:main from tm/184173 on Aug 28, 2024

Conversation

@ymao1 ymao1 commented Aug 26, 2024

Resolves #184173

Summary

Catches errors thrown when updating the task via the taskStore.bulkUpdate function and emits an error count so that these errors are reflected in the metrics.
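
As a rough illustration of the pattern described in the summary, the TypeScript sketch below wraps an update call, counts a failure, and rethrows the error. `TaskDoc`, `BulkUpdate`, `ErrorCounter`, and `updateAndCountErrors` are hypothetical names invented for this example and are not the Task Manager implementation. Counting per task type and rethrowing after counting are assumptions made for the sketch.

```typescript
// Hypothetical, simplified shapes; these are NOT the Kibana Task Manager types.
interface TaskDoc {
  id: string;
  taskType: string;
  status: string;
}

// Stand-in for an asynchronous bulk update call.
type BulkUpdate = (docs: TaskDoc[]) => Promise<TaskDoc[]>;

// Hypothetical metrics sink that keeps an error count per task type.
class ErrorCounter {
  private readonly counts = new Map<string, number>();

  increment(taskType: string): void {
    this.counts.set(taskType, (this.counts.get(taskType) ?? 0) + 1);
  }

  get(taskType: string): number {
    return this.counts.get(taskType) ?? 0;
  }
}

// Runs the update; on failure, counts one error per document and rethrows
// so the caller still sees the original error (an assumption of this sketch).
async function updateAndCountErrors(
  bulkUpdate: BulkUpdate,
  docs: TaskDoc[],
  counter: ErrorCounter
): Promise<TaskDoc[]> {
  try {
    return await bulkUpdate(docs);
  } catch (err) {
    for (const doc of docs) {
      counter.increment(doc.taskType);
    }
    throw err;
  }
}
```

Calling `updateAndCountErrors` with an update function that rejects leaves the counter incremented for each document's task type, while a successful update leaves it untouched.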

To Verify

  1. Add the following to force an error when running an example rule:
--- a/x-pack/plugins/task_manager/server/task_store.ts
+++ b/x-pack/plugins/task_manager/server/task_store.ts
@@ -24,6 +24,7 @@ import {
   ISavedObjectsRepository,
   SavedObjectsUpdateResponse,
   ElasticsearchClient,
+  SavedObjectsErrorHelpers,
 } from '@kbn/core/server';

 import { RequestTimeoutsConfig } from './config';
@@ -309,6 +310,16 @@ export class TaskStore {
       this.logger.warn(`Skipping validation for bulk update because excludeLargeFields=true.`);
     }

+    const isProcessResult = docs.some(
+      (doc) =>
+        doc.taskType === 'alerting:example.always-firing' &&
+        doc.status === 'idle' &&
+        doc.retryAt === null
+    );
+    if (isProcessResult) {
+      throw SavedObjectsErrorHelpers.decorateEsUnavailableError(new Error('test'));
+    }
+
     const attributesByDocId = docs.reduce((attrsById, doc) => {
  2. Create an example.always-firing rule and let it run. You should see an error in the logs:
[2024-08-26T14:44:07.065-04:00][ERROR][plugins.taskManager] Task alerting:example.always-firing "80b8481d-7bfc-4d38-a31b-7a559fbe846b" failed: Error: test
  3. Navigate to https://localhost:5601/api/task_manager/metrics?reset=false. You should see a framework error reported under both the overall metrics and the alerting metrics.
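
The condition used in the verification diff can be tried in isolation. The sketch below copies that condition into a standalone function; `TaskDocLike` and `shouldForceError` are hypothetical names for this example, and the real task document type has more fields. The variable name `isProcessResult` in the diff suggests the condition is meant to match the update written after a task run is processed, but that reading is an assumption.

```typescript
// Hypothetical minimal shape covering only the fields the diff inspects.
interface TaskDocLike {
  taskType: string;
  status: string;
  retryAt: Date | null;
}

// Same condition as in the diff: true when any document is an
// example.always-firing task that is idle with no retry scheduled.
function shouldForceError(docs: TaskDocLike[]): boolean {
  return docs.some(
    (doc) =>
      doc.taskType === 'alerting:example.always-firing' &&
      doc.status === 'idle' &&
      doc.retryAt === null
  );
}
```

Only documents matching all three checks trigger the forced error, so updates for other task types, or for the same task while it is still running, pass through unchanged.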

@ymao1 ymao1 self-assigned this Aug 26, 2024
@ymao1 ymao1 added the release_note:skip, Feature:Task Manager, Team:ResponseOps, and v8.16.0 labels Aug 26, 2024
@ymao1 ymao1 marked this pull request as ready for review August 26, 2024 21:14
@ymao1 ymao1 requested a review from a team as a code owner August 26, 2024 21:14
@elasticmachine

Pinging @elastic/response-ops (Team:ResponseOps)

ymao1 commented Aug 26, 2024

@elasticmachine merge upstream

@js-jankisalvi js-jankisalvi left a comment

Verified locally, works as expected 👍

@mikecote mikecote left a comment

Changes LGTM! Tested locally and works as expected

ymao1 commented Aug 28, 2024

@elasticmachine merge upstream

@kibana-ci

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit dafce90 into elastic:main Aug 28, 2024
37 checks passed
@kibanamachine kibanamachine added the backport:skip label Aug 28, 2024
@ymao1 ymao1 deleted the tm/184173 branch August 28, 2024 17:22
Development

Successfully merging this pull request may close these issues.

Errors during processing task result are not shown in metrics
6 participants