
[Security Solution] Adds max signals warning to UI propagated rule warnings #154112

Merged: 21 commits into elastic:main on Apr 26, 2023

Conversation

@dplumlee (Contributor) commented Mar 30, 2023

Summary

Addresses #120678

Adds a warning message that is propagated to the user when a rule execution hits max signals. This also puts the rule into a partial failure state.
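
For illustration, here is a minimal TypeScript sketch of the behavior described above. The names (applyMaxSignalsWarning, RuleExecutionResult, the warning copy) are hypothetical, not the actual Kibana implementation:

interface RuleExecutionResult {
  warningMessages: string[];
}

// Hypothetical warning text; the real copy is discussed later in this thread.
const MAX_SIGNALS_WARNING =
  'This rule reached the maximum alert limit for its execution; more alerts may exist.';

// If the execution hit max signals, push a warning. The rule framework then
// reports any execution with warnings as a "partial failure" rather than a success.
function applyMaxSignalsWarning(
  result: RuleExecutionResult,
  alertsWereTruncated: boolean
): RuleExecutionResult {
  if (alertsWereTruncated) {
    result.warningMessages.push(MAX_SIGNALS_WARNING);
  }
  return result;
}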

Screenshots

[Screenshot: warning banner, 2023-04-25 at 10:39 AM]


@dplumlee self-assigned this Mar 30, 2023
@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@dplumlee (Contributor, Author):

@elasticmachine merge upstream

2 similar comments
@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@marshallmain (Contributor):

Also, is there a warning implemented for EQL rules?

Comment on lines +127 to +128
} else if (maxAlerts === 0) {
return { createdAlerts: [], errors: {}, alertsWereTruncated: true };
@dplumlee (Contributor, Author):

Short-circuiting here if maxAlerts is passed in as zero. We're using this function's deduping ability to check whether, when max signals has been hit, any remaining alerts are actual alerts or just duplicates. If there are remaining alerts in filteredAlerts at this point, we can assume they are actual alerts and don't need any additional function logic.
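
Roughly, the short circuit sits at the top of the dedupe-and-create helper. The sketch below is simplified and uses illustrative types; the real Kibana function takes more parameters and performs the actual bulk create:

interface CreateResult {
  createdAlerts: unknown[];
  errors: Record<string, unknown>;
  alertsWereTruncated: boolean;
}

async function dedupeAndCreateAlerts(
  filteredAlerts: unknown[],
  maxAlerts?: number
): Promise<CreateResult> {
  if (filteredAlerts.length === 0) {
    // Nothing left after deduping: every candidate was a duplicate.
    return { createdAlerts: [], errors: {}, alertsWereTruncated: false };
  } else if (maxAlerts === 0) {
    // Short circuit: maxAlerts of 0 means max signals was already hit.
    // Any alerts still in filteredAlerts are non-duplicates, so we only
    // need to report that alerts were truncated, not create anything.
    return { createdAlerts: [], errors: {}, alertsWereTruncated: true };
  }
  // ...normal path: dedupe, then bulk create up to maxAlerts alerts...
  return { createdAlerts: filteredAlerts, errors: {}, alertsWereTruncated: false };
}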

@dplumlee marked this pull request as ready for review April 25, 2023 13:40
@dplumlee requested review from a team as code owners April 25, 2023 13:40
@ymao1 (Contributor) left a comment:

response ops changes lgtm

@nastasha-solomon (Contributor) commented Apr 25, 2023

@dplumlee thanks again for looping docs in on the UI copy review. A few suggestions and questions for you:

  • The first half of the warning message contains some language that users might be unfamiliar with (e.g., circuit breaker) and sounds a little aggressive (e.g., "was hit"). What do you think about the following revisions:
    • This rule has produced 100 alerts, which is the maximum limit for alerts. - This version tells users that the rule has only generated 100 alerts, and nothing more.
    • This rule has exceeded the maximum alert limit of 100 alerts. - This version tells users that the rule has created over 100 alerts.
  • Can the max number of alerts limit be modified at all? If it can, what are the steps to doing that?
  • What are missing alerts in this context? Are they alerts without documents? Events only? Can they be retrieved or tracked down if a rule has hit its max alert limit?

@dplumlee (Contributor, Author):

@nastasha-solomon

We can definitely change the language to something more familiar, but I think we want to avoid the specific numbers, as the message can occur through different methods for different rule types. Sometimes we do have that hard alert count, and sometimes we don't, which is why I kept it more vague. I think the overall goal is to get the user to look more into their rule configuration to see what might be causing them to get so many alerts.

Can the max number of alerts limit be modified at all? If it can, what are the steps to doing that?

Yes, but only through the API, as listed here under max_signals.
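
For illustration, a request along these lines updates the limit through the detection engine rules API. This is a sketch: the Kibana URL, API key, and rule_id are placeholders for your environment.

// Sketch: raise max_signals on an existing rule via the rules API.
// KIBANA_URL, API_KEY, and 'my-rule-id' are placeholders, not real values.
const KIBANA_URL = 'https://localhost:5601';

const response = await fetch(`${KIBANA_URL}/api/detection_engine/rules`, {
  method: 'PATCH',
  headers: {
    'Content-Type': 'application/json',
    'kbn-xsrf': 'true',
    Authorization: `ApiKey ${process.env.API_KEY}`,
  },
  body: JSON.stringify({
    rule_id: 'my-rule-id', // which rule to update
    max_signals: 500,      // the new per-execution alert limit
  }),
});
console.log(response.status); // expect 200 on success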

What are missing alerts in this context? Are they alerts without documents? Events only? Can they be retrieved or tracked down if a rule has hit its max alert limit?

They are alerts that we have yet to analyze, for a number of reasons, but we can see that they're there. We have this maximum count limit so we don't run a rule over endless alerts and bog down the system. They could be actual alerts, but often, especially with certain rule types, we can't be certain they are. They could be alerts that match a value list exception and would be filtered out later, duplicate alerts that wouldn't be written in the end, etc. We see them when searching but can't be sure they're legitimate data, hence the ambiguity of the statement. There's also not really a good immediate action we could recommend the user take, as there are a number of reasons this could be happening to their rule.

@nastasha-solomon (Contributor) commented Apr 25, 2023

Thanks for the additional information, @dplumlee. Here are some options that don't have a hard number in them:

  • This rule has reached the maximum alert limit. - This really only says that the rule has hit the threshold for max alerts. It doesn't tell users that more alerts might exist or that the rule's partial failure state might be tied to the rule hitting the max alert limit.
  • This rule has stopped producing alerts because the alert limit has been reached. - This is similar to the first message, but it definitely says that no more alerts are being generated. Note that it also doesn't say that more alerts might exist. If this is important information for users to know, the option below might be more suitable.
  • This rule has exceeded the maximum alert limit. - This says that the limit has been reached but that more alerts have been, or are being, created.

With all of these, I'd still recommend giving users some direction for resolving the error message and bringing the rule back from its partial failure state. It sounds like offering specific advice would be tricky since a number of things could be happening, but maybe we could recommend something that temporarily fixes the rule's state? Would it make sense to suggest that they reduce the number of open alerts or temporarily raise the max alert limit? For example:

This rule has exceeded the maximum alert limit. Consider closing duplicate alerts or temporarily increasing the max alert limit to ensure you're not missing new alerts.

Another option is to link to rule troubleshooting docs that suggest next steps while users are troubleshooting or fine-tuning the rule. For example:

This rule has exceeded the maximum alert limit. Refer to <the docs page> for troubleshooting help.

@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@marshallmain (Contributor) left a comment:

One fix in indicator match rules, otherwise it looks great!

Comment on lines 170 to 174
if (results.warningMessages.includes(getMaxSignalsWarning())) {
results.warningMessages = uniq(results.warningMessages);
} else {
results.warningMessages.push(getMaxSignalsWarning());
}
@marshallmain (Contributor):

Suggested change:

  if (results.warningMessages.includes(getMaxSignalsWarning())) {
    results.warningMessages = uniq(results.warningMessages);
- } else {
+ } else if (documentCount > 0) {
    results.warningMessages.push(getMaxSignalsWarning());
  }
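
To spell out the intent of the suggestion: getMaxSignalsWarning() returns a constant string, so repeated search batches can each push it, and uniq collapses the duplicates; the added documentCount > 0 guard only emits the warning when documents were actually left unprocessed. An illustrative standalone version, not the exact Kibana code:

import { uniq } from 'lodash';

// Illustrative helper mirroring the suggested logic above.
function mergeMaxSignalsWarning(
  warningMessages: string[],
  documentCount: number,
  getMaxSignalsWarning: () => string
): string[] {
  if (warningMessages.includes(getMaxSignalsWarning())) {
    // The warning was already added by an earlier batch; dedupe it.
    return uniq(warningMessages);
  } else if (documentCount > 0) {
    // Only warn when there really were more documents than max signals allowed.
    warningMessages.push(getMaxSignalsWarning());
  }
  return warningMessages;
}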

@dplumlee enabled auto-merge (squash) April 25, 2023 23:03
@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@dplumlee (Contributor, Author):

@elasticmachine merge upstream

@dplumlee merged commit cd180a0 into elastic:main on Apr 26, 2023
@kibana-ci (Collaborator):

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #1 / spaces api without security update can update from the space_1 space "before all" hook in "can update from the space_1 space"
  • [job] [logs] Security Solution Tests #3 / timeline flyout button the (+) button popover menu owns focus

Metrics

Unknown metric groups

ESLint disabled line counts

id                 before  after  diff
enterpriseSearch       17     19    +2
securitySolution      399    402    +3
total                               +5

Total ESLint disabled count

id                 before  after  diff
enterpriseSearch       18     20    +2
securitySolution      479    482    +3
total                               +5

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @dplumlee

Labels: backport:skip (This commit does not require backporting), release_note:enhancement, Team:Detection Alerts (Security Detection Alerts Area Team), v8.8.0