Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Add new tests for Drain pattern matching #12655

Merged
merged 5 commits into from
Apr 19, 2024

Conversation

benclive
Copy link
Contributor

What this PR does / why we need it:

  • Adding new tests for Drain pattern matching
  • We currently don't have any tests for this so these also act as examples and a starting point in order to begin improving the Drain pattern discovery.
  • These tests aren't very exhaustive yet but already highlight some potential improvements such as trying to generate more specific patterns in some cases and less specific in others.

@CLAassistant
Copy link

CLAassistant commented Apr 17, 2024

CLA assistant check
All committers have signed the CLA.

@benclive benclive force-pushed the add-tests-for-drain-patterns branch from 3d57d66 to 02faf08 Compare April 17, 2024 13:39
patterns: []string{
"<_> capacity <_>",
"<_> capacity changes <_>",
"{\"id\":\"D4Oh1ivB6cdLWa08\",\"level\":\"debug\",\"max-pool\":4,\"min-pool\":0,\"msg\":\"check capacity\",\"pending-builds\":0,\"running-builds\":0,\"server-buffer\":0,\"server-capacity\":0,\"server-count\":0,\"time\":\"2024-04-16T14:48:52Z\"}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use string with ` to avoid the pain of escaping.

Comment on lines 12 to 33
// At training, if at the depth of LogClusterDepth there is a cluster with
// similarity coefficient greater that SimTh, then the log message is added
// to that cluster. Otherwise, a new cluster is created.
//
// LogClusterDepth should be equal to the number of constant tokens from
// the beginning of the message that likely determine the message contents.
//
// > In this step, Drain traverses from a 1-st layer node, which
// > is searched in step 2, to a leaf node. This step is based on
// > the assumption that tokens in the beginning positions of a log
// > message are more likely to be constants. Specifically, Drain
// > selects the next internal node by the tokens in the beginning
// > positions of the log message
LogClusterDepth: 8,
// SimTh is basically a ratio of matching/total in the cluster.
// Cluster tokens: "foo <*> bar fred"
// Log line: "foo bar baz qux"
// * * * x
// Similarity of these sequences is 0.75 (the distance)
// Both SimTh and MaxClusterDepth impact branching factor: the greater
// MaxClusterDepth and SimTh, the less the chance that there will be
// "similar" clusters, but the greater the footprint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool thank you for that !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't take credit for this! I copied the descriptions and parameters over from

var drainConfig = &drain.Config{
in order to get started.
But now I think it makes more sense to unify it under the DefaultConfig function inside the drain package, at least until it becomes clear that we'd need different configs for different sets of logs.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it, happy to merge as is to incrementally move forward.

I think we could add more and more format as we go. Specially popular ones.

@benclive benclive marked this pull request as ready for review April 19, 2024 09:24
@benclive benclive requested a review from a team as a code owner April 19, 2024 09:24
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena enabled auto-merge (squash) April 19, 2024 09:30
@cyriltovena cyriltovena merged commit 5ae33d1 into main Apr 19, 2024
14 checks passed
@cyriltovena cyriltovena deleted the add-tests-for-drain-patterns branch April 19, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants