Add a Promtail stage for probabilistic sampling #6654

Closed
tpaschalis opened this issue Jul 11, 2022 · 0 comments · Fixed by #7127
Labels
component/promtail keepalive An issue or PR that will be kept alive and never marked as stale.

tpaschalis commented Jul 11, 2022

Is your feature request related to a problem? Please describe.
Sometimes we don't have direct control over what or how often an application logs, especially for third-party dependencies. Furthermore, I (personally) will primarily dig into logs when there’s something going wrong. 😅

For this reason, users may find it useful to be able to limit the ingestion rate of certain types of logs, both to lower costs and to improve the signal-to-noise ratio. As we do for traces, probabilistic sampling is a good option here, and Loki’s label-based approach is also a natural fit. For example, users could sample to keep 10% of ‘trace’ level logs, or 20% of logs coming from a 200 - OK response, but always keep logs containing bad words like ‘fatal’ or ‘error’.

One thing to note here is that ideally we'd try to preserve this information (using a new label?), so that the user can understand that it’s not that "200 - OK" responses have dropped by 80%, it’s just that they’re only ingesting a subset of them.

Describe the solution you'd like
Add a sampling stage to Promtail. Similar to the drop stage, it would accept a regex, a source, and a float in the range [0, 1].

Then, it would check the log line contents or the given source and probabilistically drop or keep the log line.

Describe alternatives you've considered
Other than instrumenting the application emitting the logs itself, a similar effect could be obtained by a complex pipeline that extracts the timestamp and applies a regex to it (e.g. to only ingest lines whose timestamp has a 'second' value in [0, 10]).

A similar effect can also be obtained by using the limit stage, although it enforces a hardcoded limit rather than a probabilistic one.

Additional context
A similar proposal was opened a long time ago, but failed to get any traction.

@jeschkies jeschkies added the keepalive An issue or PR that will be kept alive and never marked as stale. label Jul 11, 2022
MasslessParticle pushed a commit that referenced this issue Mar 17, 2023
…ng (#7127)


**What this PR does / why we need it**:

The sampling stage can be used on its own for direct sampling.
The implementation uses the sampling algorithm from the Jaeger Go
client:
```
pipeline_stages:
- sampling:
    rate: 0.1
```
or it can be used with match for precise sampling.
```
pipeline_stages:
- json:
    expressions:
      app:
- match:
    pipeline_name: "app2"
    selector: "{app=\"poki\"}"
    stages:
    - sampling:
        rate: 0.1
```
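As a rough illustration of the Jaeger-style approach mentioned above, the sketch below turns the rate into a 63-bit boundary once and then compares each random number against it. This is my reading of how the Jaeger Go client's probabilistic sampler works, not the code from this PR. The names `boundary` and `isSampled` are illustrative only.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// maxRandomNumber masks a random uint64 down to its low 63 bits.
const maxRandomNumber = ^(uint64(1) << 63)

// boundary converts a rate into a threshold on 63-bit random numbers.
// Rates outside [0, 1] are clamped into that range.
func boundary(rate float64) uint64 {
	rate = math.Max(0, math.Min(rate, 1))
	return uint64(float64(maxRandomNumber) * rate)
}

// isSampled reports whether the random number r falls at or below the boundary b.
func isSampled(b, r uint64) bool {
	return b >= r&maxRandomNumber
}

func main() {
	b := boundary(0.1) // same rate as in the configuration examples above
	const n = 100000
	kept := 0
	for i := 0; i < n; i++ {
		if isSampled(b, rand.Uint64()) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d lines\n", kept, n)
}
```

Precomputing the boundary keeps the per-line decision to a single integer comparison, with no floating-point work on the hot path.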

**Which issue(s) this PR fixes**:
Fixes #6654

**Special notes for your reviewer**:

The Promtail 'rate' stage is also used with the 'match' stage for log
filtering. This design keeps the code very clean.
Other log agents, such as Vector, build log filtering into the sampling
operator, which I think is too complicated:
https://vector.dev/docs/reference/configuration/transforms/sample/
```
transforms:
  my_transform_id:
    type: sample
    inputs:
      - my-source-or-transform-id
    exclude: null
    rate: 10

'rate' stage review suggestions: #5051

![image](https://user-images.githubusercontent.com/9583245/189461481-6ee4d835-2573-4b8e-8dec-2814620d758a.png)

**Checklist**
- [x] Documentation added
- [x] Tests updated
- [ ] Is this an important fix or new feature? Add an entry in the
`CHANGELOG.md`.
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>