Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics-generator: add override to disable metrics collection #1375

Conversation

kvrhdn
Copy link
Member

@kvrhdn kvrhdn commented Apr 12, 2022

What this PR does:
Add an override metrics_generator_disable_collection which tells the registry to not collect and write samples for this tenant. This will break the link between the registry and the storage layer (which is responsible for remote writing metrics).

The metrics-generator will still generate metrics and keep track of all the counters and histograms. If you don't want to generate metrics at all, you should clear metrics_generator_processors for this tenant.

This setting is useful if you want to ingest data into the metrics-generator, observe resource usage and active series, but don't want to write real metrics. This should be especially handy if you are concerned about the generated active series overwhelming a downstream TSDB.

Which issue(s) this PR fixes:
Related to #1303

Checklist

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Left a comment regarding the approach.

@@ -60,6 +60,7 @@ type Limits struct {
MetricsGeneratorProcessors ListToMap `yaml:"metrics_generator_processors" json:"metrics_generator_processors"`
MetricsGeneratorMaxActiveSeries uint32 `yaml:"metrics_generator_max_active_series" json:"metrics_generator_max_active_series"`
MetricsGeneratorCollectionInterval time.Duration `yaml:"metrics_generator_collection_interval" json:"metrics_generator_collection_interval"`
MetricsGeneratorDisableCollection bool `yaml:"metrics_generator_disable_collection" json:"metrics_generator_disable_collection"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking, maybe we can use MetricsGeneratorCollectionInterval == 0 instead? We set a default value for the interval and not run collection if it's disabled.

Now we can have MetricsGeneratorCollectionInterval: 0 and MetricsGeneratorDisableCollection: true and not collect metrics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be concerned about confusing 0 with 'not set', though I think this should work:

MetricsGeneratorCollectionInterval *time.Duration `yaml:"metrics_generator_collection_interval,omitempty" json:"metrics_generator_collection_interval,omitempty"`

I'll try some stuff out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like combining this functionality into the existing override with metrics_generator_collection_interval: 0 but it's a bit tricky/annoying to implement.
We use collection interval both for collecting metrics and for checking the overrides:

go job(instanceCtx, r.collectMetrics, r.collectionInterval)

// job executes f every getInterval.
func job(ctx context.Context, f func(context.Context), getInterval func() time.Duration) {
currentInterval := getInterval()
ticker := time.NewTicker(currentInterval)
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
jobCtx, cancel := context.WithTimeout(ctx, currentInterval)
f(jobCtx)
cancel()
}
newInterval := getInterval()
if currentInterval != newInterval {
ticker = time.NewTicker(newInterval)
currentInterval = newInterval
}
}
}

So if collection interval is 0, we can't just kill this job as we have to keep checking the overrides as well. Otherwise we won't detect when collection interval changes again.

So for simplicity I suggest just adding the new override and we can later refactor this if we find a better solution.

@kvrhdn kvrhdn enabled auto-merge (squash) April 14, 2022 17:30
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR generally looks fine to me, but I have some questions:

  • Why do this in the metrics-generator and not the distributor?
  • The distributor already has a per tenant "switch" to send data to the metrics generator. How is this different?

modules/generator/registry/registry.go Show resolved Hide resolved
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just read your PR description which answers my questions. :)

@kvrhdn kvrhdn merged commit 083f714 into grafana:main Apr 15, 2022
@kvrhdn kvrhdn deleted the kvrhdn/metrics-generator-override-disable-collection branch April 21, 2022 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants