
WAL backpressure #3218

Merged
merged 15 commits into master on Jan 27, 2021

Conversation

owen-d (Member) commented Jan 22, 2021

What

This PR introduces a backpressure mechanism to the ingester during WAL replay. Its primary purpose is to allow specifying an upper bound on the memory devoted to replay; once that bound is hit, the replayed data is drained to storage.

This also includes

  • Defaults the memchunk schema to v3 in order to take advantage of UncompressedBytes. @slim-bean graciously cut 2.0.1 as a rollback target that supports this schema in case of unforeseen failures.
  • Adds a new WAL config, ingester.wal-replay-memory-ceiling (see the sketch after this list).
  • Adds two new metrics: loki_ingester_wal_recovered_bytes_total and loki_ingester_wal_bytes_in_use.
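
As a rough illustration of how such a setting could be exposed, here is a minimal Go sketch of registering the flag. Only the flag name ingester.wal-replay-memory-ceiling comes from this PR; the struct, field name, and default value are assumptions made for the example.

```go
package ingester

import "flag"

// WALReplayConfig is a hypothetical container for the new setting; the real
// field lives in the ingester's WAL config, whose exact shape isn't shown here.
type WALReplayConfig struct {
	// ReplayMemoryCeiling is the upper bound, in bytes, on memory used while
	// replaying the WAL. Once it is hit, replayed data is drained to storage.
	ReplayMemoryCeiling uint64
}

func (cfg *WALReplayConfig) RegisterFlags(f *flag.FlagSet) {
	// The 4 GiB default is an assumption, not the value shipped in this PR.
	f.Uint64Var(&cfg.ReplayMemoryCeiling, "ingester.wal-replay-memory-ceiling", 4<<30,
		"Upper bound (bytes) on memory used during WAL replay before buffered data is drained to storage.")
}
```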

Details

Note: hitting the ceiling will cause a drop in the chunk dedupe ratio, so I suggest setting the limit as high as is tolerable to minimize the chance of triggering it. Generally, backpressure drainage should only happen after outages that leave a large WAL, or after scaling down memory budgets.
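
To make the drain mechanism concrete, here is a minimal sketch of a replay controller built around sync.Cond (the primitive mentioned in the review below). All names and structure are assumptions for illustration; this is not the code in pkg/ingester/replay_controller.go.

```go
package ingester

import (
	"sync"
	"sync/atomic"
)

// replayController (hypothetical) tracks how much memory WAL replay is holding
// and pauses replaying goroutines while that memory is drained to storage.
type replayController struct {
	currentBytes int64
	ceiling      int64

	mtx      sync.Mutex
	cond     *sync.Cond
	flushing bool

	flush func() // drains replayed chunks to storage, releasing their memory
}

func newReplayController(ceiling int64, flush func()) *replayController {
	rc := &replayController{ceiling: ceiling, flush: flush}
	rc.cond = sync.NewCond(&rc.mtx)
	return rc
}

// Add records n bytes recovered from the WAL. When the ceiling is exceeded,
// one caller performs the drain while the others wait for it to finish.
func (rc *replayController) Add(n int64) {
	if atomic.AddInt64(&rc.currentBytes, n) < rc.ceiling {
		return
	}

	rc.mtx.Lock()
	if rc.flushing {
		// Another goroutine is already draining; Wait releases the lock while
		// blocked and reacquires it before returning.
		rc.cond.Wait()
		rc.mtx.Unlock()
		return
	}
	rc.flushing = true
	rc.mtx.Unlock()

	rc.flush()                             // drain buffered data to storage
	atomic.StoreInt64(&rc.currentBytes, 0) // that memory has been released

	rc.mtx.Lock()
	rc.flushing = false
	rc.cond.Broadcast() // wake every goroutine that was waiting on the drain
	rc.mtx.Unlock()
}
```

In a sketch like this, replaying goroutines that exceed the ceiling block in Add until the drain completes, which is what bounds replay memory at roughly the configured ceiling.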

codecov-io commented Jan 22, 2021

Codecov Report

Merging #3218 (9ef6ff4) into master (fd619b6) will increase coverage by 0.08%.
The diff coverage is 88.65%.


@@            Coverage Diff             @@
##           master    #3218      +/-   ##
==========================================
+ Coverage   63.22%   63.30%   +0.08%     
==========================================
  Files         196      197       +1     
  Lines       16671    16728      +57     
==========================================
+ Hits        10540    10590      +50     
- Misses       5174     5187      +13     
+ Partials      957      951       -6     
Impacted Files Coverage Δ
pkg/ingester/recovery.go 66.48% <70.37%> (+0.75%) ⬆️
pkg/ingester/stream.go 70.86% <77.77%> (+0.59%) ⬆️
pkg/ingester/flush.go 83.97% <91.66%> (+2.09%) ⬆️
pkg/chunkenc/memchunk.go 68.10% <100.00%> (ø)
pkg/ingester/ingester.go 49.24% <100.00%> (+0.62%) ⬆️
pkg/ingester/instance.go 60.28% <100.00%> (ø)
pkg/ingester/metrics.go 100.00% <100.00%> (ø)
pkg/ingester/replay_controller.go 100.00% <100.00%> (ø)
pkg/ingester/wal.go 81.60% <100.00%> (+0.43%) ⬆️
pkg/promtail/positions/positions.go 46.80% <0.00%> (-11.71%) ⬇️
... and 5 more

cyriltovena (Contributor) left a comment

LGTM.

I'm a bit scared of sync.Cond; I need to read more about those.

owen-d merged commit 224e68a into grafana:master on Jan 27, 2021