vm-monitor: Switch from memory.high to polling memory.stat (#5524)
tl;dr it's really hard to avoid throttling from memory.high, and it
counts tmpfs & page cache usage, so it's also hard to make sense of.

In the interest of fixing things quickly with something that should be
*good enough*, this PR switches to instead periodically fetch memory
statistics from the cgroup's memory.stat and use that data to determine
if and when we should upscale.

This PR fixes #5444, which has a lot more detail on the difficulties
we've hit with memory.high. This PR also supersedes #5488.
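
To make the polling approach concrete, here is a minimal Rust sketch. It is not the code from this PR. It parses the flat `key value` lines of a cgroup v2 `memory.stat` file and reads out the `anon` counter. The function names, and the choice to look only at `anon`, are assumptions for illustration; the commit message does not say which fields the monitor actually uses.

```rust
use std::collections::HashMap;

/// Parse the flat "key value" lines of a cgroup v2 `memory.stat` file.
/// Lines that do not look like `<key> <unsigned integer>` are skipped.
fn parse_memory_stat(contents: &str) -> HashMap<String, u64> {
    contents
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value = parts.next()?.parse::<u64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Hypothetical helper: treat the `anon` counter as the usage signal.
/// Which counters to use is an assumption of this sketch, not something
/// taken from the PR. Returns 0 if the key is missing.
fn anon_bytes(stat: &HashMap<String, u64>) -> u64 {
    stat.get("anon").copied().unwrap_or(0)
}
```

A poller would read the cgroup's `memory.stat` on a fixed interval (for example with `std::fs::read_to_string`), pass the contents to `parse_memory_stat`, and feed the resulting number into whatever decides when to upscale.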
sharnoff committed Oct 17, 2023
1 parent 543b815 commit 9fe5cc6
Showing 4 changed files with 356 additions and 728 deletions.
4 changes: 2 additions & 2 deletions libs/vm_monitor/README.md
@@ -27,8 +27,8 @@ and old one if it exists.
* the filecache: a struct that allows communication with the Postgres file cache.
On startup, we connect to the filecache and hold on to the connection for the
entire monitor lifetime.
-* the cgroup watcher: the `CgroupWatcher` manages the `neon-postgres` cgroup by
-listening for `memory.high` events and setting its `memory.{high,max}` values.
+* the cgroup watcher: the `CgroupWatcher` polls the `neon-postgres` cgroup's memory
+usage and sends rolling aggregates to the runner.
* the runner: the runner marries the filecache and cgroup watcher together,
communicating with the agent through the `Dispatcher`, and then calling filecache
and cgroup watcher functions as needed to upscale and downscale
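
The "rolling aggregates" mentioned in the updated README text can be pictured with the following Rust sketch. It assumes the aggregate is a simple mean over the last few samples; the `RollingAverage` type, its window size, and the `should_request_upscale` threshold check are all hypothetical names introduced here and are not taken from the `CgroupWatcher` implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-size window of recent memory usage samples (bytes).
struct RollingAverage {
    window: VecDeque<u64>,
    capacity: usize,
}

impl RollingAverage {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Record a sample, evicting the oldest one once the window is full,
    /// and return the mean of the samples currently in the window.
    fn push(&mut self, sample: u64) -> u64 {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(sample);
        // Sum in u128 so that large byte counts cannot overflow.
        let sum: u128 = self.window.iter().map(|&s| s as u128).sum();
        (sum / self.window.len() as u128) as u64
    }
}

/// Hypothetical decision rule: ask for more memory once the averaged
/// usage exceeds a threshold chosen by the caller.
fn should_request_upscale(avg_usage: u64, threshold: u64) -> bool {
    avg_usage > threshold
}
```

Averaging over a window smooths out short spikes, so a single high sample does not immediately trigger an upscale request. The trade-off is a slower reaction to sustained growth.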

1 comment on commit 9fe5cc6

@github-actions github-actions bot commented on 9fe5cc6 Oct 17, 2023

2372 tests run: 2256 passed, 0 failed, 116 skipped (full report)


Flaky tests (5)

Postgres 16

  • test_crafted_wal_end[last_wal_record_crossing_segment]: release
  • test_crafted_wal_end[last_wal_record_xlog_switch_ends_on_page_boundary]: debug
  • test_tenant_config: debug

Postgres 15

  • test_crafted_wal_end[last_wal_record_crossing_segment]: release

Postgres 14

Code coverage (full report)

  • functions: 52.9% (8292 of 15672 functions)
  • lines: 80.6% (48349 of 59987 lines)

The comment gets automatically updated with the latest test results
9fe5cc6 at 2023-10-17T23:51:08.644Z :recycle:
