Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus middleware improvements #627

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

stat1c-void
Copy link

A number of Prometheus middleware improvements:

  • canonical queue names in metrics
  • livesum for a gauge that was seemingly missing
  • unique PROMETHEUS_MULTIPROC_DIR per top-level process by default
  • made prometheus metrics test a bit more stable (several tries)

Have successfully run pytest --benchmark-skip on my local python 3.10.14.

Some details below.

Canonical queue names

If queue names aren't canonical, then something like this happens to metrics:

dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs"} -1
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs.DQ"} 1
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs"} -4
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs.DQ"} 4
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="default.DQ"} 2
dramatiq_delayed_messages_inprogress{..., pid="6", queue_name="default"} -1
dramatiq_delayed_messages_inprogress{..., pid="6", queue_name="default.DQ"} 2

Unique PROMETHEUS_MULTIPROC_DIR

Previously, Prometheus mutli-proc exporting would interfere between multiple Dramatiq instances on the same machine (because the prom DB path would be the same across different instances). Actually, would interfere even across restarts of the same Dramatiq instance (and even in containerized environments, when containers are just restarted, and not recreated).

In containerized env, top-level PID value is likely to be the same across runs, but with auto-cleanup it can be managed.

Metrics test

Noticed that test_prometheus_middleware_exposes_metrics was sometimes failing due to timings - metrics endpoint wasn't ready in 1 second. Should be a bit more stable now.

@Bogdanp
Copy link
Owner

Bogdanp commented May 30, 2024

I’m traveling right now and won’t be able to do a full review for a few weeks, but the casing of the environment variables should remain unchanged, otherwise this would be a breaking change.

@stat1c-void
Copy link
Author

Do you think it would be acceptable to use uppercase DRAMATIQ_PROM_DB and fallback to dramatiq_prom_db if uppercase one is missing?

And what's your opinion on PROMETHEUS_MULTIPROC_DIR? I've looked up prom python client code - lowercase variant is deprecated for 3 years now.
https://github.com/prometheus/client_python/blob/09a5ae30602a7a81f6174dae4ba08b93ee7feed2/prometheus_client/multiprocess.py#L27

And I'll try to look at gevent tests. Slightly off-topic, but is tox config up-to-date? I've been having trouble trying to run tox - errors out on tests with something about "conflicting factors".

@Bogdanp
Copy link
Owner

Bogdanp commented May 31, 2024

Do you think it would be acceptable to use uppercase DRAMATIQ_PROM_DB and fallback to dramatiq_prom_db if uppercase one is missing?

And what's your opinion on PROMETHEUS_MULTIPROC_DIR? I've looked up prom python client code - lowercase variant is deprecated for 3 years now. https://github.com/prometheus/client_python/blob/09a5ae30602a7a81f6174dae4ba08b93ee7feed2/prometheus_client/multiprocess.py#L27

And I'll try to look at gevent tests. Slightly off-topic, but is tox config up-to-date? I've been having trouble trying to run tox - errors out on tests with something about "conflicting factors".

Let’s use fallbacks for both environment vars. The tox config is probably out of date at the moment since it’s no longer used in CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants