
Add a /metrics endpoint for Prometheus Metrics #3490

Merged: 5 commits merged into jupyter:master on Jun 13, 2018

Conversation

@yuvipanda (Contributor) commented Apr 2, 2018

[Prometheus](https://prometheus.io/) provides a standard
metrics format that can be collected and used in many contexts:

- From the browser, to drive 'current resource usage' displays, such as https://github.com/yuvipanda/nbresuse
- From a Prometheus server, to collect historical data for operational analysis and performance monitoring. Example: https://grafana.mybinder.org/dashboard/db/1-overview?refresh=1m&orgId=1 shows mybinder.org metrics from JupyterHub and BinderHub, via the Prometheus server at https://prometheus.mybinder.org

The JupyterHub and BinderHub projects already expose Prometheus
metrics natively. Adding this to the Jupyter notebook server
lets us instrument the code easily, in a standard format with
lots of third-party tooling around it.

This PR does the following:

- Introduces the `prometheus_client` library as a dependency.
  This library is pure Python and has no dependencies of its own.
- Adds an authenticated `/metrics` endpoint to the server,
  which returns metrics in the Prometheus text format
  (a minimal sketch of such a handler follows below).
- Exposes the default process metrics from `prometheus_client`,
  which include memory and CPU usage for the notebook process itself.
- Exposes per-handler HTTP metrics using the RED method
  (rate, errors, duration).
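For reference, here is a minimal sketch of how an authenticated metrics endpoint can be wired up with `prometheus_client` on a Tornado-based server. The handler name and registration below are illustrative, not the exact code in this PR:

```python
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from tornado import web

from notebook.base.handlers import APIHandler


class MetricsHandler(APIHandler):
    """Serve the current process's metrics in the Prometheus text format."""

    @web.authenticated
    def get(self):
        # generate_latest() renders everything registered with the default
        # prometheus_client registry, including the built-in process collector
        # (memory and CPU usage for this server process).
        self.set_header('Content-Type', CONTENT_TYPE_LATEST)
        self.write(generate_latest())


# Illustrative registration; the real server assembles its handler list elsewhere.
default_handlers = [(r"/metrics", MetricsHandler)]
```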

@yuvipanda (Contributor, author)

/cc @minrk @choldgraf @betatim @willingc who have come to like the Grafana / Prometheus integration we have in JupyterHub / BinderHub.

/cc @rgbkrk @ivanov who had conversations about exposing 'resource use metrics' as part of the kernel (IIRC). This PR is orthogonal to that, since it only deals with operational and performance metrics, rather than things like 'here is what is happening to your spark cluster!'

@yuvipanda (Contributor, author)

The AppVeyor failure seems unrelated?

@yuvipanda (Contributor, author)

I stole the code for implementing RED HTTP metrics from JupyterHub and added it here. With this, I can answer questions like 'how many times was the Tree handler called, and what is the 90th percentile of its response time?'

This works in JupyterHub because JupyterHub is Python 3 only; here it fails the Python 2 test.
@rgbkrk (Member) commented Apr 2, 2018

Is it standard practice to put it at `/metrics` instead of some `/api` endpoint?

@yuvipanda (Contributor, author) commented Apr 2, 2018 via email

Review thread on the per-request instrumentation (diff excerpt):

```python
        method=handler.request.method,
        handler='{}.{}'.format(handler.__class__.__module__, type(handler).__name__),
        code=handler.get_status()
    ).observe(handler.request.request_time())
```
Review comment (Member):

I assume this is low overhead, since it's being called on every request?

Reply (Member):

Yeah, quite. It's just incrementing a local counter based on a few strings and a number:

```
In [12]: %timeit prometheus_log_method(handler)
5.88 µs ± 87.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

The only network activity occurs when a Prometheus server retrieves the metrics via the `/metrics` handler.
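For context, a rough sketch of this per-request instrumentation pattern with `prometheus_client` (the wiring below is illustrative; the metric ended up being named `http_request_duration_seconds`, as discussed further down):

```python
from prometheus_client import Histogram

# One histogram, labelled by HTTP method, handler class, and status code.
HTTP_REQUEST_DURATION_SECONDS = Histogram(
    'http_request_duration_seconds',
    'Duration of HTTP requests handled by the notebook server, in seconds',
    ['method', 'handler', 'code'],
)


def prometheus_log_method(handler):
    """Record one observation per completed request.

    Tornado's handler.request.request_time() measures the time from the
    server receiving the request to the response being finished, i.e.
    server-side handling time.
    """
    HTTP_REQUEST_DURATION_SECONDS.labels(
        method=handler.request.method,
        handler='{}.{}'.format(handler.__class__.__module__, type(handler).__name__),
        code=handler.get_status(),
    ).observe(handler.request.request_time())


# Illustrative hookup: Tornado calls log_function once per finished request.
# app = tornado.web.Application(handlers, log_function=prometheus_log_method)
```

Each observation just updates in-memory counters and buckets for the matching label combination, which is why the per-request overhead stays in the microsecond range, as measured above.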

@takluyver (Member)

This seems reasonable at a quick look.

How much information does it store? Is there any risk that if there's nowhere to hand the data off to, the memory use could continually grow as long as the server is left running?

Review thread on the metric-naming docstring (diff excerpt):

```
conventions for metrics & labels. We generally prefer naming them
`<noun>_<verb>_<type_suffix>`. So a histogram that's tracking
the duration (in seconds) of servers spawning would be called
SERVER_SPAWN_DURATION_SECONDS.
```
Review comment (Member):

Copy/paste leftover; here the metric is REQUEST_DURATION_SECONDS.

Review comment (Member):

As an FYI, this particular example breaks the naming rule in the docstring.

Review comment (Member):

Perhaps it's better to remove the noun/verb/type preference sentence.

Consider renaming this to NOTEBOOK_REQUEST_DURATION_SECONDS, based on the Prometheus docs.

Review comment (Member):

Actually, it's not clear to me what a 'request duration' is: is that the time from the request being sent to it being received? The time from receiving the first byte to receiving the last? The time from receiving the request to sending the response?

If this is a standard term in web metrics, it doesn't matter that it's not familiar to me. But if it's a term we're creating, maybe we can come up with something less ambiguous.

Review comment (Member):

Ping @yuvipanda

Reply from @yuvipanda (Contributor, author):

Heya!

I removed the naming convention recommendation and now just link directly to the page instead. This should hopefully reduce confusion.

I've also renamed the metric to `http_request_duration_seconds`. I think that is pretty standard for what we are doing here, which is indiscriminately recording metric info for all HTTP requests. Operators usually rely on the `job` and `instance` labels that Prometheus adds automatically to differentiate applications and instances of applications, so I think it's OK to not use a prefix in this case.

@takluyver (Member)

I think the only bits people wanted changed here are in the docstring, and as an example of the naming I think it makes sense already (albeit that we don't actually use the example name here). So shall we merge this?

@yuvipanda (Contributor, author)

@takluyver I've responded! Thank y'all for your patience :)

@willingc (Member) left a review comment:

Thanks @yuvipanda

@minrk merged commit 1918856 into jupyter:master on Jun 13, 2018
@minrk added this to the 5.6 milestone on Jun 13, 2018
@yuvipanda deleted the prometheus-intro branch on Jun 13, 2018