feat: OpenMetrics / Prometheus / metrics endpoint (monitoring) #773

Open
lefuturiste opened this issue Jul 18, 2023 · 6 comments

@lefuturiste

lefuturiste commented Jul 18, 2023

Hello, I would be interested in having a monitoring endpoint in Martin. Of course, I'm ready to work on it.

The goal is to have a monitoring endpoint that exposes metrics.
I would be interested in:

  • metrics linked to the web server
  • metrics linked to the connection Martin has to Postgres, plus some performance insight
  • metrics linked to the duration of tile generation
  • any others that should be discussed
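
To make the tile-duration item concrete, here is a minimal sketch of a cumulative histogram rendered in the Prometheus text exposition format. It uses only the standard library; the type `DurationHistogram`, the metric name `martin_tile_generation_seconds`, and the bucket bounds are assumptions for illustration, not anything Martin or a Prometheus crate defines. A real implementation would more likely rely on one of the instrumentation libraries discussed below.

```rust
use std::fmt::Write;
use std::time::Duration;

/// Hypothetical minimal histogram with cumulative buckets.
pub struct DurationHistogram {
    name: String,
    /// Upper bounds of the buckets in seconds, sorted ascending.
    bounds: Vec<f64>,
    /// counts[i] is the number of observations <= bounds[i].
    counts: Vec<u64>,
    total: u64,
    sum_seconds: f64,
}

impl DurationHistogram {
    pub fn new(name: &str, bounds: &[f64]) -> Self {
        Self {
            name: name.to_string(),
            bounds: bounds.to_vec(),
            counts: vec![0; bounds.len()],
            total: 0,
            sum_seconds: 0.0,
        }
    }

    /// Record one duration, e.g. the time spent generating a tile.
    pub fn observe(&mut self, d: Duration) {
        let secs = d.as_secs_f64();
        // Buckets are cumulative: bump every bucket whose bound covers the value.
        for (bound, count) in self.bounds.iter().zip(self.counts.iter_mut()) {
            if secs <= *bound {
                *count += 1;
            }
        }
        self.total += 1;
        self.sum_seconds += secs;
    }

    /// Render the histogram as Prometheus text exposition lines.
    pub fn render(&self) -> String {
        let mut out = String::new();
        writeln!(out, "# TYPE {} histogram", self.name).unwrap();
        for (bound, count) in self.bounds.iter().zip(&self.counts) {
            writeln!(out, "{}_bucket{{le=\"{}\"}} {}", self.name, bound, count).unwrap();
        }
        // The +Inf bucket always equals the total number of observations.
        writeln!(out, "{}_bucket{{le=\"+Inf\"}} {}", self.name, self.total).unwrap();
        writeln!(out, "{}_sum {}", self.name, self.sum_seconds).unwrap();
        writeln!(out, "{}_count {}", self.name, self.total).unwrap();
        out
    }
}
```

A tile handler would wrap the generation call with `std::time::Instant::now()` and pass the elapsed time to `observe`; the monitoring endpoint would return the string produced by `render`.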

I think this could sit behind a feature flag, so that users can choose whether to include it.

What is your take on this?

Implementation details:

We have a choice between two Prometheus instrumentation libraries.

The https://github.com/tikv/rust-prometheus library is compatible with the actix-web instrumentation library https://github.com/nlopes/actix-web-prom

@nyurik
Member

nyurik commented Jul 18, 2023

Thanks, I love the idea! Moreover, these metrics should also be shown in the root web UI, and possibly even in a Ratatui CLI UI (enabled via a CLI flag). Obviously, all of this should not go into a single PR. I think metrics should be enabled by default (this way, only those who want to use Martin as a library would disable them), and there could be a CLI flag like --no-metrics to disable them at runtime?
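
A minimal sketch of how the compile-time default and the runtime switch could combine. The function `metrics_enabled` and its `compiled_in` parameter are hypothetical; `compiled_in` stands in for a Cargo feature check such as `cfg!(feature = "metrics")`, and `--no-metrics` is only the flag name suggested above, not an existing Martin option.

```rust
/// Decide whether the metrics endpoint should be served.
///
/// `compiled_in` is an assumption standing in for a compile-time feature
/// check; `args` are the raw command-line arguments.
pub fn metrics_enabled<I, S>(compiled_in: bool, args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    // Metrics are on by default and only turned off by the explicit flag.
    let disabled_at_runtime = args.into_iter().any(|a| a.as_ref() == "--no-metrics");
    compiled_in && !disabled_at_runtime
}
```

In a binary this would be called as `metrics_enabled(true, std::env::args())`; a real CLI would more likely declare the flag through its argument parser rather than scan the arguments by hand.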

@lefuturiste
Author

Cool, I'm starting a POC with the nlopes/actix-web-prom lib.
We will probably also add metrics to monitor the performance of the PostgreSQL requests.

@nyurik
Member

nyurik commented Jul 19, 2023

@lefuturiste you may also be interested in the charming crate to produce some nice graphs. This way we can have an endpoint like /_/graph/pie_by_ret_code.svg or /_/graph/pie_by_source.svg that produces a pie chart SVG image of which HTTP codes were returned to users or how many requests were made to each source. Eventually we could have some admin interface (something that can be easily disabled by an nginx proxy or a CLI flag) that shows some stats.

@lefuturiste
Author

> @lefuturiste you may also be interested in the charming crate to produce some nice graphs. This way we can have an endpoint like /_/graph/pie_by_ret_code.svg or /_/graph/pie_by_source.svg that produces a pie chart SVG image of which HTTP codes were returned to users or how many requests were made to each source. Eventually we could have some admin interface (something that can be easily disabled by an nginx proxy or a CLI flag) that shows some stats.

I will prioritize the instrumentation of the software.

Your ideas sound cool, but I personally don't think that bundling software to visualize the data is a good idea. I think we should keep this kind of feature separate and let users choose how they want to view the data.

At least for my use case, I don't need it, since I will be using Grafana to visualize the data and analyze how Martin behaves in production.

@nyurik
Member

nyurik commented Jul 19, 2023

Fair point. Technically, the charming crate uses an Apache JavaScript library too, so as long as Martin can provide some statistics via an API of some sort, JavaScript could do all of that. Martin can bake in various JavaScript libs during the build step if needed.

@lefuturiste
Author

lefuturiste commented Jul 19, 2023

Okay, now for the main issue, which is getting the metrics out. I have a problem.
By default, the nlopes/actix-web-prom crate uses the route's path pattern as the endpoint label in order to reduce label cardinality,
so it gives something like this (and after reading the actix-web-prom code, I found no way to configure that):

martin_http_requests_total{endpoint="/{source_ids}/{z}/{x}/{y}",method="GET",status="200"} 10

But for my needs, I still want a separate time series for each {source_ids} value. For that, I will need to change the implementation of the actix-web-prom crate.

I imagine some sort of attribute macro on the route. I'm currently trying to find the best way for a dev to tell actix-web-prom: "Hey, for this route I need to keep the cardinality of the {source_ids} param".
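
As a rough illustration of the desired behaviour (this is not actix-web-prom's API, and not an attribute macro either), a plain function could build the endpoint label from the route pattern while expanding only the parameters that were explicitly marked. The name `endpoint_label` and its `keep` parameter are hypothetical.

```rust
/// Build the `endpoint` label for a matched route.
///
/// `pattern` is the route pattern, e.g. "/{source_ids}/{z}/{x}/{y}".
/// `params` holds the (name, value) pairs matched for this request.
/// `keep` lists the parameter names whose real value should appear in the
/// label; every other parameter stays as its `{name}` placeholder.
pub fn endpoint_label(pattern: &str, params: &[(&str, &str)], keep: &[&str]) -> String {
    pattern
        .split('/')
        .map(|segment| {
            // A segment like "{z}" is a parameter; anything else is literal.
            let name = segment
                .strip_prefix('{')
                .and_then(|s| s.strip_suffix('}'));
            match name {
                // Expand only the parameters the developer asked to keep.
                Some(name) if keep.contains(&name) => params
                    .iter()
                    .find(|(key, _)| *key == name)
                    .map(|(_, value)| *value)
                    .unwrap_or(segment),
                _ => segment,
            }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

With `keep` set to `["source_ids"]`, a request for the source `points` is labelled `/points/{z}/{x}/{y}`, so z, x and y still collapse into one series per source. The number of series then grows with the number of distinct source IDs, which stays bounded as long as the set of sources is fixed by the configuration.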

I will open an issue upstream and maybe open an MR. But depending on how long it takes to get merged, we may have to use a workaround, go without this feature, or use our own version of the crate for more flexibility.
