Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose all prometheus metrics under /metrics #1070

Open
labkode opened this issue Aug 10, 2020 · 20 comments
Open

Expose all prometheus metrics under /metrics #1070

labkode opened this issue Aug 10, 2020 · 20 comments

Comments

@labkode
Copy link
Member

labkode commented Aug 10, 2020

@Daniel-WWU-IT @ishank011 I agree that we should expose all the necessary info under one endpoint only.

@Daniel-WWU-IT
Copy link
Contributor

Daniel-WWU-IT commented Aug 10, 2020

@labkode Makes sense, and also ensures that every site exposes the necessary system info. I will merge the sysinfo service into the Prometheus service right away.

@Daniel-WWU-IT
Copy link
Contributor

@labkode @ishank011 See PR #1071 - the system information is now included in the main metrics.

I tried to use the OpenCensus Prometheus API, though I cannot add custom labels to a single metric; I could only add them to all exported metrics, which would result in a mess. There are other ways to export metrics which include labels, though these can't be combined with what is already there. So I chose to stick with using the Prometheus API directly (this is what OpenCensus does internally just as well); I only had to create a Prometheus registry shared by OpenCensus and my SysInfo system, which seems acceptable to me.

@ishank011
Copy link
Contributor

@Daniel-WWU-IT after discussions, we decided that we do not want to have info about or carry out operations for any stats that some service or package is exposing in the prometheus HTTP service. The current approach forces us to pass the prometheus client registry to the sysinfo package, and as we continue adding stats from different services, we'd have to do this for all of those and this wouldn't scale well.

So please take a look at this PR #1105 which uses the opencensus stats package to add labels. We can modify the sysinfo package to work this way and this will take away all dependencies from the HTTP service.

@redblom on similar lines, this is the approach we want to follow for the metrics package as well. The prometheus client would not impose any restrictions (regarding the driver) on any services. Each of those can just register themselves. Let's follow up on your PR regarding this.

@Daniel-WWU-IT
Copy link
Contributor

@ishank011 I took a look at your PR and this seems like a good solution. I will incorporate the necessary changes and create a new PR.

@redblom
Copy link
Contributor

redblom commented Aug 20, 2020

@Daniel-WWU-IT @ishank011 : I was having a discussion on Gitter with @ishank011 regarding the accounting metrics part and I would like to proceed with that here (@Daniel-WWU-IT I'm aware that you may already started with additional coding based on PR #1105 on your part).
I noticed that code is being moved around a bit due to changing insight. Anyways both accounting and sysinfo metrics should follow the same approach and implementation so I thought it would be wise to have a common agreement on that.

@redblom
Copy link
Contributor

redblom commented Aug 20, 2020

@Daniel-WWU-IT @ishank011 so initially I was under the impression that any metric (from any! module) would be registered via the metrics pkg and the prometheus service could then simply initialize that pkg and metrics data would be published. So that's how I came to PR #973. Also in this model metrics data are 'simply' put into a repository (file, db, ...) by the metrics producing module which is then read by the metrics pkg to be registered. But then SysInfo came about and that did not follow that route. So from @ishank011 I now understand that each module may register their metrics themselves.

This however leaves the issue of where/how to initialize these metrics modules. It can be done in the prometheus service, the same way dummy metrics is initialized. But that would mean that for each new metrics producing pkg the prometheus service would need to be expanded with a new import. So, can we come up with a solution where a package can be initialized via some configuration instead of hard coding into prometheus service?

Another option would be that we make the requirement for new metrics producing modules to put their metrics data in the repository through a very simple api (how I originally envisioned this). From thereon the metrics pkg picks up the data from the repository and does all the registering.

@Daniel-WWU-IT
Copy link
Contributor

Daniel-WWU-IT commented Aug 20, 2020

@redblom @ishank011 I don't know if writing metrics to a temporary file which is then read and used to, well, create metrics again, is such a good solution. You'd have to support quite a lot of features -- basically everything the OpenMetrics protocol offers -- to make everyone happy. In the SysInfo case, for example, the one metric it exposes has no meaningful value, it's all about accompanying labels.

My initial suggestion would be to first settle on a "metrics value" API everyone should use to create metrics... the "bare" Prometheus Client API, the various ones OpenCensus has, something else. Then Reva could offer some global registration functions (in a dedicated package I'd say) people can use to register their own metrics. The rest is background-magic. In your dummy driver, you're currently using OpenCensus views for this, so we could just specify that people need to create such a view and register that globally via a function Reva offers.

But the most important thing might be: This MUST be documented clearly -- everyone needs to know what internal features (or guidelines/idioms) Reva sports in order to use them.

@redblom
Copy link
Contributor

redblom commented Aug 20, 2020

@Daniel-WWU-IT @ishank011

@redblom @ishank011 I don't know if writing metrics to a temporary file which is then read and used to, well, create metrics again, is such a good solution. You'd have to support quite a lot of features -- basically everything the OpenMetrics protocol offers -- to make everyone happy. In the SysInfo case, for example, the one metric it exposes has no meaningful value, it's all about accompanying labels.

Yeah that has crossed my mind as well though accounting metrics are quite straightforward in that regard. But I understand that's not necessarily always the case. For accounting metrics it was this idea that gives the partners an opportunity right now to dump actual data instead of dummy data. But I also think this model is still under investigation.

My initial suggestion would be to first settle on a "metrics value" API everyone should use to create metrics... the "bare" Prometheus Client API, the various ones OpenCensus has, something else. Then Reva could offer some global registration functions (in a dedicated package I'd say) people can use to register their own metrics. The rest is background-magic. In your dummy driver, you're currently using OpenCensus views for this, so we could just specify that people need to create such a view and register that globally via a function Reva offers.

I agree, for me OpenCensus views work fine so I am happy to keep using these. If that works for you let's settle on that then.

But the most important thing might be: This MUST be documented clearly -- everyone needs to know what internal features (or guidelines/idioms) Reva sports in order to use them.

Absolutely! A recurring theme :)

@Daniel-WWU-IT
Copy link
Contributor

Daniel-WWU-IT commented Aug 20, 2020

@redblom @ishank011

From Ishank's PR, views should work fine for me, yeah. Haven't worked on this yet (luckily!), but I think we can agree on using them for all metrics. I just wasn't aware of how to add custom labels (aka tags in OpenCensus), that's why I didn't use them in the first place.

So that being settled, we should then come up with a central Reva-Metrics-Registry where anyone can just add a new view on the fly. From an API point of view, it should just be no more than something like RegisterMetricsView(myView)... something like that. Not sure if you can easily add new views on the fly, but at least I'd suppose so.

@redblom
Copy link
Contributor

redblom commented Aug 20, 2020

@Daniel-WWU-IT @ishank011 Hmm, I like the idea to make it as easy as possible for others to register metrics, but just this single method would not be worth the effort I think. So, there are 3 steps involved here:

  1. Create the view (eg. for number of site users):
    the measure/metric:
    m := &Metrics{ NumUsersMeasure: stats.Int64("cs3_org_sciencemesh_site_total_num_users", "The total number of users within this site", stats.UnitDimensionless) }
    and the view:
    v := &view.View{ Name: m.NumUsersMeasure.Name(), Description: m.NumUsersMeasure.Description(), Measure: m.NumUsersMeasure, Aggregation: view.LastValue(), }
  2. Register the view (as simple as):
    view.Register(v)
  3. Record metric data periodically:
    stats.Record(context.Background(), m.NumUsersMeasure.M(...10397...))

So if we facilitate central registering that would be step 2 which does not take away a lot of work.
Now recording is initiated the 'user'. But maybe we could facilitate something there as well to make this central registering more valuable, but I'm not sure if there's really something to gain ... what do you think?

@Daniel-WWU-IT
Copy link
Contributor

@redblom @ishank011

Of course there is a bit more involved than a single function call :) That was just about how to register a new view. Since OpenCensus already exposes such global variables for registering etc., it might at first glance not be worth the effort to put yet another layer on top of this. Then again, hiding some details and possibly putting the second part of step 1 and step 2 into a small extra module might make things easier API wise. And so people would only need to take a look into our "metrics" package and quickly understand what to do.
So a user only needs to create such a Metrics objects and pass that along with a few extra params to our internal metrics registration function and later only needs to record new stats (and that could also be made a bit simpler).
I'd have to look more closely at how things work in OpenCensus in order to be able to really judge how feasible and helpful this could and would be, though...

@labkode
Copy link
Member Author

labkode commented Aug 20, 2020

OpenCensus is already an abstraction for metrics, having our own abstraction is an overkill, at least for now, where the focus is on MVP. Besides that, OpenCensus is well-document and used in another projects, so people jumping into it should be more comfortable that our custom plugin.

@Daniel-WWU-IT
Copy link
Contributor

Daniel-WWU-IT commented Aug 21, 2020

@redblom @ishank011 @labkode
I have played around with OpenCensus and rewrote the Prometheus exporter to use that library. Works quite well, so I think we can stick to this solution for now.

The question about how to register the various views still remains, though. A simple loader like for the various services at least doesn't work in my case at it is now, since the system information hasn't been initialized at loading time yet.

@Daniel-WWU-IT
Copy link
Contributor

Daniel-WWU-IT commented Aug 21, 2020

@redblom @ishank011
I rewrote the system info metrics to use OpenCensus: #1114. For now, I just register the exporter in the Prometheus service, but as discussed, we should find a better solution for this.

@ishank011
Copy link
Contributor

  • I don't think there's a need to write a wrapper over this. Like @labkode said, this is already an abstraction, so that won't be of much use.
  • The metrics package is a special package which won't care how we get the metrics (different providers can have ad-hoc approaches to write to the metrics file). It just reads the metrics from a file and registers those to prometheus.
  • There's no need to import services registering metrics in the prometheus file. I did that with the metrics package because it wasn't initialized at any point. I just wanted to call its init method so that we could start the goroutine. For services which are running continuously, that is not needed since they'll handle it themselves. For standalone packages, we can just define a loader file, include those packages there and add that loader file here.
  • The opencensus stats module fulfils all our criteria so let's finalize it to be used for all the metrics.

@ishank011
Copy link
Contributor

An important thing is to not pollute the prometheus HTTP service. Each service registering the metrics would need to have the prometheus service running on its host. Adding details to every instance of the service would add a lot of duplication and redundancy.

@Daniel-WWU-IT
Copy link
Contributor

I made yet another change to #1114 and moved the metrics registration out of the Prometheus package. There is no need for a loader module either, so this solution should be the cleanest way.

@redblom
Copy link
Contributor

redblom commented Sep 1, 2020

@Daniel-WWU-IT : didn't you had an issue somewhere with shared configs not yet being available upon pkg initialization. The metrics pkg is supposed to use sharedconfig only but that is only initialized when other configs are ('explicitly') initialized. Did you find a solution for this?

@Daniel-WWU-IT
Copy link
Contributor

@redblom Not exactly. I somehow need to get the version number etc. into the sysinfo package, and I do this with a simple initialization call in the main.go file of revad. And this call has of course not been done yet during package initialization. My workaround was to register the sysinfo metrics in the very same call I use to initialize sysinfo.

@redblom
Copy link
Contributor

redblom commented Sep 1, 2020

@Daniel-WWU-IT : thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants