Review naming conventions used for messaging metrics #937

Closed
pyohannes opened this issue Apr 18, 2024 · 3 comments

Comments

@pyohannes
Contributor

Currently, messaging metric names adhere to the convention of messaging.<operation type>.[duration|messages].

We should revisit whether this logic is extensible and expressive enough. Some things to consider:

  • messaging.operation was renamed to messaging.operation.type, and messaging.operation.name was introduced to allow system-specific names for operations. We need to discuss whether the system-specific information in messaging.operation.name is also relevant for metrics.
  • For databases, the metric name db.client.operation.duration is proposed. We need to discuss to what extent DB and messaging should be consistent in that regard.
  • Issue #895 (feat: add specification for messaging latencies) proposes a metric for measuring message latency (and lag). We need to investigate how such metrics fit in with the currently defined metrics.
@pyohannes
Contributor Author

I think it makes sense to follow the current path insofar as we keep separate metrics for the different operation types (publish, receive, and process) rather than using a generic metric, as is done for DB metrics.

The main reason is that I don't see a use case for aggregating metrics across both publish and process operations without regard to the operation type. This means one will likely never use a metric messaging.operation.duration without filtering for a specific operation type.
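
To illustrate the aggregation concern, here is a minimal Python sketch. The sample durations are invented for illustration and the helper name `mean_by_operation_type` is hypothetical. It only shows that an average over mixed publish and process durations describes neither operation, so a filter or grouping by operation type is needed before the number is useful.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical duration samples in seconds (made-up values, not measurements).
samples = [
    ("publish", 0.002),
    ("publish", 0.004),
    ("process", 1.5),
    ("process", 2.5),
]


def mean_by_operation_type(points):
    """Group durations by operation type and average each group."""
    grouped = defaultdict(list)
    for operation_type, duration in points:
        grouped[operation_type].append(duration)
    return {op: mean(values) for op, values in grouped.items()}


# Average over a generic metric that ignores the operation type.
overall = mean([duration for _, duration in samples])

# Averages after grouping by operation type.
per_type = mean_by_operation_type(samples)

print(overall)
print(per_type)
```

The overall mean sits around one second, which is far from both the millisecond-range publish durations and the multi-second process durations.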

@pyohannes
Contributor Author

Proposal 1

We stick with the current list of metrics and the pattern of having messaging.<operation type>.[duration|messages]:

  • messaging.publish.duration
  • messaging.publish.messages
  • messaging.receive.duration
  • messaging.receive.messages
  • messaging.process.duration
  • messaging.process.messages

We could extend this by adding different "synthetic" operation types for metrics that don't map to any defined operation (for example metrics for message latency or lag):

  • messaging.enqueued.duration (latency)
  • messaging.enqueued.messages (lag)
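
As a rough sketch of how the Proposal 1 pattern composes, the following Python snippet builds the metric names from operation types and suffixes. The function name `proposal_1_metric_names` and its parameters are hypothetical and only serve the illustration; nothing here is part of an OpenTelemetry API.

```python
# Operation types and suffixes taken from the list above.
OPERATION_TYPES = ("publish", "receive", "process")
SUFFIXES = ("duration", "messages")


def proposal_1_metric_names(operation_types=OPERATION_TYPES, synthetic_types=()):
    """Build names following messaging.<operation type>.[duration|messages].

    synthetic_types holds extra "synthetic" operation types such as "enqueued".
    """
    names = []
    for operation_type in (*operation_types, *synthetic_types):
        for suffix in SUFFIXES:
            names.append(f"messaging.{operation_type}.{suffix}")
    return names


for name in proposal_1_metric_names(synthetic_types=("enqueued",)):
    print(name)
```

Under this pattern every new operation type, real or synthetic, adds two more metric names.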

Proposal 2

We follow a pattern of messaging.[producer|consumer].operation.[duration|messages], which would be consistent with the pattern proposed for DB metrics. The operation name or type would be an attribute on the metric.

Here we would end up with the following metrics:

  • messaging.producer.operation.duration
  • messaging.producer.operation.messages
  • messaging.consumer.operation.duration
  • messaging.consumer.operation.messages

Metrics for latency and lag could be grouped under the consumer namespace, for example:

  • messaging.consumer.latency.duration
  • messaging.consumer.lag.messages
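
For comparison, here is a minimal Python sketch of Proposal 2, where the operation type moves from the metric name into an attribute. The mapping of publish to producer and of receive and process to consumer is an assumption made for this example, and the function name `proposal_2_metric` is hypothetical. The attribute key messaging.operation.type is the one mentioned earlier in this issue.

```python
# Assumed mapping from operation type to role; this thread does not spell it out.
ROLE_BY_OPERATION_TYPE = {
    "publish": "producer",
    "receive": "consumer",
    "process": "consumer",
}


def proposal_2_metric(operation_type, kind):
    """Return (metric name, attributes) for an operation under Proposal 2.

    kind is either "duration" or "messages".
    """
    role = ROLE_BY_OPERATION_TYPE[operation_type]
    name = f"messaging.{role}.operation.{kind}"
    # The operation type is carried as an attribute, not in the name.
    attributes = {"messaging.operation.type": operation_type}
    return name, attributes


print(proposal_2_metric("publish", "duration"))
print(proposal_2_metric("process", "duration"))
```

With this layout, receive and process share one metric name and can only be told apart through the attribute.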

@pyohannes
Contributor Author

This is resolved with #1006, where the structure and naming of messaging metrics were discussed in great detail.

Projects
Status: V1 - Stable Semantics