Review naming conventions used for messaging metrics #937

pyohannes · 2024-04-18T14:09:22Z

Currently, messaging metric names adhere to the convention of messaging.<operation type>.[duration|messages].

It should be revisited whether this logic is extensible and expressive enough. Some things to consider:

messaging.operation was renamed to messaging.operation.type, and messaging.operation.name was introduced to allow system-specific names for operations. It needs to be discussed whether the system-specific information in messaging.operation.name is also relevant for metrics.
For databases, the metric name db.client.operation.duration is proposed. It needs to be discussed to what extent DB and messaging should be consistent in that regard.
There is "feat: add specification for messaging latencies" #895, which proposes a metric for measuring message latency (and lag). It needs to be investigated how such metrics fit in with the currently defined metrics.


pyohannes · 2024-04-18T14:16:10Z

I think it makes sense to follow the current path insofar as we keep separate metrics for different operation types (publish, receive, and process) and do not use a generic metric as is done for DB metrics.

The main reason for this is that I don't see a use case for aggregating metrics for both publish and process operations without regard to the operation type. This means one will likely never use a messaging.operation.duration metric without filtering for a specific operation type.

pyohannes · 2024-04-24T08:36:27Z

Proposal 1

We stick with the current list of metrics and the pattern of having messaging.<operation name>.[duration|messages]:

messaging.publish.duration
messaging.publish.messages
messaging.receive.duration
messaging.receive.messages
messaging.process.duration
messaging.process.messages

We could extend this by adding different "synthetic" operation types for metrics that don't map to any defined operation (for example metrics for message latency or lag):

messaging.enqueued.duration (latency)
messaging.enqueued.messages (lag)
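
For illustration, the Proposal 1 pattern can be sketched as a small name builder. This is only a sketch: the function name and the validation are hypothetical, and "enqueued" is included as the synthetic operation type suggested above.

```python
# Hypothetical helper illustrating the Proposal 1 naming pattern:
#   messaging.<operation type>.[duration|messages]
# "enqueued" is the synthetic operation type suggested for latency/lag.
OPERATION_TYPES = ("publish", "receive", "process", "enqueued")
KINDS = ("duration", "messages")


def proposal1_metric_name(operation_type: str, kind: str) -> str:
    """Build a metric name following the Proposal 1 pattern."""
    if operation_type not in OPERATION_TYPES:
        raise ValueError(f"unknown operation type: {operation_type!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown metric kind: {kind!r}")
    return f"messaging.{operation_type}.{kind}"


# One metric name per (operation type, kind) pair.
proposal1_names = [
    proposal1_metric_name(op, kind) for op in OPERATION_TYPES for kind in KINDS
]
```

With this scheme every new operation type adds two metric names, so the number of metrics grows with the number of operation types.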

Proposal 2

We follow a pattern of messaging.[producer|consumer].operation.[duration|messages], which would be consistent with the pattern proposed for DB metrics. The operation name or type would be an attribute on the metric.

Here we would end up with the following metrics:

messaging.producer.operation.duration
messaging.producer.operation.messages
messaging.consumer.operation.duration
messaging.consumer.operation.messages

Metrics for latency and lag could be grouped under the consumer namespace, for example:

messaging.consumer.latency.duration
messaging.consumer.lag.messages
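
As a rough sketch of how Proposal 2 differs, the operation type moves from the metric name into an attribute. The mapping of operation types to producer or consumer and the use of messaging.operation.type as the attribute key are assumptions made for this example, not something settled in this thread.

```python
# Hypothetical helper illustrating the Proposal 2 naming pattern:
#   messaging.[producer|consumer].operation.[duration|messages]
# Assumption: publish is a producer operation; receive and process are
# consumer operations.
ROLE_BY_OPERATION_TYPE = {
    "publish": "producer",
    "receive": "consumer",
    "process": "consumer",
}


def proposal2_metric(operation_type: str, kind: str) -> tuple[str, dict[str, str]]:
    """Return the metric name and the attributes that would be recorded with it."""
    if kind not in ("duration", "messages"):
        raise ValueError(f"unknown metric kind: {kind!r}")
    role = ROLE_BY_OPERATION_TYPE[operation_type]
    name = f"messaging.{role}.operation.{kind}"
    # Assumed attribute key; the operation type is no longer part of the name.
    attributes = {"messaging.operation.type": operation_type}
    return name, attributes


# receive and process share one metric name and differ only by attribute.
receive_name, receive_attrs = proposal2_metric("receive", "duration")
process_name, process_attrs = proposal2_metric("process", "duration")
```

Here the set of metric names stays fixed, and any query that cares about a specific operation has to filter on the attribute.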

pyohannes · 2024-06-18T13:43:43Z

This is resolved with #1006, where the structure and naming of messaging metrics were discussed in great detail.

pyohannes added messaging-stability-blocker area:messaging labels Apr 18, 2024

github-actions bot assigned reyang Apr 18, 2024

lmolkova mentioned this issue Apr 22, 2024

Messaging.operation.type attribute should not be required #947

Closed

This was referenced May 2, 2024

Messaging metrics: clarify it's about client-side #795

Closed

Revamp messaging metrics to support generic operations #1006

Merged

pyohannes closed this as completed Jun 18, 2024
