DB Convention does not cover batch/multi/envelope operations #712

ndimiduk · 2021-12-06T22:26:25Z

Our API has a small alphabet of relatively simple key-value operations (get, put, delete, &c.). For these, the operation names seem clear. We also have a set of operations that support bulk/batching of operations. These can be homogeneous or heterogeneous. For example, batch can accept a set of any combination of get, put, delete, &c. We also support a generic system for server-side compare-and-mutate, where some predicate based on a query over existing data is provided, and when the predicate returns true, some operation is applied; that operation can be a simple or a batch operation. For these collections of heterogeneous operations, how should we annotate the span?

arminru · 2021-12-13T13:22:08Z

Hey! Thanks for your question.
Would it be possible and make sense for you to track the individual operations within a batch with separate spans? Then you'd have one generic span for the batch (not following any OTel conventions) and the individual spans (following the DB conventions) would be children of that one.

ndimiduk · 2021-12-13T17:28:09Z

> Would it be possible and make sense for you to track the individual operations within a batch with separate spans?

Anything is possible ;)

Do you mean that the server side of the batch operation should make spans for each op nested within the batch? Or would you want to create these child spans on the client side? They all funnel through a single RPC and associated span pair.

> Then you'd have one generic span for the batch (not following any OTel conventions) ...

Why do you say that a span for the batch operation is not following any OTel conventions? We have a data-action method in the API called batch. Why should it not be recorded as a db.operation?

arminru · 2021-12-14T18:07:10Z

> Do you mean that the server side of the batch operation should make spans for each op nested within the batch? Or would you want to create these child spans on the client side? They all funnel through a single RPC and associated span pair.

> Why do you say that a span for the batch operation is not following any OTel conventions? We have a data-action method in the API called batch. Why should it not be recorded as a db.operation?

The database semantic conventions were only designed for client-side calls, not for the server end. If you could share, in a separate issue, some details on what your server instrumentation looks like and what you would expect from a semantic convention for calls from the server's perspective, we can look into crafting such conventions based on the client ones.

Ah, I did not consider batch an operation on its own but rather a more abstract set of operations. Let's try something else then.
I think you could have a parent span for batch and then multiple children with the respective operations as they are added to the batch, each following the DB semantic conventions. However, I assume the call for the batch will be executed at once, so you wouldn't be able to track the timing of each individual operation and would thus have a set of zero-duration spans just for the purpose of their attributes. This way you can inspect the content of your batch on your tracing backend and still have the same selectors in place to find the spans as if the operations were executed individually. An error would likely only be reflected collectively on the batch span, however.

    [=DB batch=========================]
    [] <- DB delete
    [] <- DB put
      [=RPC=========================]
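
A minimal sketch of the layout in the diagram above, in Python. It uses a hypothetical `Span` dataclass as a stand-in for a real tracing SDK; `record_batch` and `find_by_operation` are illustrative names, not part of any OTel API. Only the `db.operation` attribute comes from the discussion.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in for a tracing SDK span."""
    name: str
    start_ns: int
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def record_batch(operations):
    """Build one batch span with a zero-duration child per operation."""
    batch = Span("DB batch", time.monotonic_ns(),
                 attributes={"db.operation": "batch"})
    for op in operations:
        now = time.monotonic_ns()
        # Start and end are identical: the child only carries attributes.
        batch.children.append(Span(f"DB {op}", now, now, {"db.operation": op}))
    # The whole batch travels in a single RPC, modeled as one more child.
    rpc = Span("RPC", time.monotonic_ns())
    rpc.end_ns = time.monotonic_ns()
    batch.children.append(rpc)
    batch.end_ns = time.monotonic_ns()
    return batch


def find_by_operation(span, op):
    """The same selector finds an operation whether batched or not."""
    found = [span] if span.attributes.get("db.operation") == op else []
    for child in span.children:
        found.extend(find_by_operation(child, op))
    return found


batch_span = record_batch(["delete", "put"])
```

Here `find_by_operation(batch_span, "put")` matches the child span through the same `db.operation` selector that would match a standalone put, which is the point of the zero-duration children.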

Alternatively, if your database client library accepts the operations individually and only combines them into a batch later on, you could make each individual operation span the child of its "actual" causal parent and use span links to link the batch span to them instead.

    [====] <- SomeActionCausingADeleteOperationExecutedInDb
     [] <- DB delete
    --
        [===] <- SomeActionCausingAGetOperationExecutedInDb
         [] <- DB get
    --
                             [=DB batch=========================] (linking to both DB delete and DB get from above)
                               [=RPC=========================]

The batch going over the wire in one RPC call would be modeled as a child of the batch span in either case.
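
The span-link variant could be sketched as follows, again with a hypothetical `Span` dataclass instead of a real SDK. `BatchingClient`, `submit`, and `flush` are invented for illustration and assume a client that accepts operations one at a time and sends them together later.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # simple span-id generator for the sketch


@dataclass
class Span:
    """Hypothetical stand-in for a tracing SDK span."""
    name: str
    parent_id: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # ids of linked spans
    span_id: int = field(default_factory=lambda: next(_ids))


class BatchingClient:
    """Hypothetical client that queues operations and flushes them as a batch."""

    def __init__(self):
        self.pending = []
        self.finished = []

    def submit(self, op, parent):
        # Each operation span is parented to whatever actually caused it.
        span = Span(f"DB {op}", parent_id=parent.span_id,
                    attributes={"db.operation": op})
        self.pending.append(span)
        self.finished.append(span)
        return span

    def flush(self):
        # The batch span has no causal parent here; it links to the operations.
        batch = Span("DB batch", attributes={"db.operation": "batch"},
                     links=[s.span_id for s in self.pending])
        rpc = Span("RPC", parent_id=batch.span_id)  # one RPC for the batch
        self.pending.clear()
        self.finished += [batch, rpc]
        return batch


client = BatchingClient()
delete_cause = Span("SomeActionCausingADeleteOperationExecutedInDb")
get_cause = Span("SomeActionCausingAGetOperationExecutedInDb")
delete_span = client.submit("delete", delete_cause)
get_span = client.submit("get", get_cause)
batch_span = client.flush()
```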

Apache9 · 2021-12-16T09:38:41Z

In HBase, we apply the batch on the server side (almost) as a whole. The workflow is like this:

RPC server receives a batch -> grab all the row locks -> Build the WAL edit for all the operations -> Write out the WAL edit -> Apply all the operations to memstore -> advance the MVCC number -> return

So typically I do not think it is possible to use different spans to trace different operations in the batch. As you can see, although within every step we will likely process the operations one by one, at a higher level each step processes all the operations before moving on to the next step. It would be very strange to create a span for each operation and switch between them all the time...

Thanks.

ndimiduk · 2022-01-06T23:25:21Z

@arminru @bogdandrutu I wonder if you have any thoughts about the PR linked here. The idea is to expose a summary of the content of a batch operation as an additional attribute that is implementation-specific. Specifically, I hope that a span storage/query system would be able to make use of that attribute to enable operators to find all spans that execute a given operation, whether that operation is executed at the top level or it is a part of a batch operation.

lmolkova · 2024-02-09T02:02:01Z

Assuming there is just one bulk operation that deals with batch as a whole, I can think of the following solutions:

E.g., a bulk operation consists of ["get foo", "delete bar"].

Option 1. Attributes with array values

db.operation = bulk
db.mydb.sub_operations = [get, delete], db.mydb.some_other_attribute = [foo, bar], ...

Cons:

- bulk operations are common and we should consider defining sub-operation attributes in the top-level db namespace
- the relationship between elements in attribute arrays is based on index (if we record more than one attribute per sub-operation), which is subtle and error-prone
- bulk operations with a lot of sub-operations might be too long for some backends that have low limits on the attribute length
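
To make the index-coupling concern concrete, here is a small Python sketch of Option 1 using plain dicts. The attribute names are the placeholders from this comment; `bulk_attributes` and `sub_operation_view` are hypothetical helpers, not an existing API.

```python
def bulk_attributes(sub_operations):
    """Flatten (operation, key) pairs into parallel array attributes."""
    return {
        "db.operation": "bulk",
        "db.mydb.sub_operations": [op for op, _ in sub_operations],
        "db.mydb.some_other_attribute": [key for _, key in sub_operations],
    }


def sub_operation_view(attributes):
    """Re-pair the arrays by position, as a backend or reader would have to."""
    ops = attributes["db.mydb.sub_operations"]
    keys = attributes["db.mydb.some_other_attribute"]
    return list(zip(ops, keys))


attrs = bulk_attributes([("get", "foo"), ("delete", "bar")])
pairs = sub_operation_view(attrs)

# If one array is shortened (for example by an attribute length limit),
# positional pairing silently loses sub-operations.
truncated = dict(attrs)
truncated["db.mydb.some_other_attribute"] = ["foo"]
lossy_pairs = sub_operation_view(truncated)
```

Nothing in the attributes themselves records that the two arrays belong together, so the pairing only holds as long as every producer and consumer preserves order and length.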

Option 2. Events/logs

db.operation = bulk
Plus we emit an event for each sub-operation that contains grouped attributes describing that operation

We do something similar in messaging (with links though):

- if a batch of messages is sent, the send operation should have links to all messages being sent, and their unique per-message properties should be on the link.
- if there is just one message, its properties should be recorded either on the link or on the span.

DB operations don't have an individual trace-context, so links are not suitable here, but events could work. Then it should also be easier to enable/disable sub-operation reporting depending on the needs. The drawback is that events/logs could go to a different backend.

Cons:

- if we don't have a case for multiple attributes describing one operation, this seems like overkill and Option 1 would be a better choice.
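
Option 2 could look roughly like this in Python. `Span` and `Event` are hypothetical stand-ins for SDK types, the event name `db.sub_operation` is invented for the sketch, and the `emit_sub_operation_events` flag illustrates the enable/disable point made above.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical span event: a name plus grouped attributes."""
    name: str
    attributes: dict


@dataclass
class Span:
    """Hypothetical stand-in for a tracing SDK span."""
    name: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def record_bulk(sub_operations, emit_sub_operation_events=True):
    """One span for the bulk call, optionally one event per sub-operation."""
    span = Span("bulk", {"db.operation": "bulk"})
    if emit_sub_operation_events:
        for op, key in sub_operations:
            # Attributes of one sub-operation stay grouped in one event,
            # so no index-based pairing across arrays is needed.
            span.events.append(Event("db.sub_operation", {
                "db.operation": op,
                "db.mydb.some_other_attribute": key,
            }))
    return span


span = record_bulk([("get", "foo"), ("delete", "bar")])
quiet_span = record_bulk([("get", "foo")], emit_sub_operation_events=False)
```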

Option 3. Creating artificial spans per sub-operation

Cons:

- misleading
- costly (both perf and volume)

Additional things to consider:

- we should have an attribute that records the size of a batch (in the case of messaging it's called messaging.batch.message_count)
- metrics:
  - we should consider having a metric that measures the number of sub-operations (since db.operation.duration would not provide a count for bulk operations)
  - bulk duration could be a different metric (e.g. with different histogram boundaries)

jcocchi · 2024-02-09T04:56:47Z

Cosmos DB is currently creating a string attribute db.cosmosdb.batch_operations with each operation type in the batch and the count for that type. Adding an attribute to the convention would be useful to standardize this.

We should be able to capture:

- Overall count for batch
- Operations in batch
- Optionally: more information for each batch operation according to each db's requirements (count, status code etc.)

I prefer the simplicity of Option 1 (attributes with array values), but agree it creates a challenge for additional information about each batch operation. The most important piece of additional information for Cosmos DB is operation count, so maybe something like the following could work: db.batch.count = 6, db.mydb.sub_operations = [get:2, delete:4]. Capturing information beyond operation count may be too clunky in this format though.
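
As a rough illustration of that format, the following Python sketch derives both attributes from a list of operation names. `batch_summary` is a hypothetical helper, and the `op:count` string encoding is just the shape suggested above, not an agreed convention.

```python
from collections import Counter


def batch_summary(operations):
    """Summarize a batch as a total count plus per-operation counts."""
    counts = Counter(operations)  # keeps first-seen order of operation names
    return {
        "db.batch.count": len(operations),
        "db.mydb.sub_operations": [f"{op}:{n}" for op, n in counts.items()],
    }


attrs = batch_summary(["get", "delete", "delete", "get", "delete", "delete"])
```

Anything beyond a count (for example a per-operation status code) would need a further delimiter inside each string, which is where this encoding starts to get clunky.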

roji · 2024-02-09T09:57:48Z

@lmolkova isn't that conflating "batch" with "bulk", by proposing db.operation = bulk for something containing two things (get and delete)? The standard naming for this seems to be batching, where a bulk command usually corresponds to a command that changes multiple records (like a SQL UPDATE statement).

Regardless, the commands contained in a batch are typically the same in every aspect as a standalone command not executed in a batch; each command has a SQL statement (so db.statement), a db.operation (select, insert...), a set of parameters (something not currently represented in the semantic conventions, but which could/should be), and any other attributes which a specific database may add. For me this strongly points towards representing the batched commands as spans, which would allow querying/interpreting them just like commands which aren't batched. Introducing a new way to represent batched commands may seem simpler at first look, but actually creates two ways to represent the same logical thing, and makes the data more difficult to interpret. In effect, a batch is conceptually just a container for commands.

Note that it's true that certain attributes must be the same across all commands in the batch, e.g. the hostname, network info, etc. So these attributes could optionally be lifted up to the span representing the batch, leaving on the command only attributes which can vary (e.g. SQL, parameter info).
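
The lifting idea can be sketched in Python over plain attribute dicts. `lift_shared_attributes` is a hypothetical helper; `server.address` and the sample values are chosen only for illustration, while `db.operation` and `db.statement` are the attributes named above.

```python
def lift_shared_attributes(commands):
    """Split per-command attributes into (shared, per_command_remaining).

    An attribute is lifted to the batch only if every command carries it
    with the same value.
    """
    if not commands:
        return {}, []
    first, rest = commands[0], commands[1:]
    shared = {
        key: value
        for key, value in first.items()
        if all(key in c and c[key] == value for c in rest)
    }
    remaining = [
        {key: value for key, value in c.items() if key not in shared}
        for c in commands
    ]
    return shared, remaining


commands = [
    {"server.address": "db1.example.com", "db.operation": "select",
     "db.statement": "SELECT name FROM users WHERE id = ?"},
    {"server.address": "db1.example.com", "db.operation": "insert",
     "db.statement": "INSERT INTO audit (user_id) VALUES (?)"},
]
batch_attributes, command_attributes = lift_shared_attributes(commands)
```

Note that with a single-command batch every attribute counts as shared under this rule, so an implementation would probably want an explicit allow-list of liftable attributes instead.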

jcocchi · 2024-02-09T17:13:52Z

@roji one difference between batch operations and standalone operations is the duration. If you add each operation in a batch as its own span, is the duration of each sub-operation the same as the parent? Is it 0? This could also create confusion because it may make those operations appear either abnormally quick or abnormally long.

roji · 2024-02-09T17:32:12Z

@jcocchi that's true indeed... I don't know if there are other OTel cases where a larger "logical container" span wraps nested spans as in this case, and how that's best represented...

github-actions bot assigned bogdandrutu Dec 6, 2021

ndimiduk mentioned this issue Dec 15, 2021

HBASE-26473 Introduce db.hbase.container_operations span attribute apache/hbase#3951

Merged

jack-berg transferred this issue from open-telemetry/opentelemetry-specification Feb 7, 2024

github-actions bot assigned reyang Feb 7, 2024

jcocchi mentioned this issue Feb 9, 2024

Dealing with batching #710

Closed

lmolkova mentioned this issue Mar 10, 2024

How to record multi-operation/table/dbs operations on DB metrics #805

Closed

trask unassigned bogdandrutu Apr 25, 2024

trask mentioned this issue May 24, 2024

Add support for database batch operations #1072

Merged

lmolkova closed this as completed in #1072 May 29, 2024
