Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize on DiagnosticSource/OpenTelemetry for database tracing #22336

Closed
Tracked by #48253 ...
divega opened this issue Jun 17, 2017 · 16 comments
Closed
Tracked by #48253 ...

Standardize on DiagnosticSource/OpenTelemetry for database tracing #22336

divega opened this issue Jun 17, 2017 · 16 comments
Assignees
Labels
area-System.Data enhancement Product code improvement that does NOT require public API changes/additions User Story A single user-facing feature. Can be grouped under an epic.

Comments

@divega
Copy link
Contributor

divega commented Jun 17, 2017

Currently we have around 27 EventSource trace points in System.Data.Common.

On the other hand, SqlClient has been instrumented with DiagnosticSource.

I talked to @vancem today and we came to the conclusion that there may be enough value in standardizing on DiagnosticSource before we release 2.0. This would make the story more consistent for consumers of ADO.NET events (we are planning to add more DiagnosticSource instrumentation and make it easier for ADO.NET providers to add it) and once we ship the EventSource instrumentation it is unlikely that we will be able to remove it.

cc @saurabh500

@vancem
Copy link
Contributor

vancem commented Jun 19, 2017

To document a bit more about our discussion on Friday, This rationale includes

  1. It is good to be uniform at least for components that are likely to be used together in logging scenarios. Having SQL Client and System.Data.Common having different logging mechanisms is not ideal.
  2. If we want to change, we should do it now. My understanding is that there is no dependency on the existing logging (yet), but we will not be able to remove the EventSource logging after 2.0 ships.
  3. The change should be modest in terms of lines of code. (since the logging sites are already wrapped in helpers, it should be a 1 day kind of a change (which is also very safe).
  4. IF there are no issues with dependency (that is you live in the new Nuget package world), we generally recommend people instrument with DiagnosticSource because we have a bridge from DiagnosticSource to EventSource. (thus you get both), There are some potential downsides (Going through the bridge is extra overhead, and getting at the logging from EventSource is not as natural to EventListeners), so it is not a 'slam dunk' if high efficiency, and if the EventSource scenario (out of process), is the most important, but we do generally advice people to only instrument with DiagnosticSource. Thus moving System.Data.Common to DiagnosticSource is reasonable (assuming no dependency problems).

@stephentoub

@stephentoub
Copy link
Member

stephentoub commented Jun 19, 2017

we generally recommend people instrument with DiagnosticSource because we have a bridge from DiagnosticSource to EventSource.

This seems counter to what I thought we'd previously discussed. In particular, DiagnosticSource only supports logging objects, and when we discussed the downsides of that from a performance perspective (e.g. allocations on each call when you have value types or multiple objects to log in a call), it was argued that these extra allocations were ok because DiagnosticSource usage would be very chunky, wouldn't be hot path, etc. But now it sounds like you're saying the default should be to use DiagnosticSource, and so rather than having strongly-typed logging methods that are allocation-free, we're pushing people by default to use these allocating, not-strongly-typed methods? I don't get it.

Now, for this particular case, that probably doesn't matter, as IIRC, the tracing that's done just uses strings, which isn't particularly efficient to begin with. But in general this seems like the wrong direction to push people. And I thought one of the advantages to the DiagnosticSource design was that it did dump everything to an EventListener, so it wasn't supposed to matter whether the source was an EventSource or a DiagnosticSource, and you could pick the one best suited to the component, i.e. EventSource for most high-frequency and general logging, and DiagnosticSource for chunkier objects that you wanted to share out less frequently.

@vancem
Copy link
Contributor

vancem commented Jun 19, 2017

The advice between whether to use DiangosticSource or EventSource is indeed not very clear cut some cases (when the messages are high volume and/or EventListeners/ETW are your main scenario. Indeed if you goal is to get messages from your instrumented code to ETW, using DiagnosticSource just ads overhead and complexity. I tried to indicate this in my previous comment:

There are some potential downsides (Going through the bridge is extra overhead, and getting at the logging from EventSource is not as natural to EventListeners), so it is not a 'slam dunk' if high efficiency, and if the EventSource scenario (out of process), is the most important,

So I think most of the guidance is consistent: we both agree EventSource should be used in the high volume case. We also agree that DiagnosticSource is needed if there is a true in-proc case where 'chunky' things need to be passed. The only issue is what is the 'default' (when the volume is not high and you may or may not need the in-proc capabilities).

I must admit I am conflicted about this, and at this second-order details matter, but recommending DiagnosticSource is not unreasonable. Uniformity is valuable, especially of messages likely to be used together (e.g. from strongly related components). The using DiagnosticSource also ALLOWs you to be chunky (pass unserializable things) even if you don't need it initially.

Reasonable people can disagree here, and I am frankly not happy with the fact that the guidance is not as clear cut. Still, for me, at least, was enough to tip the balance in this particular case.

If you disagree, by all means do so (that is why I added you to this issue).

@karelz
Copy link
Member

karelz commented Jun 20, 2017

You might want to do bar check for 2.0. We are way deep in Escrow.

@divega
Copy link
Contributor Author

divega commented Jun 20, 2017

@vancem @stephentoub @saurabh500 I looked at the code in more detail and apparently I was completely wrong with my estimates. It seems that the number of calls to members of DataCommonEventSource in System.Data.Common is actually closer to 300 . Sorry for the misleading information I provided previously.

I think it might be too late to change this, unless we find a way to just change the implementation of DataCommonEventSource.

@karelz thanks. I will follow up.

@stephentoub
Copy link
Member

stephentoub commented Jun 20, 2017

I looked at the code in more detail and apparently I was completely wrong with my estimates. It seems that the number of calls to members of DataCommonEventSource in System.Data.Common is actually closer to 300

For the “simple fix,” the number of call sites doesn’t really matter: the whole change could be done in the DataCommonEventSource type to keep the change localized and minimal, e.g.
stephentoub/corefx@22bdd08
This isn't ideal, but it would at least move from using EventSource to DiagnosticSource. Of course, if the goal is to trace out more of the actual objects being used, then yeah, the change would be much more invasive across the hundreds of sites currently tracing, and maybe more that aren't yet.

I'm still not convinced that's something we should do; I'm still not clear on what value such a switch would actually bring. And it adds cost, e.g. the allocations here stephentoub/corefx@22bdd08#diff-d2371fa205738bbb9e02dbcd7f75aac5R80, here stephentoub/corefx@22bdd08#diff-d2371fa205738bbb9e02dbcd7f75aac5R66, and here stephentoub/corefx@22bdd08#diff-d2371fa205738bbb9e02dbcd7f75aac5R20.

That said, the current EventSource usage is a port from the previous tracing that existed in the desktop version of the component, and doing it well would involve further changes even for EventSource. In particular, right now it's using the same basic approach that the previous tracing was: all Trace calls result in a string being generated and that string being traced. Doing this more efficiently would instead involve a multitude of different WriteEvent overloads being used to avoid creating strings from the data being traced. I at least invested enough here when I ported this so that allocations and the associated string creation work only happens if logging is enabled, but we could take it further.

@vancem
Copy link
Contributor

vancem commented Jun 21, 2017

I think we should do nothing. The current logging is string based, and frankly need to be reworked. The only question is whether this logging is useful enough as it is to leave in (and thus take a compatibility burden on).

@divega
Copy link
Contributor Author

divega commented Jun 21, 2017

Thanks @vancem for looking at this. I am leaning towards disabling this code to avoid the compatibility burden and get a chance to do a design with more time. I assume we can just remove (or #ifdef) code in DataCommonEventSource.

@saurabh500 @stephentoub thoughts?

@saurabh500
Copy link
Contributor

Based on my experience dealing with ADO.Net I know that this tracing is used when Microsoft engineers get involved with the CSS to collect traces to diagnose issues with Applications running in the field. The traces have been used by the engineers to diagnose issues in ADO.Net.
I haven't heard of any customers which consume these traces directly to bother figuring out what is wrong with the library.

All the tracing from SqlClient was stripped out and never re-instated and we have tough time figuring out what issues the customers are running into.
I am in favor of going with the current state of tracing and not changing anything at all so that we have a way to diagnose any issues with the DataSet and DataTable which is where this tracing would really be useful. It's better than having nothing.

@vancem
Copy link
Contributor

vancem commented Jun 21, 2017

IF the goal of this logging is to simply allow ad-hoc diagnosis, then just leaving it in IN THEORY is OK, since that use does not really have a compatibility problem. The problem is that simply because it is there, people COULD take a dependency on it and thus cause issue.

Options include

  1. Simply breaking it when the time comes (assume no one has taken a dependency on it).
  2. Log a disclaimer that this logging may be changed/deleted (helps ease our minds)
  3. Put the logging behind a Environment variable like ADO_NET_LOGGING_EXPERIMENTAL that makes it unlikely that a programmatic dependency would be taken.

Obviously (1) is the simplest solution, and the time window may not be large (months), and we are talking about .NET Core, which is easier to control the breakage (it does not get updated 'under' you by windows update). So just leaving it is reasonable

@divega
Copy link
Contributor Author

divega commented Jun 22, 2017

Thanks @saurabh500 and @vancem Based on this argument, I am ok with (1), e.g. saying that we are leaving what we have in place and we can revisit soon after 2.0 when we build an all up plan for ADO.NET, SqlClient and other providers. When the time comes, we can then decide if it is necessary and ok to break. So, I am punting this.

@divega
Copy link
Contributor Author

divega commented Jan 30, 2018

As per the last comment, this is something we would like to revisit when/if we build common diagnostics infrastructure for ADO.NET providers. We are not planning to do this for 2.1, so moving to backlog.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@roji
Copy link
Member

roji commented Feb 5, 2020

Small note after looking at this again: a quick grep in dotnet/runtime's System.Data.* reveals that all current EventSources seem to be emitted from DataSet/DataTable/DataRow and related types; in other words, they are not emitted from any actual provider-related code.

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@ajcvickers ajcvickers removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2020
@ajcvickers ajcvickers modified the milestones: Future, 6.0.0 Dec 7, 2020
@roji roji added the User Story A single user-facing feature. Can be grouped under an epic. label Feb 10, 2021
@roji roji changed the title Standarize on DiagnosticSource for tracing in System.Data.* components Standarize on DiagnosticSource/OpenTelemetry for tracing in System.Data.* components Feb 10, 2021
@roji roji changed the title Standarize on DiagnosticSource/OpenTelemetry for tracing in System.Data.* components Standarize on DiagnosticSource/OpenTelemetry for tracing in System.Data components Feb 10, 2021
@roji roji changed the title Standarize on DiagnosticSource/OpenTelemetry for tracing in System.Data components Standardize on DiagnosticSource/OpenTelemetry for tracing in System.Data components Feb 10, 2021
@roji roji changed the title Standardize on DiagnosticSource/OpenTelemetry for tracing in System.Data components Standardize on DiagnosticSource/OpenTelemetry for database tracing Feb 10, 2021
@roji
Copy link
Member

roji commented Feb 17, 2021

Take the Activity API into account as well when looking into this (e.g. open-telemetry/opentelemetry-dotnet#675).

@mairaw
Copy link
Contributor

mairaw commented May 26, 2023

@roji should we mark this as Cut or remove this from the .NET 6 project? Trying to clean things up a bit.

@roji
Copy link
Member

roji commented May 30, 2023

@mairaw thanks, yeah. I'm both removing this from the project and closing the issue as I no longer think it's very relevant. Since this issue was opened, OpenTelemetry has emerged as the tracing standard, and has an experimental spec for database client semantic conventions. ADO.NET providers should simply implement that.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale May 30, 2023
@roji roji removed this from the Future milestone May 30, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Data enhancement Product code improvement that does NOT require public API changes/additions User Story A single user-facing feature. Can be grouped under an epic.
Projects
None yet
Development

No branches or pull requests

10 participants