Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ValueTask<T> returning async APIs to ADO.NET #25297

Open
roji opened this issue Mar 3, 2018 · 46 comments
Open

Add ValueTask<T> returning async APIs to ADO.NET #25297

roji opened this issue Mar 3, 2018 · 46 comments

Comments

@roji
Copy link
Member

roji commented Mar 3, 2018

With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.

Here are the current async methods:

Task DbConnection.OpenAsync(...);

Task<DbDataReader> DbCommand.ExecuteReaderAsync(...);
Task<DbDataReader> DbCommand.ExecuteDbDataReaderAsync(...);
Task<object> DbCommand.ExecuteScalarAsync(...);
Task DbCommand.ExecuteNonQueryAsync(...);

Task<bool> DbDataReader.ReadAsync(...);
Task<T> DbDataReader.GetFieldValueAsync<T>(...);
Task<bool> DbDataReader.IsDBNullAsync(...);

Notes:

  1. Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they're introducing ValueTask<T> overloads along with Span<T> (e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can't be yet another overload that only differs on the return type, and the "standard" name with just the Async suffix is taken, we need to come up with a new naming pattern.

  2. There are some missing async methods in ADO.NET in general (e.g. DbCommand.Prepare(), DbConnection.BeginTransactionAsync(...), DbTransaction.CommitAsync(...)...). These can be added directly with return value ValueTask<T> (covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for the ValueTask<T> returning variants of the existing async methods.

  3. Also note that async APIs that return Task or Task<bool> (such as ReadAsync() and IsDBNullAsync()) can be implemented to return cached Task<T> instances when they complete synchronously to gain efficiency instead of adopting ValueTask<TResult>. This mitigates the need for ValueTask<T> returning variants for most APIs. Also, in terms of allocations, the advantage of ValueTask<T> is much greater the more fine grained the async methods are.

  4. Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls AsTask() on the ValueTask<TResult> instance returned by the new API, so that new providers don't need to implement the old Task<TResult> version of the API but instead only implement the more efficient ValueTask<TResult> version. It also seems good not to force existing providers to switch.


**Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.

@divega
Copy link
Contributor

divega commented Mar 3, 2018

System.Data Triage: We should consider this, but since it isn’t for 2.1, moving to future.

@davidfowl
Copy link
Member

Are any of these methods streaming? Would IAsyncEnumerable be interesting here?

@divega
Copy link
Contributor

divega commented Mar 3, 2018

@davidfowl DbDataReader.Read is streaming. The current API does not resemble IEnumerable (possibly because it predates it 😄), but the new API could use IAsyncEnumerable if the timing is good.

See this was mentioned on aspnet/DataAccessPerformance#32.

@roji
Copy link
Member Author

roji commented Mar 3, 2018

DbDataReader implement IEnumerable, exposing rows as DbDataRecord. It's an old API (non-generic, sync-only), and the default implementation, which uses DbEnumerator, is also pretty inefficient: it calls DbDataReader.GetValues(), which copies the row's values into an object[] (which gets allocated on each MoveNext()...).

But I actually don't really see a reason not to also have DbDataReader implement IAsyncEnumerable... It could provide a modern and efficient way to stream over the the rows of a resultset. Opened dotnet/corefx#27684 to track.

@GSPP
Copy link

GSPP commented Apr 2, 2018

Is the allocation overhead really significant when opening a connection or executing a reader? Executing select null against SQL Server on the same machine over shared memory and using synchronous IO takes about 200us. This already is so much more than allocating a few objects and collecting them.

Adding ValueTask versions for reading rows and fields might be valuable.

@roji
Copy link
Member Author

roji commented Apr 2, 2018

You're right that when OpenAsync() returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, since OpenAsync() returns a non-generic task which can be cached. So I agree this has more value for DbDataReadet.Read().

However, don't forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.

@Grauenwolf
Copy link

Grauenwolf commented Nov 7, 2018

ValueTask DblDataReader.GetFieldValueAsync(...);
ValueTask DbDataReader.IsDBNullAsync(...);

If we're going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value. When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate. (EDIT: this is no longer true.)

We can probably make it a lot more convenient efficient with methods such as

ValueTask<T?> DblDataReader.GetFieldValueOrNullAsync(...);

@roji
Copy link
Member Author

roji commented Nov 7, 2018

Depending on the ADO.NET provider implementation, calling IsDBNull() separately from GetFieldValue<T>() isn't necessary a big perf problem (I'm not sure it's the case with Npgsql). But introducing something like GetFieldValueOrNull() (and its async counterpart) would be a good idea regardless, if only for improving the API/usability.

Note that using the async versions of these methods is probably a bad idea unless CommandBehavior.SequentialAccess is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway...

@Grauenwolf
Copy link

You're right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it's barely visible in the performance profiler.

@Grauenwolf
Copy link

Still, it would be nice for those who are using ADO.NET directly to not have to write those checks.

@roji
Copy link
Member Author

roji commented Nov 8, 2018

Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects...

Good to know that perf-wise things are better, there have been a lot of changes in recent two years.

@GSPP
Copy link

GSPP commented Nov 8, 2018

If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for ValueTask. ValueTask can be used internally.

@divega
Copy link
Contributor

divega commented Nov 9, 2018

@roji @Grauenwolf I would like us to track the idea of GetFieldValueOrNull[Async] as a separate issue. Would you like to create it?

@Grauenwolf
Copy link

I haven't read the underlying code yet, so I think someone else would be better suited to proposing an API.

@MgSam
Copy link

MgSam commented Nov 9, 2018

I vote against mixing ValueTask and Task return types arbitrarily based solely on the age of the method. ValueTask return types should only be highly targeted to methods likely to be called over and over.

I do agree BeginTransactionAsync and CommitTransactionAsync are needed, though they should not be ValueTask.

@divega
Copy link
Contributor

divega commented Nov 9, 2018

@Grauenwolf stating the problem with the current API would be enough. But no worries.

@Grauenwolf
Copy link

I'm ok with BeginTransactionAsync and CommitTransactionAsync not being ValueTask.

Really it is only DBDataReader that I care regarding ValueTask because it is in a tight loop.

@divega divega changed the title Add ValueTask overloads to ADO.NET async APIs Add ValueTask<T> returning async APIs to ADO.NET Feb 1, 2019
@divega
Copy link
Contributor

divega commented Apr 3, 2019

@roji I wonder how this would look like now that we have a better understanding of the guidance re ValueTask<T> vs. Task<T> (from dotnet/efcore#15184).

It seems to me that only the API on DbDataReader is worth considering.

@roji
Copy link
Member Author

roji commented Apr 3, 2019

@divega I agree - for the execution APIs which never return synchronously, it doesn't make much sense to think about ValueTask. I will update the description soon to reflect this.

There's also dotnet/corefx#35012 (adding missing async methods, as opposed to ValueTask counterparts to existing async methods). It seems that proposal already follows the guidance - the only ValueTask-returning methods are DisposeAsync() (so coming from IAsyncDisposable) and CloseAsync(), which is almost identical to DisposeAsync().

@divega
Copy link
Contributor

divega commented Apr 3, 2019

@roji perhaps we should get everything we want from this issue copied into dotnet/corefx#35012 and bring that one to API review. I am saying this because the scope of this issue (#27682) is probably now small enough that it wouldn't affect the chances of dotnet/corefx#35012. I also think it helps have a more holistic view during the API review.

@divega
Copy link
Contributor

divega commented Apr 3, 2019

Though there is still the naming problem 😄

@roji
Copy link
Member Author

roji commented Apr 4, 2019

Yeah, and for GetFieldValue there's also (in theory) still the question of better nullability handling (even though right now we don't seem to have anything better). That's why it may be good to handle them differently - I think dotnet/corefx#35012 can be brought to design review right away.

@divega
Copy link
Contributor

divega commented Apr 4, 2019

I added api-ready-for-review on dotnet/corefx#35012. Let's at least discuss the ValueTask<bool> version of ReadAsync there.

@roji
Copy link
Member Author

roji commented Apr 4, 2019

@divega a small note after the recent discussions: ValueTask<bool> seems to be the same case as ValueTask, in that it provides no value for synchronous completion: Task is already cached (only two possible values). It may still make sense to add ValueTask<bool> overloads in case a provider ever decides to optimize the asynchronous completion case (via IValueTaskSource) - although at this point that seems pretty unlikely, given the effort required (see @stephentoub's comment quoted in dotnet/efcore#15184 (comment)).

We could defer adding those APIs until an easier alternative is introduced (as is planned for some point) - since until then the ValueTask<bool> version will actually be a bit slower than the Task one (due to the overhead of the added layer etc.). What do you think?

However, in any case we still have GetFieldValueAsync() which would benefit from an overload returning a ValueTask<T>.

@divega
Copy link
Contributor

divega commented Apr 4, 2019

Yes @roji, however DbDataReader.ReadAsync is exactly like IAsyncEnumerable<T>.MoveNextAsync, which is already optimized for the async completion case. I suspect someone willing to optimize it could reuse ManualResetValueTaskSourceCore<TResult> and duplicate the logic used to build async iterators.

I am hoping that in the API review meeting we can get the right people to ask about this.

@roji
Copy link
Member Author

roji commented Apr 5, 2019

@divega I'll try to investigate a bit how that works. But you're that in any case, even if today it's difficult to benefit from the async completion optimization, it may become easier in the future, and this specific API is definitely a place where that may make sense. As you say, raising this in the API review meeting would probably get us the answers we need.

Regardless, I'll soon update the description to reflect this discussion.

@GSPP
Copy link

GSPP commented Apr 9, 2019

An argument can be made for providing the highest possible performance alternative (likely ValueTask): Who still uses raw ADO.NET these days? That is rare. Normally, people use an ORM or a thin wrapper such as Dapper. Inconvenient APIs are hidden from most callers.

@MgSam
Copy link

MgSam commented Apr 9, 2019

@GSPP Tons and tons of people still use ADO.NET. There are huge classes of developers that refuse to use anything that is not shipped with .NET. Not to mention all the legacy code out there.

@roji
Copy link
Member Author

roji commented Apr 9, 2019

@GSPP can you explain how your argument that users don't usually use ADO.NET directly relates to this issue? At least at this point it isn't really recommended to add APIs everywhere that return non-generic ValueTask, as it is very unlikely that this would help perf in any way (it would actually degrade perf in some cases compared to returning non-generic Task).

There's also a direction of thought about making ADO.NET more usable directly (as opposed to usable only via other layers).

@Grauenwolf
Copy link

Even those who use ORMs still need to drop down into ADO.NET for legacy databases, especially if they have stored procedures.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2019

I was thinking about this recently and had an idea. There are three main scenarios.

  1. The caller want to know if something is null but then acts directly on that knowledge. I's say this is rare and is covered by IsDBNull[Async]
  2. The caller knows by value of the query definition that the field will not be null and null appearing is an exceptional condition, this is covered by Get* and GetFieldValue[Async]
  3. The caller wants to know if something is null and if it isn't then get that value. At the moment you do IsDBNull followed immediately by GetFieldValue. This is the interesting case.

Even when what you want is buffered locally you're calling two Task<T> returning methods and unless T is bool you're really going to have to allocate most of the time. Even if we could change this to ValueTask<T> it would still be two calls to ask about the same field. The usual sync pattern for this sort of thing would be bool TryGetNotNullFieldValue<T>((string fieldname, out T value) which if ok but that out parameter prevents it being converted to async. But! we have ValueTuples now so we don't have to use the awkward pattern of having additional return parameters as out/ref. We can write ValueTask(bool isNotNull, T value)> GetNotNullFieldValue(Async<T>(string fieldname) which is neatly async valuetasked and asks both relevant questions in a single call.

You'd end up with callsites like this:

private static async Task Performance(string connectionStirng)
{
    using (var connection = new SqlConnection(connectionStirng))
    using (var command = new SqlCommand("select id from table",connection))
    using (var reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            var valueTask = reader.GetNotNullFieldValueAsync<int>("id");
            var result = valueTask.IsCompletedSuccessfully ? valueTask.Result : await valueTask.ConfigureAwait(false);
            if (result.isNotNull)
            {
                // use the value
            }
            else
            {
                // er...
            }
        }
    }
}

And if you didn't care about the valuetask pattern you can just directly await the method and skip the task variable entirely. This way T should be annotatable with the nullability attributes so you don't have to escape when accessing result.Value. If you really want to go all-out it could return a struct type with bool operator overload so you can just do if (result) { // use result.Value Of course the function name sucks but that's because all the really good names are already taken, it'd have to be something silly and overlong but we can pass that one up to the design review meeting to handle, they love spending hours trying to decide the best names for things 😁

@GSPP
Copy link

GSPP commented Sep 2, 2019

The ValueTask(bool isNotNull, T value)> idea is promising. It's also a functional API now. out is side-effecting and it does not compose as well as an expression. Making two calls (IsNull...) is much less nice than one call that returns exactly what you want.

Essentially, ADO.NET wants an option type. This is what a SQL NULL is. It's almost exactly what Nullable provides except that Nullable is restricted to value types.

Probably, this should be a custom struct with deconstruction support. ValueTask is not supposed to be in public APIs at this point (I agree with that. A custom struct is cleaner given the very high cleanliness requirements of the framework.).

As a side note, I see this (IMHO) design mistake a lot in the framework where people really wanted an option type but instead chose bool TryGetX(out value) or some other rather awkward pattern (such as two methods calls).

Maybe .NET would benefit from an option type that is not restricted to value types. But since we don't have that we can give ADO.NET it's own fundamental option type.

@roji
Copy link
Member Author

roji commented Sep 2, 2019

ValueTask is not supposed to be in public APIs at this point

How do you mean? ValueTask is exposed in many public APIs (e.g. IAsyncDisposable.DisposeAsync, Stream.ReadAsync...).

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 2, 2019

There was resistance to using value tuples in public apis due to not having agreed naming guidance for them. I believe this was resolved by the design meetings a few months ago so it might be possible. I'm fine with using a struct instead of a value tuple.

@roji
Copy link
Member Author

roji commented Sep 2, 2019

Ah I see - I think there was some confusion between ValueTuple and ValueTask above.

@Grauenwolf
Copy link

ValueTask is not supported by Visual Basic. Until that is addressed we should be hesitant about using it in public APIs.

@stephentoub
Copy link
Member

ValueTask is not supported by Visual Basic.

How is it not supported?

@Grauenwolf
Copy link

@GSPP
Copy link

GSPP commented Sep 2, 2019

ValueTask is not supposed to be in public APIs at this point

=> I meant value tuple.

Regardless of whether it is allowed or not, I think it's wise most of the time to use a custom type. (This stance is for library code only, like .NET).

@stephentoub
Copy link
Member

ValueTask is not supported by Visual Basic.

How is it not supported?

The docs just say no without going into details. https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netstandard-2.1

@KathleenDollard, @jaredpar, what about ValueTask/ValueTask<T> isn't supported in Visual Basic?

@jaredpar
Copy link
Member

jaredpar commented Sep 4, 2019

Visual Basic cannot define an async method which produces a ValueTask / ValueTask<T> but it can consume a ValueTask / ValueTask<T> using the Await operator.

@davidfowl
Copy link
Member

VB didn’t get Task likes?

@jaredpar
Copy link
Member

jaredpar commented Sep 4, 2019

VB didn’t get Task likes?

Correct. It is still limited to defining async methods that return Task / Task<T>. It can consume ValueTask / ValueTask<T>, or any other type that implements the awaiter pattern.

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQBsQDdhJiAligHwEkwAHAewRigAIBlAT1jjAFgAoEiq2x5sAHQAVABYI4EACa4AdgHNh0ANZQOABQCuwFLgDGNAMIpotAxxoWam7XpoBBKAxn6AYhucxc5GTQCyAChh7WiFlAEpgmlCoJXNLeLsAdwhcIJg4iwBRGUkaNw8vGQyaYusdfQcnV3ddT28/ACZAyIA1CBQNOGilCIcaNo6u5WKE5NSadPZ47Nz82sKOGcNjKCggA==

@Grauenwolf
Copy link

Grauenwolf commented Sep 4, 2019 via email

@jaredpar
Copy link
Member

jaredpar commented Sep 4, 2019

There is nothing preventing VB from implementing a member that has a ValueTask return type. VB just can't do it using the Async function style. If an Async style method was needed then a simple thunking approach can be used to put the logic in an Async style method and call that in a ValueTask returning method directly.

@Grauenwolf
Copy link

Grauenwolf commented Sep 4, 2019 via email

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@ajcvickers ajcvickers removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0, Future Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests