Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SqlCommand.EnableOptimizedParameterBinding Feature #1041

Merged
merged 18 commits into from
Jul 7, 2021

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Apr 18, 2021

fixes #974

This PR adds a new property to SqlCommand called DisableOutputParameters. When set to true parameters names will not be sent to the sql server when the command is executed. This behaviour has been show to increase performance for commands with very large numbers of parameters which are used in bulk operations.

This is a draft because:

  1. This adds public surface area. The external name is open to debate.
  2. This needs documenting.
  3. This needs tests adding.

@KristianWedberg would you mind running your perf tests with the artifacts from this PR to ensure that this small change still has the performance changes observed in the investigation we did in your original issue?

@KristianWedberg
Copy link

Hiya @Wraith2, can you make a package available like previously?

@ErikEJ
Copy link
Contributor

ErikEJ commented Apr 18, 2021

@KristianWedberg You can download from the azure DevOps CI-BuildNPack pipeline here: https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_build/results?buildId=29463&view=artifacts&pathAsName=false&type=publishedArtifacts

@KristianWedberg
Copy link

KristianWedberg commented Apr 18, 2021

Yup, with UsePositionalParameters = true this new PR (Microsoft.Data.SqlClient.2.1.0-pull-049c895.21108.1.nupkg) also has great performance, scaling to 2048 parameters with about 10% better performance vs. ODBC:

UsePositionalParameters provider RPS CPR Parameters Throughput [params/s]
false Microsoft 4 8 32 132,464
false Microsoft 8 8 64 210,600
false Microsoft 16 8 128 251,774
false Microsoft 32 8 256 228,873
false Microsoft 32 16 512 159,879
false Microsoft 32 32 1024 92,758
false Microsoft 32 64 2048 48,713
true Microsoft 4 8 32 143,034
true Microsoft 8 8 64 257,235
true Microsoft 16 8 128 409,405
true Microsoft 32 8 256 630,407
true Microsoft 32 16 512 846,929
true Microsoft 32 32 1024 1,120,485
true Microsoft 32 64 2048 1,343,983
Odbc 4 8 32 133,329
Odbc 8 8 64 239,711
Odbc 16 8 128 378,298
Odbc 32 8 256 549,672
Odbc 32 16 512 755,040
Odbc 32 32 1024 984,802
Odbc 32 64 2048 1,120,003

So all good so far!

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 18, 2021

Excellent. So, fancy helping write some tests? 😁

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 18, 2021

I've added some simple tests which examine the behaviour of parameters with this setting enabled and they surprised me a bit.

  1. Output and InputOutput parameters don't work because the names can't be bound, seems sane enough.
  2. All parameters named in the query still have to be specified correctly so you can't just omit parameter names and you can't make them up, they've still got to match in the same way as using named parameters.
  3. Parameters can be re-used by name, so you can use an input parameter multiple times

So it seems that on input there is still a name-value map generated but it's populated by position not name match and then the engine proceeds normally requiring names to exist once the query is parsed etc. On output there is no name-value map so you can't use named outputs at all.

@roji
Copy link
Member

roji commented Apr 19, 2021

Out of curiosity... Is a new flag on SqlCommand really needed? Is there any reason you can't simply check if the command's parameters have their names set or not (i.e. ParameterName is null)? I'm raising this because I'm planning a similar change in Npgsql.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

Ideally I'd use a new flag in the CommandBehavior enum. That's a breaking change though and it's in /runtime and since I thought this was going to be SqlClient specific it didn't seem to be the right solution to the problem. If you could use the same value in npqsql then perhaps it's worth considering. I don't know if mysql or sqlite (or firebird?) would be able to or want to make use of it. It might be worth a wider discussion.

It does have to be at DbCommand level. Even though the parameter values aren't sent with names the RPC request still contains a list of parameter names and types and those get checked. It's peculiar but I'm finding from the tests that the only thing this changes is that the parameters are matched up to names in the order provided, you can still use a named parameter multiple times and you still have to name each parameter correctly and provide values for each named parameter.

@roji
Copy link
Member

roji commented Apr 19, 2021

Thanks @Wraith2, so you're saying users have to specify SqlParameter.ParameterName regardless of this option, right?

That's peculiar indeed... The concept of positional parameter usually means that placeholders themselves are positional: either via numeric placeholders like in PostgreSQL ($1, $2), or with implicit positioning (?). I doubt a CommandBehavior would make sense across multiple providers here.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

If there is a way to support true placeholder positional style I don't know what it is. ? isn't valid syntax and using @ fails with Must declare the scalar variable "@" from the server, so we sent it but it isn't acceptable. I'm working from the way RPC are executed in this driver. It may be that if I didn't send the parameter list string it would change how things work. I'm not aware of a T-SQL spec that tells me whether an unnamed parameter would be acceptable. I do know I've never seen anything other than stored procedure invocations use positional parameters with SQL Server.

If there is a use for the positional option in other drivers the enum flag is a better approach and the limitations of how Sql Server implements it would be something to cover in the documentation or provider.

edit:

Thanks @Wraith2, so you're saying users have to specify SqlParameter.ParameterName regardless of this option, right?

To answer this. No, users aren't required to enter a parameter name by the SqlClient driver but if you don't then the server is going to reject the parameter with Must declare the scalar variable "@" unless it's marked with ParameterDirection.ReturnValue and only one of those is permitted.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

We could do this without the formal extension to CommandBehavior by simply oring in a new flag value

const int PositionalParameters = 64;
CommandBehavior behavior = (CommandBehavior )((int)CommandBehavior.Default | PositionalParameters);
command.ExecuteNonQuery(behavior);

but that looks like a horrible hackjob and if I found that as an official answer I'd be suspicious about why the people who own the api felt the need to provide such a weird way to access it when they're capable of changing the api surface.

It would need some internal plumbing to make sure validation of command behavior didn't trip up on it but i can handle that.

@roji
Copy link
Member

roji commented Apr 19, 2021

... I agree this looks... not ideal :) There's also a risk of conflict in case CommandBehavior ever does get extended.

Is there any real downside to introducing a flag on SqlCommand as this PR currently does?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

The only danger I can think of it is that people will see and use it without understanding it.

@KristianWedberg
Copy link

If I understand the new functionality and limitations correctly, from the users' perspective this is not positional parameters. Rather it's an Input only (or InputAndReturn only?) optimization for >64 parameters.

I think a different property name would reduce any incorrect usage, e.g. UseInputOnlyParameters (or UseInputReturnOnlyParameters), and doc the performance advantage.

Thoughts?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

If I understand the new functionality and limitations correctly, from the users' perspective this is not positional parameters. Rather it's an Input only (or InputAndReturn only?) optimization for >64 parameters.

Not quite, consider this (passing) test:

[Fact]
private static void PositionalParameters_ParametersAreUsedByPosition()
{
    using (var connection = new SqlConnection(DataTestUtility.TCPConnectionString))
    {
        connection.Open();

        using (var command = new SqlCommand("SELECT @Second, @First", connection))
        {
            command.UsePositionalParameters = true;
            command.Parameters.AddWithValue("@First", 1);
            command.Parameters.AddWithValue("@Second", 2);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                reader.Read();

                int firstOutput = reader.GetInt32(0);
                int secondOutput = reader.GetInt32(1);

                Assert.Equal(1, secondOutput);
                Assert.Equal(2, firstOutput);
            }
        }
    }
}

On input: Names must all be declared. Names can be reused without taking an additional value. Names and values are associated at first encounter of the name by taking the next value from the list of values.
On output: Names don't exist.

@KristianWedberg
Copy link

OK. How about UsePositionalOutputParameters then?

And are Return parameters affected or not (i.e. should they be referenced in the name or not)?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

OK. How about UsePositionalOutputParameters then?

Capturing all the complexity in the name is virtually impossible unless we use an absurd name. We're going to have to rely on the docs to inform people what it changes and what it doesn't.

And are Return parameters affected or not (i.e. should they be referenced in the name or not)?

Return parameter names aren't really important because nothing in the driver cares about the name. They work as the did before because their names aren't relevant. There is only one return value it has no name and it has type int32, in the absence of explicitly returned value it has default value 0.

@KristianWedberg
Copy link

Given what you said about Return values, I'd certainly vote for UsePositionalOutputParameters over UsePositionalParameters to reduce incorrect usage...

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

It also affects input parameters as I showed in that test so using the word Output with the attendant implication that it only affects output parameters seems misleading. Perhaps UseAnonymousParameters or SendAnonymousParameters would be closer but as I said you're still going to need people to read the docs.

@roji
Copy link
Member

roji commented Apr 19, 2021

@Wraith2 I'm probably missing something, but in this example, would the code work any different if UsePositionalParameters isn't set? It just looks like regular named parameter code to me...

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

@Wraith2 I'm probably missing something, but in this example, would the code work any different if UsePositionalParameters isn't set? It just looks like regular named parameter code to me...

[edit]
Actually no, i'm talking nonsense that test is doing the same both ways. let me investigate.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2021

Ok. It's fun. The server side seems to be using the rpc parameter order to match values to names so as long as the build of the parameter list parameter isn't screwed up in some way then the order will be preserved.
What happens when you do a query is that it executes an rcp request which can have a variable number of system parameters (depends on batch, sproc, text) and then those parameters are followed by the user parameters. So in the case of that test what you get is a param array like this:

rpc.params[0] = ("", "SELECT @Second, @First, @Second", nvarchar(255)) // system param query
rpc.params[1] = ("", "@First int, @Second int", nvarchar(255)) // system param params list
rpc.params[2] = ("", 1, int) // user param
rpc.params[3] = ("", 2, int) // user param

So the server is matching up the index of the parameter name in the parameterList ( index1 in the rpc) text with the following user params and because they're in order it works. If you found a way to make the order difference then it'd fail but since we build and send in a single operation it'll always work if rpc are built this way.

In which case calling it DisableOutputParameters is probably accurate since that's the only effect you'll see. I think more tests might be a good idea to cover the edges of this to make sure we understand it completely.

@Wraith2 Wraith2 changed the title Add SqlCommand.UsePositionalParameters Feature Add SqlCommand.DisableOutputParameters Feature Apr 19, 2021
@cheenamalhotra
Copy link
Member

cheenamalhotra commented Apr 20, 2021

DisableOutputParameters sounds like you're simply disabling output parameters from the command, whereas that's not the case. Since it's strictly related to "named parameters", I have a few suggestions shared below. I understand the complex behavior this property introduces, we can continue the discussion to pick the best option:

  • "SendOutputParameterNames" (default true)
  • "DisableOutputParameterNames" (default false)
  • "SendRPCParameterNames" (default true)
  • "DisableRPCParameterNames" (default false)

Tagging @saurabh500 @David-Engel for suggestions too!

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 20, 2021

We don't really expose users to the word RPC so I'm not a fan of using that in the name.
I'd prefer a negative default value because that way any code like designer code will be unchanged by the appearance of a new member. Which leaves DisableOutputParameterNames, seems reasonable.

What would you expect the behaviour to be if the name were DisableOutputParameters without the Name suffix? Perhaps it's worth adding in client side code to make it behave that way.

@DavoudEshtehari DavoudEshtehari added 📈 Performance Use this label for performance improvement activities 🆕 Public API Use this label for new API additions to the driver. labels Apr 23, 2021
@mburbea
Copy link

mburbea commented Apr 24, 2021

From my understanding this flag is essentially changing what the RPC is sending to the server from:

exec sp_executeSql N'select @','@ int',@=2

to the following:

exec sp_executeSql N'select @','@ int',2

So perhaps DisableParameterNameBinding? It's a bit of a mouthful but it's clear as to what it's doing.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 24, 2021

I'll vote for clarity over terseness, seems like a reasonable name to me.

@roji
Copy link
Member

roji commented Apr 24, 2021

Don't want to bikeshed too much, but an API name should generally express why a user would use it. If I understand correctly, the point of this flag would be to get better performance, whereas the current naming makes it sound like users would use it in order to DisableOutputParameters, which doesn't really make sense; disabling output parameters is a side-effect of the more efficient transfer technique, no?

So my proposal would be to call this something more like UseOptimizedParameterTransfer or similar. If this is incompatible with output parameters, I'd detect that and throw NotSupportedException with a clear message to the user.

@KristianWedberg
Copy link

Even documented I think UseOptimizedParameterTransfer sounds so good to the user it will get significantly more erroneous uses compared to DisableParameterNameBinding.

@cheenamalhotra
Copy link
Member

cc @David-Engel
Docs review

Copy link
Contributor

@David-Engel David-Engel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. 👍

@cheenamalhotra cheenamalhotra merged commit 145b5f5 into dotnet:main Jul 7, 2021
@Wraith2
Copy link
Contributor Author

Wraith2 commented Jul 8, 2021

That's my first surface area addition to the library. Everything I've done before has been invisible to users. Woo!

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 8, 2021

@Wraith2 Congrats! And my first time naming one!

Do you have a benchmarking suite, would be interesting to test 1.0 against 3.0?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Jul 8, 2021

I've been relying on the benchmarks that @KristianWedberg posted in the original issue and has kindly been running on the interim builds to verify the changes https://github.com/KristianWedberg/sql-batch-insert-performance

@ErikEJ
Copy link
Contributor

ErikEJ commented Jul 8, 2021

@Wraith2 I meant general benchmarks that could highlight the general performance improvements made over time, not related to this PR.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Jul 8, 2021

Ah, no. The closest i've got to that is the https://github.com/aspnet/DataAccessPerformance benchmark which recreates the TE Fortunes benchmark. Everything else has been highly situational and micro benched. In general my focus has been memory usage on common paths like opening connections, executing a reader, reading valuetypes etc.

@Wraith2 Wraith2 deleted the feature-anonymousparams branch July 18, 2021 21:57
@JRahnama JRahnama changed the title Add SqlCommand.DisableOutputParameters Feature Add SqlCommand.EnableOptimizedParameterBinding Feature Aug 23, 2021
DavoudEshtehari pushed a commit to Wraith2/SqlClient that referenced this pull request Sep 1, 2021
@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 19, 2021

@ErikEJ looks like 4.0.0 has been released so your new api is public, let loose the blog post! ;)

@ErikEJ
Copy link
Contributor

ErikEJ commented Nov 19, 2021

Yay!

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 19, 2021

@KristianWedberg i also had some followup thoughts on this. Do you re-use SqlParameter classes a lot when inserting data? And if so does the object boxing overhead of valuetypes show up in profiles?

@mburbea
Copy link

mburbea commented Nov 19, 2021 via email

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 19, 2021

I think there is a way that the boxing could be avoided but it's only profitable with with repeated uses of the same parameter. If you're seeing WriteToServerAsync you're probably running a cpu profile in which case you'd see boxing as GC time.

@KristianWedberg
Copy link

@Wraith2 Yes, the SqlParameters get heavily reused. I pre-create a few batch insert statements (e.g. 1, 32, and 1024 rows) with associated SqlParameters, and generate a function to quickly and repeatedly set the parameters with new values for each new batch.

As you noted during the investigation, value types will be boxed and generates GC pressure. This didn't affect the performance in any of my benchmarks (I also tested with pre-boxed value types), but if bottlenecked on client CPU cycles this could of course matter to some degree. The amount of boxing is simply one per value type inserted per inserted row, and I'm guessing all generation 0, so the impact should be easy to estimate.

I could implement this pre-boxing also in my production code, but that's currently on a fairly low priority. How were you thinking of addressing this from your end?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 20, 2021

I've got an experimental branch which allows SqlParameter to detect and use StrongBox<T> : where T is [a supported value type] in the Value field. As long as you keep hold of the StrongBox you can read and write from it's Value field without boxing. Obviously that's only beneficial if you re-use the parameter value multiple times since the StrongBox is a heap allocation.

@KristianWedberg
Copy link

Gotcha. In my scenario I'm guessing I would store the StrongBox<T> instances in one array per column in the insert batch (since they would be of the same type), and then generate the code to assign these instances instead of the parameters.

Bit of work, but certainly doable. There is however nothing SqlClient specific with this improvement - it would be really sweet if this was an optional API on DbParameter so that any provider could implement it - any chance of that?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 21, 2021

Other providers wouldn't need it, most of them have a DbParameter<T> construct of their own or no need for it. In SqlClient we can't replace SqlParameter easily or make it generic without large changes. Since the DbParameter is largely abstract every provider has it's own logic to deal with the values so there's no common mechanism we could usefully expose.

Being able to quietly understand a generic argument container like would be backwards compatible and perform better when opted into. There's a small amount of complexity around nulls but as long as it's properly documented it wouldn't be difficult for anyone who has used direct ADO in the past to deal with.

@KristianWedberg
Copy link

Even so, why not design it so that it (optionally) can be reused across providers, even if it means they might have both their own API and the portable API?

I might not be able to justify implementing this for a non-portable API, since I'm generating the parameter setting code with expression trees, and doing that differently for different providers could be more complexity than it's worth...

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 21, 2021

That would need runtime implementation planning and work and then pushing out to all providers. So far that level of co-operation hasn't been possible. We can see what @roji thinks but I expect it would be a no-go for a general provider feature because as i said other providers can handle it better already.

@roji
Copy link
Member

roji commented Nov 21, 2021

Other providers wouldn't need it, most of them have a DbParameter construct of their own or no need for it. In SqlClient we can't replace SqlParameter easily or make it generic without large changes.

FWIW I'm not aware of another provider apart from Npgsql which has DbParameter<T>. dotnet/runtime#17446 is about adding a generic parameter API to ADO.NET; I'm planning to maybe take a stab at this for 7.0, ideally there would be a default implementation which delegates to non-generic DbParameter and which therefore boxes (but could providers could implement it without boxing).

SqlClient we can't replace SqlParameter easily or make it generic without large changes.

Note that at least in the current Npgsql implementation, NpgsqlParameter<T> doesn't replace the non-generic NpgsqlParameter - it's just there alongside it. So there wouldn't be any breaking changes or anything to do the same in SqlClient. You can look at what I did in Npgsql for inspiration, though it would be good to lock down a good design at the ADO.NET level first before implementing.

Even so, why not design it so that it (optionally) can be reused across providers, even if it means they might have both their own API and the portable API?

Well, first of all, what would there actually be here to design across providers, in System.Data? SqlClient can simply support writing when SqlParameter.Value is set to StrongBox<T> - this wouldn't require any additional API surface here, right? Or am I missing something?

Furthermore, if SqlClient does implement this, then it seems like a generic SqlParameter<T> could easily be implemented on top of it... In other words, when somebody sets SqlParameter<T>.Value (which would be of type T), that could simply set the value on an internally-managed StrongBox<T>, and delegate that to non-generic SqlParameter, no?

I guess what I'm saying, is that if SqlClient can support writing StrongBox<T>, then it's a bit hard to see why it couldn't also just add a SqlParameter<T> implementation directly, which is a better API to expose to users... There may be some details here that I'm missing though.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Nov 21, 2021

I guess what I'm saying, is that if SqlClient can support writing StrongBox, then it's a bit hard to see why it couldn't also just add a SqlParameter implementation directly, which is a better API to expose to users... There may be some details here that I'm missing though.

In technical terms there is some difference but it could be worked out. The SqlParameter implementation leans heavily on reflection to identify the type so a lot of internal logic would need to change if we no longer have the field. The hack i've done sort of sidesteps it but still needs the object value internal field it just contains the strongbox so setting it has no downside.

In practical terms getting the interface prototyped, approved, implemented in the BCL, implemented in SqlClient and then available to consumers is a multi-year effort and so far we've been unable to get The batching Api more than halfway through that process in SqlClient despite the other major providers having completed it.

Hacks in this one library that don't affect the external interface are much "easier" to get implemented than trying to change api surface. Here it might only take a year.

@roji
Copy link
Member

roji commented Nov 22, 2021

In practical terms getting the interface prototyped, approved, implemented in the BCL, implemented in SqlClient and then available to consumers is a multi-year effort and so far we've been unable to get The batching Api more than halfway through that process in SqlClient despite the other major providers having completed it.

The batching API was delivered in .NET 6.0. I wouldn't estimate that a generic DbParameter<T> in System.Data is a multi-year effort (though it may slip due to prioritization considerations).

If you feel that eliminating parameter boxing is urgent/high-value enough, then you can indeed go ahead and add support for StrongBox. Given that there's an intention to solve this at the System.Data level, I personally don't see a reason to rush this - it's been this way since forever, it's an optimization that probably isn't terribly important in the SqlClient context (because of the other perf issues), and importantly, when the System.Data API is done you'd be stuck maintaining both the standard DbParameter<T> and the SqlClient-specific StrongBox support (which at that point would be obsolete). So I personally wouldn't do this - there's enough other important work - but this would of course be a SqlClient decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📈 Performance Use this label for performance improvement activities 🆕 Public API Use this label for new API additions to the driver.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Poor INSERT performance with ExecuteNonQuery() and many parameters
10 participants