perf: Move datareader caches down to internal connection #499

Wraith2 · 2020-03-30T23:06:49Z

A standard pattern of using a data reader is to use using blocks like this:

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(commantText, connection))
using (var reader = await command.ExecuteReaderAsync())
{
    // use reader
}

Which means that the reader object is used only once. We added cached instances of backing objects for commonly used functions like IsDbNull and Read but these are only useful if you re-use the reader or call the same function more than once.

This PR takes the cache location for the common async state operations and moves them from the DataReader down to the internal connection. By doing this if users frequently do async operations eventually the majority of pooled internal connections will have cached async state objects which will further reduce memory churn.

Wraith2 · 2020-04-01T09:01:35Z

To port this to netfx you'd need to port #328 first and that's a difficult job because of the complexity in the async paths.

David-Engel · 2020-04-06T21:52:25Z

@Wraith2 How does this behave when MARS is enabled and there are interleaved data readers? Offhand, it looks like they would pick up each others' cached snapshot and context objects. Maybe I'm missing something, though.

Wraith2 · 2020-04-06T22:09:16Z

They'd work as you'd expect, one will pick up the cached instance and if the other will find there isn't one and create it's own. The first to finish will return the cleared instance and then the second will let it's instance go to gc. The only way around that would be to put them on the cache on the SNI handle but that isn't exposed as far up as the reader so it'd need some virtual plumbing. Whatever instance ends up in the cache is not linked to a particular reader, it's a blank object with no remaining state it's just existing that saves the perf.

If anyone is worried about performance they won't be using mars, mars sucks for perf. This will be no worse than current in all cases and better in non-mars heavy re-use scenarios.

Wraith2 · 2020-06-02T21:23:16Z

@cheenamalhotra this one is pretty simple and should be able to get into 2.0 GA.

cheenamalhotra · 2020-06-24T15:51:29Z

Whatever instance ends up in the cache is not linked to a particular reader, it's a blank object with no remaining state it's just existing that saves the perf.

I don't think snapshot is non-linkable to a particular reader, it contains all this info:

SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlDataReader.cs

Lines 5391 to 5415 in fc6aece

    
           if (_snapshot == null) 
        
           { 
        
               _snapshot = Interlocked.Exchange(ref _cachedSnapshot, null) ?? new Snapshot(); 
        
               _snapshot._dataReady = _sharedState._dataReady; 
        
               _snapshot._haltRead = _haltRead; 
        
               _snapshot._metaDataConsumed = _metaDataConsumed; 
        
               _snapshot._browseModeInfoConsumed = _browseModeInfoConsumed; 
        
               _snapshot._hasRows = _hasRows; 
        
               _snapshot._altRowStatus = _altRowStatus; 
        
               _snapshot._nextColumnDataToRead = _sharedState._nextColumnDataToRead; 
        
               _snapshot._nextColumnHeaderToRead = _sharedState._nextColumnHeaderToRead; 
        
               _snapshot._columnDataBytesRead = _columnDataBytesRead; 
        
               _snapshot._columnDataBytesRemaining = _sharedState._columnDataBytesRemaining; 
        
               // _metadata and _altaMetaDataSetCollection must be Cloned 
        
               // before they are updated 
        
               _snapshot._metadata = _metaData; 
        
               _snapshot._altMetaDataSetCollection = _altMetaDataSetCollection; 
        
               _snapshot._tableNames = _tableNames; 
        
               _snapshot._currentStream = _currentStream; 
        
               _snapshot._currentTextReader = _currentTextReader; 
        
               _stateObj.SetSnapshot();

Metadata is mainly going to be shared. What if 1 reader creates a snapshot, while another reader accesses it (with different table being read - whose metadata don't match/conflict, wouldn't it lead to wrong data conversions? Can you write tests to test it specifically?

Wraith2 · 2020-06-24T16:26:28Z

I'm not sure I follow. The interlocks are there to ensure that no two readers can take the cached instance at one time. If two readers need it (mars presumably) then one will have to create it's own snapshot and use it. Those haven't changed. All that's different is the location of the cache so that it'll stay alive with the internal connection rather than being discarded with the reader or connection.

cheenamalhotra · 2020-06-25T21:29:48Z

I've been debugging this and use of cachedSnapshot seems to depend on local instance of _snapshot which is null everytime SqlDataReader is initialized. I don't see any risk in this PR, but I also don't think this PR gives any considerable performance improvement but makes it a bit confusing as to why would SqlInternalConnection need to keep this cache when it's not even going to be referenced again/re-used outside the scope of SqlDataReader.

By doing this if users frequently do async operations eventually the majority of pooled internal connections will have cached async state objects which will further reduce memory churn.

Could you elaborate a bit with a use-case where you think a cached snapshot will be re-used outside the scope of SqlDataReader instance?

Unless you're proposing to save object creation time taken in SqlDataReader and instead reset cache all the time and not really re-use the snapshot information..?

Wraith2 · 2020-06-25T22:03:05Z

Each time the user starts an async operation on the data reader (or on a command which internally uses the a reader like execute scalar) a snapshot is used to keep track of the state from the "safe" start point until another safe point is reached and reader is updated. The primary example of this is reading binary data.

You start in a consistent state and the user requests an async read of a column that hasn't been seen yet. The reader snaps and then starts requesting data and keeps on requesting data until it reaches the beginning of the column that if wants and by processing the incoming data it changes the state of the reader into an inconsistent state. During the duration of the async read the user can observe that the internal state of the reader is unexpected by issuing a sync operation and it's possible that the sync operation can interfere with the async one. When the async operation has moved to the column it wants to read it can set the reader state to the current async state resyncing the state if it wants. Then it proceeds to start to start reading the requested data.. The snapshot in the reader and the snapshot in the parser implement this isolation of the async operation from the sync state.

Each time you do anything async you need async state to be kept somewhere. The pattern I showed in the original code is a common one where you create a connection command and reader and then discard them all. If we keep the cached reader snapshot on the reader it's only available for that instance of the reader and is discarded with the reader. If we move it down as I suggest then each time an internal connection is linked to a connection and a reader is created the state for the async snapshot may already be present on the internal connection so it won't be created and can be recycled.

Why is it useful? because the common pattern is wasteful if you are doing many small async reads because you'll never use the cached instance. The worst case scenario for this is quickly trying to read a single row creating connection command and reader each time. Sadly this sort of thing is very common because TVP are a pain to use correctly require database type support and aren't universal in other providers so aren't used in wrapping libraries like Dapper and EF, Developers also think in loops not sets far too often.

This change will reduce memory usage for cases where users are doing small frequent async operations using readers. It does this by putting the state on something that is reused frequently and thus will build up the cached objects in the internal connection pool lifetime.

cheenamalhotra · 2020-06-25T22:13:39Z

Appreciate the explanation @Wraith2 :)

cheenamalhotra · 2020-06-25T22:18:09Z

I also noticed this is not even implemented in NetFx, so porting this PR isn't possible, maybe we can consider adding support for this cache in NetFx sometime in future?

cheenamalhotra · 2020-06-25T22:19:08Z

To port this to netfx you'd need to port #328 first and that's a difficult job because of the complexity in the async paths.

But i see this now :) Nvm, we'll take care of both.

Wraith2 · 2020-06-25T23:05:47Z

Ultimately combining the two codebases should result in all improvements in netcore being brought into netfx. I think TdsParser, state object and the reader will probably be at the end of that effort but I hope we do get there.

roji · 2020-06-26T11:19:39Z

During the duration of the async read the user can observe that the internal state of the reader is unexpected by issuing a sync operation and it's possible that the sync operation can interfere with the async one.

@Wraith2 am I reading this correctly, that SqlClient supports a sync operation on a given reader instance while an async read is still in progress? If so, that's quite surprising...

Each time you do anything async you need async state to be kept somewhere

Again I probably am lacking lots of context, and I'm also assuming this isn't just about supporting concurrent reader operations (as above). If so, then FWIW this is exactly what async/await takes care of for you - all local function state (the state machine) is automatically lifted to the heap when the method yields asynchronously. In modern asynchronous programming you shouldn't need to manually manage async state in most typical cases.

Wraith2 · 2020-06-26T11:40:10Z

@Wraith2 am I reading this correctly, that SqlClient supports a sync operation on a given reader instance while an async read is still in progress? If so, that's quite surprising...

It's messy. Officially no and there are checks in place that throw on direct calls of user methods in most places but because the internal api itself is all sync there's no real distinction between sync and async calls in some respects and you can end up in sutuations where async operations are forced into sync mode for various reasons. You could end up calling a public method on the reader that you might not expect to cause a read operation and it might not get protected.

By async state i'm refering to both the function state which would be hoist by the compiler and what is effectively a speculative copy of some of the parser state. Imagine as taking a snapshot and then running the parser forward from that state using incoming packets but then being able to just swap in the old state to return to the previous point as if the new packets hadn't been seen.

The problem I have with this approach is that speculation is always capable of mutating local state so you can't speculate without side effects. TryReadInt32 for example can (not always) mutate local state if the read crosses a packet boundary and it'll directly force a network read if it needs to read from a second packet. So calling a simple method to read an integer may mutate and may unavoidably cause and wait for IO.

I have toyed with the idea or trying to rewrite some parts of this with pipelines and spans but the scope of that project deterred me.

roji · 2020-06-26T11:47:57Z

You could end up calling a public method on the reader that you might not expect to cause a read operation and it might not get protected.

Can you give an example for this? Off the top of my head, I can't think of any operation on DbDataReader that users should expect to be invokable while any other async operation is still in progress... I'm guessing this may be a MARS-related thing, though I'd expect that there as well it would be the application's responsibility to never concurrently use the connection.

Stepping back, there seems to be a large amount of complexity here to support something which probably shouldn't be supported (concurrent usage of reader instances)... I'm sure there's a history/context here that I'm missing though.

Wraith2 · 2020-06-26T11:56:21Z

I don't think it's there to support concurrent access. I think there's there because the sync and async paths are so separate and yet use the same sync-based primitives. I've honestly never tried doing sync calls while async is in progress and wouldn't expect it to be allowed. Having worked through rationalizing the async calls I recognise that it is working hard to isolate current stable state from in-progress async operations.

roji · 2020-06-26T15:47:05Z

OK, thanks for the explanations @Wraith2. Sounds like a very complicated state of affairs for sure, I'm still not sure what exactly it's there the support... I'm not aware of any need in other drivers to "snapshot" state at the beginning of async operations, and to present that snapshotted state while the operation is in progress, etc. But I'm out of my depth here.

move datareader caches down to internal connection

f221b0a

cheenamalhotra added the Port to NetFx Port changes to NetFx label Mar 31, 2020

David-Engel approved these changes Apr 6, 2020

View reviewed changes

cheenamalhotra added this to the 2.1.0-preview1 milestone Jun 24, 2020

cheenamalhotra approved these changes Jun 25, 2020

View reviewed changes

cheenamalhotra removed the Port to NetFx Port changes to NetFx label Jun 25, 2020

cheenamalhotra added the Port to NetFx Port changes to NetFx label Jun 25, 2020

karinazhou self-assigned this Jul 9, 2020

karinazhou approved these changes Jul 16, 2020

View reviewed changes

cheenamalhotra merged commit 762d24f into dotnet:master Jul 16, 2020

cheenamalhotra mentioned this pull request Jul 16, 2020

perf: Rework ExecuteReaderAsync to minimize allocations #528

Merged

Wraith2 deleted the perf-movereadercaches branch July 16, 2020 07:38

cheenamalhotra mentioned this pull request Aug 19, 2021

Perf | Port PR #328 to improve performance in .NET Framework #1084

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Move datareader caches down to internal connection #499

perf: Move datareader caches down to internal connection #499

Wraith2 commented Mar 30, 2020

Wraith2 commented Apr 1, 2020

David-Engel commented Apr 6, 2020

Wraith2 commented Apr 6, 2020 •

edited

Loading

Wraith2 commented Jun 2, 2020

cheenamalhotra commented Jun 24, 2020

Wraith2 commented Jun 24, 2020

cheenamalhotra commented Jun 25, 2020 •

edited

Loading

Wraith2 commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

Wraith2 commented Jun 25, 2020

roji commented Jun 26, 2020

Wraith2 commented Jun 26, 2020

roji commented Jun 26, 2020

Wraith2 commented Jun 26, 2020

roji commented Jun 26, 2020

perf: Move datareader caches down to internal connection #499

perf: Move datareader caches down to internal connection #499

Conversation

Wraith2 commented Mar 30, 2020

Wraith2 commented Apr 1, 2020

David-Engel commented Apr 6, 2020

Wraith2 commented Apr 6, 2020 • edited Loading

Wraith2 commented Jun 2, 2020

cheenamalhotra commented Jun 24, 2020

Wraith2 commented Jun 24, 2020

cheenamalhotra commented Jun 25, 2020 • edited Loading

Wraith2 commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

cheenamalhotra commented Jun 25, 2020

Wraith2 commented Jun 25, 2020

roji commented Jun 26, 2020

Wraith2 commented Jun 26, 2020

roji commented Jun 26, 2020

Wraith2 commented Jun 26, 2020

roji commented Jun 26, 2020

Wraith2 commented Apr 6, 2020 •

edited

Loading

cheenamalhotra commented Jun 25, 2020 •

edited

Loading