FileStream rewrite: Caching the ValueTaskSource in AsyncWindowsFileStreamStrategy #51363

carlossanlop · 2021-04-16T03:27:14Z

When AsyncWindowsFileStreamStrategy is wrapped by a BufferedFileStreamStrategy, we need to make sure the ValueTaskSource instance is cached to reduce the number of allocations when calling ReadAsync or WriteAsync multiple times in a row.

This PR is a continuation of #50802, in which we switched from using TaskCompletionSource to IValueTaskSource.

Changes:

Moved the PreAllocatedOverlapped instance inside ValueTaskSource, so the latter becomes its owner. This was done because we are only supposed to have a PreAllocatedOverlapped instance if the ValueTaskSource was created from OnBufferAllocated, a method called only by BufferedFileStreamStrategy right before writing or reading.
Removed MemoryValueTaskSource and moved the cases handled by it to ValueTaskSource.
Created a method that refreshes the value of the NativeOverlapped*. This is done every time we call ReadAsync/WriteAsync, to make sure we are pinning the memory passed by the user.
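
The caching idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `ReusableSource`, `CachingOwner`, `FakeReadAsync` and `CachingDemo` are hypothetical names, the "I/O" completes synchronously, and only `IValueTaskSource<T>` and `ManualResetValueTaskSourceCore<T>` are real runtime types. The instance is handed back to the owner only after the result has been consumed, so a second operation can never observe a source that is still in use.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical stand-in for the strategy's ValueTaskSource (not the runtime's code).
public sealed class ReusableSource : IValueTaskSource<int>
{
    private readonly CachingOwner _owner;
    private ManualResetValueTaskSourceCore<int> _core; // mutable struct, must not be readonly

    public ReusableSource(CachingOwner owner) => _owner = owner;

    public short Version => _core.Version;

    // Would be called from the I/O completion callback in a real implementation.
    public void Complete(int bytesTransferred) => _core.SetResult(bytesTransferred);

    public int GetResult(short token)
    {
        try
        {
            return _core.GetResult(token);
        }
        finally
        {
            // Reset and return to the cache only after the result was consumed.
            _core.Reset();
            _owner.Return(this);
        }
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}

// Hypothetical owner that keeps at most one cached source, like a one-slot pool.
public sealed class CachingOwner
{
    private ReusableSource? _cached;
    private int _allocations;

    public int Allocations => _allocations;

    public ReusableSource Rent()
    {
        // Take the cached instance if present; otherwise allocate a new one.
        ReusableSource? source = Interlocked.Exchange(ref _cached, null);
        if (source is null)
        {
            Interlocked.Increment(ref _allocations);
            source = new ReusableSource(this);
        }
        return source;
    }

    public void Return(ReusableSource source) => Volatile.Write(ref _cached, source);

    // Simulated operation: completes immediately with the requested byte count.
    public ValueTask<int> FakeReadAsync(int bytes)
    {
        ReusableSource source = Rent();
        var task = new ValueTask<int>(source, source.Version);
        source.Complete(bytes);
        return task;
    }
}

public static class CachingDemo
{
    // Runs 100 sequential operations and reports how many sources were allocated.
    public static int Run()
    {
        var owner = new CachingOwner();
        for (int i = 0; i < 100; i++)
        {
            int n = owner.FakeReadAsync(i).GetAwaiter().GetResult();
            if (n != i) throw new InvalidOperationException("unexpected result");
        }
        return owner.Allocations;
    }
}
```

Under these assumptions, `CachingDemo.Run()` returns 1: the first operation allocates the source and every later sequential operation reuses it. Overlapping operations would each find the cache empty and allocate their own instance.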

ghost · 2021-04-16T03:27:28Z

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #50972

DRAFT: Pending benchmarks.

Author:	carlossanlop
Assignees:	carlossanlop
Labels:	`area-System.IO`
Milestone:	6.0.0

adamsitnik · 2021-04-16T09:06:39Z

Initial benchmark results (base is #50802):

Method	Toolchain	fileSize	userBufferSize	options	Mean	Ratio	Gen 0	Allocated
ReadAsync	\cache\corerun.exe	1024	1024	Asynchronous	83.99 us	1.02	0.3360	5 KB
ReadAsync	\base\corerun.exe	1024	1024	Asynchronous	81.98 us	1.00	0.6545	5 KB

WriteAsync	\cache\corerun.exe	1024	1024	Asynchronous	487.70 us	1.02	-	5 KB
WriteAsync	\true\corerun.exe	1024	1024	Asynchronous	476.89 us	1.00	-	5 KB

ReadAsync	\cache\corerun.exe	1048576	512	Asynchronous	2,649.16 us	1.09	10.4167	127 KB
ReadAsync	\true\corerun.exe	1048576	512	Asynchronous	2,421.94 us	1.00	8.9286	83 KB

WriteAsync	\cache\corerun.exe	1048576	512	Asynchronous	4,527.33 us	1.11	15.6250	119 KB
WriteAsync	\true\corerun.exe	1048576	512	Asynchronous	4,100.58 us	1.00	-	75 KB

ReadAsync	\cache\corerun.exe	1048576	4096	Asynchronous	2,321.45 us	1.01	8.9286	71 KB
ReadAsync	\true\corerun.exe	1048576	4096	Asynchronous	2,308.26 us	1.00	8.9286	69 KB

WriteAsync	\cache\corerun.exe	1048576	4096	Asynchronous	4,191.59 us	1.03	-	97 KB
WriteAsync	\true\corerun.exe	1048576	4096	Asynchronous	4,077.05 us	1.00	-	74 KB

ReadAsync_NoBuffering	\cache\corerun.exe	1048576	16384	Asynchronous	730.67 us	1.00	-	18 KB
ReadAsync_NoBuffering	\true\corerun.exe	1048576	16384	Asynchronous	732.58 us	1.00	-	17 KB

WriteAsync_NoBuffering	\cache\corerun.exe	1048576	16384	Asynchronous	2,747.11 us	0.99	-	18 KB
WriteAsync_NoBuffering	\true\corerun.exe	1048576	16384	Asynchronous	2,787.66 us	1.00	-	17 KB

ReadAsync	\cache\corerun.exe	104857600	4096	Asynchronous	250,998.78 us	1.01	-	7,001 KB
ReadAsync	\true\corerun.exe	104857600	4096	Asynchronous	249,280.08 us	1.00	-	6,801 KB

WriteAsync	\cache\corerun.exe	104857600	4096	Asynchronous	382,734.73 us	1.04	1000.0000	9,205 KB
WriteAsync	\true\corerun.exe	104857600	4096	Asynchronous	368,403.91 us	1.00	-	6,905 KB

ReadAsync_NoBuffering	\cache\corerun.exe	104857600	16384	Asynchronous	80,072.69 us	0.97	-	1,750 KB
ReadAsync_NoBuffering	\true\corerun.exe	104857600	16384	Asynchronous	82,376.92 us	1.00	-	1,700 KB

WriteAsync_NoBuffering	\cache\corerun.exe	104857600	16384	Asynchronous	120,385.98 us	0.98	-	1,751 KB
WriteAsync_NoBuffering	\true\corerun.exe	104857600	16384	Asynchronous	123,215.33 us	1.00	-	1,701 KB

It looks like we are allocating less when buffering is disabled, but more than before when it's enabled.

adamsitnik

Overall this looks good to me, but we need to track down and solve the allocation regression that is visible in the scenarios where buffering is enabled.

adamsitnik · 2021-04-16T16:24:01Z

@stephentoub your suggestions were great!

Method	Job	fileSize	userBufferSize	options	Mean	Ratio	Gen 0	Allocated
ReadAsync	after	1024	1024	Asynchronous	87.48 us	1.04	0.3613	5,240 B
ReadAsync	before	1024	1024	Asynchronous	84.38 us	1.00	0.3472	5,216 B

WriteAsync	after	1024	1024	Asynchronous	494.43 us	0.99	-	4,960 B
WriteAsync	before	1024	1024	Asynchronous	501.52 us	1.00	-	4,936 B

CopyToFileAsync	after	1024	?	None	513.31 us	1.01	-	5,593 B
CopyToFileAsync	before	1024	?	None	510.34 us	1.00	-	5,593 B

CopyToFileAsync	after	1024	?	Asynchronous	539.68 us	1.00	-	6,336 B
CopyToFileAsync	before	1024	?	Asynchronous	541.15 us	1.00	-	6,320 B

ReadAsync	after	1048576	512	Asynchronous	2,436.66 us	0.98	-	58,279 B
ReadAsync	before	1048576	512	Asynchronous	2,480.83 us	1.00	8.9286	84,775 B

WriteAsync	after	1048576	512	Asynchronous	4,142.67 us	0.92	-	50,029 B
WriteAsync	before	1048576	512	Asynchronous	4,511.11 us	1.00	-	76,527 B

ReadAsync	after	1048576	4096	Asynchronous	2,153.31 us	0.90	-	916 B
ReadAsync	before	1048576	4096	Asynchronous	2,397.80 us	1.00	8.9286	70,241 B

WriteAsync	after	1048576	4096	Asynchronous	3,976.06 us	0.97	-	27,596 B
WriteAsync	before	1048576	4096	Asynchronous	4,100.25 us	1.00	-	75,562 B

ReadAsync_NoBuffering	after	1048576	16384	Asynchronous	689.04 us	0.94	-	760 B
ReadAsync_NoBuffering	before	1048576	16384	Asynchronous	733.21 us	1.00	-	17,864 B

WriteAsync_NoBuffering	after	1048576	16384	Asynchronous	2,781.44 us	0.99	-	762 B
WriteAsync_NoBuffering	before	1048576	16384	Asynchronous	2,808.65 us	1.00	-	17,866 B

CopyToFileAsync	after	1048576	?	None	2,360.16 us	0.98	-	3,244 B
CopyToFileAsync	before	1048576	?	None	2,405.43 us	1.00	-	3,244 B

CopyToFileAsync	after	1048576	?	Asynchronous	2,593.22 us	1.01	-	2,044 B
CopyToFileAsync	before	1048576	?	Asynchronous	2,568.19 us	1.00	-	3,924 B

ReadAsync	after	104857600	4096	Asynchronous	247,853.42 us	0.95	-	1,368 B
ReadAsync	before	104857600	4096	Asynchronous	259,775.67 us	1.00	-	6,963,952 B

WriteAsync	after	104857600	4096	Asynchronous	358,502.58 us	0.97	-	2,258,912 B
WriteAsync	before	104857600	4096	Asynchronous	371,005.92 us	1.00	-	7,070,648 B

ReadAsync_NoBuffering	after	104857600	16384	Asynchronous	76,073.40 us	0.94	-	796 B
ReadAsync_NoBuffering	before	104857600	16384	Asynchronous	81,316.51 us	1.00	-	1,741,292 B

WriteAsync_NoBuffering	after	104857600	16384	Asynchronous	114,458.43 us	0.88	-	832 B
WriteAsync_NoBuffering	before	104857600	16384	Asynchronous	130,608.83 us	1.00	-	1,741,328 B

CopyToFileAsync	after	104857600	?	None	74,758.01 us	1.01	-	180,828 B
CopyToFileAsync	before	104857600	?	None	73,939.22 us	1.00	-	180,828 B

CopyToFileAsync	after	104857600	?	Asynchronous	86,110.81 us	0.98	-	2,220 B
CopyToFileAsync	before	104857600	?	Asynchronous	87,546.69 us	1.00	-	219,524 B

adamsitnik · 2021-04-16T16:32:32Z

@stephentoub I believe I have addressed all your feedback, PTAL one more time. I hope that we can merge it today and include it in Preview 4

...m.Private.CoreLib/src/System/IO/Strategies/AsyncWindowsFileStreamStrategy.ValueTaskSource.cs

src/libraries/System.Private.CoreLib/src/System/IO/Strategies/BufferedFileStreamStrategy.cs

adamsitnik · 2021-04-16T18:41:31Z

@stephentoub we have addressed the feedback, please take a look. I am going to post the benchmark results in 20-30 minutes

stephentoub · 2021-04-16T18:53:39Z

This now also fixes #25074

adamsitnik · 2021-04-16T19:31:29Z

The results (see the Allocated column)

Method	Job	fileSize	userBufferSize	options	Mean	Ratio	Allocated
ReadAsync	after	1024	1024	Asynchronous	84.39 us	0.98	5,240 B
ReadAsync	before	1024	1024	Asynchronous	85.86 us	1.00	5,216 B

WriteAsync	after	1024	1024	Asynchronous	483.68 us	1.01	4,960 B
WriteAsync	before	1024	1024	Asynchronous	478.92 us	1.00	4,936 B

CopyToFileAsync	after	1024	?	None	492.53 us	1.01	5,593 B
CopyToFileAsync	before	1024	?	None	489.18 us	1.00	5,593 B

CopyToFileAsync	after	1024	?	Asynchronous	529.35 us	1.02	6,336 B
CopyToFileAsync	before	1024	?	Asynchronous	521.85 us	1.00	6,311 B

ReadAsync	after	1048576	512	Asynchronous	2,371.68 us	1.00	58,279 B
ReadAsync	before	1048576	512	Asynchronous	2,372.78 us	1.00	84,775 B

WriteAsync	after	1048576	512	Asynchronous	4,081.33 us	0.97	50,028 B
WriteAsync	before	1048576	512	Asynchronous	4,214.84 us	1.00	76,517 B

ReadAsync	after	1048576	4096	Asynchronous	2,138.36 us	0.92	913 B
ReadAsync	before	1048576	4096	Asynchronous	2,332.66 us	1.00	70,241 B

WriteAsync	after	1048576	4096	Asynchronous	3,951.54 us	0.95	27,562 B
WriteAsync	before	1048576	4096	Asynchronous	4,153.60 us	1.00	75,562 B

ReadAsync_NoBuffering	after	1048576	16384	Asynchronous	674.58 us	0.91	760 B
ReadAsync_NoBuffering	before	1048576	16384	Asynchronous	740.07 us	1.00	17,864 B

WriteAsync_NoBuffering	after	1048576	16384	Asynchronous	2,711.34 us	0.99	762 B
WriteAsync_NoBuffering	before	1048576	16384	Asynchronous	2,736.59 us	1.00	17,866 B

CopyToFileAsync	after	1048576	?	None	1,961.68 us	0.95	3,243 B
CopyToFileAsync	before	1048576	?	None	2,080.32 us	1.00	3,244 B

CopyToFileAsync	after	1048576	?	Asynchronous	2,283.20 us	1.08	2,044 B
CopyToFileAsync	before	1048576	?	Asynchronous	2,161.78 us	1.00	3,924 B

ReadAsync	after	104857600	4096	Asynchronous	228,777.04 us	0.93	1,056 B
ReadAsync	before	104857600	4096	Asynchronous	245,615.52 us	1.00	6,963,952 B

WriteAsync	after	104857600	4096	Asynchronous	353,822.32 us	0.96	2,257,976 B
WriteAsync	before	104857600	4096	Asynchronous	370,913.51 us	1.00	7,070,648 B

ReadAsync_NoBuffering	after	104857600	16384	Asynchronous	74,983.01 us	0.95	796 B
ReadAsync_NoBuffering	before	104857600	16384	Asynchronous	79,064.04 us	1.00	1,741,292 B

WriteAsync_NoBuffering	after	104857600	16384	Asynchronous	116,660.24 us	0.95	832 B
WriteAsync_NoBuffering	before	104857600	16384	Asynchronous	123,323.67 us	1.00	1,741,328 B

CopyToFileAsync	after	104857600	?	None	73,465.73 us	0.99	180,828 B
CopyToFileAsync	before	104857600	?	None	74,040.22 us	1.00	180,828 B

CopyToFileAsync	after	104857600	?	Asynchronous	85,207.68 us	0.98	2,220 B
CopyToFileAsync	before	104857600	?	Asynchronous	87,037.81 us	1.00	219,524 B

jeffhandley · 2021-04-16T19:39:13Z

Wowza some of those allocation improvements are incredible!

danmoseley · 2021-04-16T23:17:39Z

Nice.

jkotas · 2021-04-16T23:21:24Z

src/libraries/System.Private.CoreLib/src/System/IO/Strategies/BufferedFileStreamStrategy.cs

@@ -1067,7 +1069,8 @@ private void EnsureBufferAllocated()

            void AllocateBuffer() // logic kept in a separate method to get EnsureBufferAllocated() inlined
            {
-                _strategy.OnBufferAllocated(_buffer = new byte[_bufferSize]);
+                _buffer = GC.AllocateUninitializedArray<byte>(_bufferSize,
+                    pinned: true); // this allows us to avoid pinning when the buffer is used for the syscalls


That's correct, but pinned: true also allocates the array in Gen2 as a side effect, so this may actually hurt real-world scenarios in the end.

We can allocate it and use a GCHandle if that ends up being better. Previously it was pinned as part of a PreAllocatedOverlapped.

(It's still not at all obvious when this newfangled POH should be used.)

We can also investigate not pinning at all, in which case the code this interacts with will just create a GCHandle for each operation.

And/or look at using a pooled array, but we'd want to ensure enough synchronization was in place to minimize erroneous usage causing us to return an array still in use. We do that in a few other streams.

Here is a simple test:

using System;
using System.IO;

for (int i = 0; i < 100_000; i++)
{
    using (var f = new FileStream("test", FileMode.Create))
    {
        f.WriteByte(42);
    }
}

Console.WriteLine($"Allocated: {GC.GetTotalAllocatedBytes()} Gen2 GCs: {GC.CollectionCount(2)}");

.NET 5: Allocated: 442474096 Gen2 GCs: 0

This PR: Allocated: 448051624 Gen2 GCs: 103

It will be interesting to see whether these excessive Gen2 GCs hit performance gates of services trying .NET 6 previews.

> It's still not at all obvious when this newfangled POH should be used.

Agree. It is very hard to use.

> we'd want to ensure enough synchronization was in place to minimize erroneous usage causing us to return an array still in use

If you can cover all these cases, it may be better to use an unmanaged buffer. It is pinned too, and it does not cause excessive Gen2 GCs.

So allocating on the POH contributes to the gen2 budget. This is why we disable the buffer using size 1, that still works right?

Nothing in this file is used at all if buffer size is 1.

I'm looking forward to taking another stab at optimizing static files in .NET 6

> If you can cover all these cases, it may be better to use an unmanaged buffer. It is pinned too, and it does not cause excessive Gen2 GCs.

I'm going to start with a GCHandle and a normally allocated array. I believe in that case I can mostly restrict synchronization to the async code paths (plus disposal). If we use a native buffer, we'll need to protect the sync code paths as well. We can start with this and then see if it makes sense to use a pooled or native buffer as well.

Well, actually, I'm going to start by not pinning here at all (it'll then pin/unpin in the rest of the implementation per operation). If there's no measurable impact, we can stick with that for now.
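
For reference, the three buffer options weighed in this thread can be sketched side by side. This is only an illustration under assumptions: the `PinningOptions` class and its method names are hypothetical, and the snippet says nothing about which option the final implementation uses. The calls themselves (`GC.AllocateUninitializedArray`, `GCHandle.Alloc`, `Marshal.AllocHGlobal`) are existing runtime APIs.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningOptions
{
    // Option 1 (what the diff above does): allocate on the Pinned Object Heap.
    // No per-operation pinning is needed, but the POH is collected with Gen2.
    public static byte[] AllocateOnPoh(int size)
        => GC.AllocateUninitializedArray<byte>(size, pinned: true);

    // Option 2: ordinary array pinned with a GCHandle. The caller must call Free().
    public static GCHandle PinWithHandle(byte[] buffer)
        => GCHandle.Alloc(buffer, GCHandleType.Pinned);

    // Option 3: unmanaged memory. Never moved by the GC; the caller must free it.
    public static IntPtr AllocateNative(int size)
        => Marshal.AllocHGlobal(size);

    // Writes the byte 42 through each kind of buffer and returns the sum of the reads.
    public static int Demo()
    {
        byte[] poh = AllocateOnPoh(4096);
        poh[0] = 42;

        byte[] normal = new byte[4096];
        GCHandle handle = PinWithHandle(normal);
        try
        {
            // The address stays valid only while the handle is allocated.
            Marshal.WriteByte(handle.AddrOfPinnedObject(), 42);
        }
        finally
        {
            handle.Free();
        }

        IntPtr native = AllocateNative(4096);
        byte fromNative;
        try
        {
            Marshal.WriteByte(native, 42);
            fromNative = Marshal.ReadByte(native);
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }

        return poh[0] + normal[0] + fromNative;
    }
}
```

`PinningOptions.Demo()` returns 126 (3 × 42). The trade-off raised in the thread is about lifetime and GC cost rather than correctness: options 2 and 3 need explicit cleanup and synchronization against in-flight operations, while option 1 shifts cost onto Gen2 collections.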

carlossanlop added 2 commits April 15, 2021 20:13

Caching the ValueTaskSource in AsyncWindowsFileStreamStrategy

787fc8b

Add debug assert to WriteByteSlow for _writePos == _bufferSize

6de3827

carlossanlop added the area-System.IO label Apr 16, 2021

carlossanlop added this to the 6.0.0 milestone Apr 16, 2021

carlossanlop requested review from adamsitnik and jozkee April 16, 2021 03:27

carlossanlop self-assigned this Apr 16, 2021

carlossanlop requested a review from jeffhandley April 16, 2021 03:27

Fix assertion failure

f991eaa

adamsitnik reviewed Apr 16, 2021

Apply suggestions from code review

efb426b

adamsitnik added 5 commits April 16, 2021 11:39

Apply suggestions from code review

45f13e2

some pedantic polishing

38b1431

fix the allocation bug: actually return the instance to the "pool"

a9b314f

don't Dispose the handle if it has default value

a8011a6

reset _result and _source in the Configure method

fe95056

stephentoub reviewed Apr 16, 2021

delay the moment when the ValueTaskSource becomes ready to be reused to the moment after _source.SetException|SetResult is called

d1ebc47

adamsitnik added 2 commits April 16, 2021 17:23

apply Stephen suggestions

4adf096

implement Stephen idea

d101f45

adamsitnik added 2 commits April 16, 2021 18:25

use Reset in explicit way

d4deefc

remove outdated comment

af10161

adamsitnik marked this pull request as ready for review April 16, 2021 16:30

adamsitnik requested a review from stephentoub April 16, 2021 16:31