Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BinaryReader.PeekChar causes excessive reads in underlying Stream if used with BufferedStream #54593

Closed
lekrus opened this issue Jun 23, 2021 · 4 comments · Fixed by #54991
Closed
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@lekrus
Copy link

lekrus commented Jun 23, 2021

Description

When BinaryReader.PeekChar is called multiple times for BufferedStream it will cause read of full buffer with each call.
Sample reproduce:

public class CustomStream : Stream {
	private readonly FileStream _fileStream;
	public CustomStream(FileStream fileStream) {
		_fileStream = fileStream;
	}
	public override int Read(byte[] buffer, int offset, int count) {
		Console.WriteLine("Read: count: {0}, pos: {1}, len: {2}, off: {3}", count, Position, Length, offset);
		return _fileStream.Read(buffer, offset, count);
	}
// other methods are skipped for brevity
}

static void Main(string[] args) {
	using var input = new BinaryReader(new BufferedStream(new CustomStream(File.Open("test.rtf", FileMode.Open))));
	
	for (int i = 0; i < 3; i++) {
		input.PeekChar();
	}
}

Running that code will output:

Read: count: 4096, pos: 0, len: 55355, off: 0
Read: count: 4096, pos: 0, len: 55355, off: 0
Read: count: 4096, pos: 0, len: 55355, off: 0

It shows that underlying stream is called to read full buffer (in our case 4096 is a default buffer, but with bigger buffer size it becomes even more prominent)

Analysis

From what I see, the issue occurs when PeekChar restores it's position:

long origPos = _stream.Position;
int ch = Read();
_stream.Position = origPos;

and setting Position in BufferedStream resets data in read buffer:

if (_writePos > 0)
FlushWrite();

_readPos = 0;
_readLen = 0; // <-- resetting read buffer
_stream!.Seek(value, SeekOrigin.Begin);
@lekrus lekrus added the tenet-performance Performance related issue label Jun 23, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Jun 23, 2021
@ghost
Copy link

ghost commented Jun 23, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

When BinaryReader.PeekChar is called multiple times for BufferedStream it will cause read of full buffer with each call.
Sample reproduce:

public class CustomStream : Stream {
	private readonly FileStream _fileStream;
	public CustomStream(FileStream fileStream) {
		_fileStream = fileStream;
	}
	public override int Read(byte[] buffer, int offset, int count) {
		Console.WriteLine("Read: count: {0}, pos: {1}, len: {2}, off: {3}", count, Position, Length, offset);
		return _fileStream.Read(buffer, offset, count);
	}
// other methods are skipped for brevity
}

static void Main(string[] args) {
	using var input = new BinaryReader(new BufferedStream(new CustomStream(File.Open("test.rtf", FileMode.Open))));
	
	for (int i = 0; i < 3; i++) {
		input.PeekChar();
	}
}

Running that code will output:

Read: count: 4096, pos: 0, len: 55355, off: 0
Read: count: 4096, pos: 0, len: 55355, off: 0
Read: count: 4096, pos: 0, len: 55355, off: 0

It shows that underlying stream is called to read full buffer (in our case 4096 is a default buffer, but with bigger buffer size it becomes even more prominent)

Analysis

From what I see, the issue occurs when PeekChar restores it's position:

long origPos = _stream.Position;
int ch = Read();
_stream.Position = origPos;

and setting Position in BufferedStream resets data in read buffer:

if (_writePos > 0)
FlushWrite();

_readPos = 0;
_readLen = 0; // <-- resetting read buffer
_stream!.Seek(value, SeekOrigin.Begin);
Author: lekrus
Assignees: -
Labels:

area-System.IO, tenet-performance, untriaged

Milestone: -

@adamsitnik adamsitnik added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jun 25, 2021
@adamsitnik adamsitnik added this to the Future milestone Jun 25, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 1, 2021
@lateapexearlyspeed
Copy link
Contributor

Seems the point is to try to keep shared buffer data valid as possible even after set_position within some reasonable range. I would like to learn and try it with PR, thanks !

@lateapexearlyspeed
Copy link
Contributor

Hi, this PR is ready for review now, please review, thanks.

Description:

  1. Improve set_posittion: Make buffer data valid if set_posittion inside range of current buffer
  2. Refactor existing implementation of Seek() so that Seek() and set_position can use new common logic (I feel new logic can cover existing one and simplify code. If not suitable, we can revert this part2)
  3. Add test case which covers both set_position and Seek to verify Reading after changing position will not trigger underlying stream's Read while keep other operations correct.

@lateapexearlyspeed
Copy link
Contributor

Hi @adamsitnik, could you please help review ?

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 11, 2021
@adamsitnik adamsitnik modified the milestones: Future, 6.0.0 Aug 11, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants