Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Directory.CreateDirectory performance #73942

Closed
wants to merge 2 commits into from

Conversation

stephentoub
Copy link
Member

This does a few things:

  • On all OSes, makes DirectoryInfo.Name and FileInfo.Name lazy. For a case like DirectoryInfo.CreateDirectory(fullPath) where the returned DirectoryInfo is ignored, previously we would still promptly allocate a string for the name, and now we won't, waiting instead until the property is accessed.
  • On Windows, after checking whether the target directory exists and finding it doesn't, rather than proceeding with the full logic to parse the path, find the first segment that exists, and then create them all, it adds a fast-path check to just try to create the target, under the assumption that creating a directory in a directory that already exists is very common. In such a case, it measurably speeds up throughput. If the parent doesn't already exist, this does add one syscall of overhead, and it can be measurable, but it's relatively small compared to the other costs involved on this slower path.
  • On Windows, on the slow path rather than allocating a List<string> to store the segments, it uses a ValueListBuilder<string> seeded with some stack space.
  • Fixes the nullable annotation on the internal DirectoryExists helper
  • Normalizes some parameter names

Contributes to #61954

This does a few things:
- On all OSes, makes DirectoryInfo.Name and FileInfo.Name lazy.  For a case like `DirectoryInfo.CreateDirectory(fullPath)` where the returned `DirectoryInfo` is ignored, previously we would still promptly allocate a `string` for the name, and now we won't, waiting instead until the property is accessed.
- On Windows, after checking whether the target directory exists and finding it doesn't, rather than proceeding with the full logic to parse the path, find the first segment that exists, and then create them all, it adds a fast-path check to just try to create the target, under the assumption that creating a directory in a directory that already exists is very common.  In such a case, it measurably speeds up throughput.  If the parent doesn't already exist, this does add one syscall of overhead, and it can be measurable, but it's relatively small compared to the other costs involved on this slower path.
- On Windows, on the slow path rather than allocating a `List<string>` to store the segments, it uses a `ValueListBuilder<string>` seeded with some stack space.
- Fixes the nullable annotation on the internal DirectoryExists helper
- Normalizes some parameter names
@stephentoub stephentoub added area-System.IO tenet-performance Performance related issue labels Aug 15, 2022
@stephentoub stephentoub added this to the 7.0.0 milestone Aug 15, 2022
@ghost ghost assigned stephentoub Aug 15, 2022
@ghost
Copy link

ghost commented Aug 15, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

This does a few things:

  • On all OSes, makes DirectoryInfo.Name and FileInfo.Name lazy. For a case like DirectoryInfo.CreateDirectory(fullPath) where the returned DirectoryInfo is ignored, previously we would still promptly allocate a string for the name, and now we won't, waiting instead until the property is accessed.
  • On Windows, after checking whether the target directory exists and finding it doesn't, rather than proceeding with the full logic to parse the path, find the first segment that exists, and then create them all, it adds a fast-path check to just try to create the target, under the assumption that creating a directory in a directory that already exists is very common. In such a case, it measurably speeds up throughput. If the parent doesn't already exist, this does add one syscall of overhead, and it can be measurable, but it's relatively small compared to the other costs involved on this slower path.
  • On Windows, on the slow path rather than allocating a List<string> to store the segments, it uses a ValueListBuilder<string> seeded with some stack space.
  • Fixes the nullable annotation on the internal DirectoryExists helper
  • Normalizes some parameter names

Contributes to #61954

Author: stephentoub
Assignees: -
Labels:

area-System.IO, tenet-performance

Milestone: 7.0.0

@stephentoub
Copy link
Member Author

@tmds, in #58799, how much of a regression did you see on Linux for CreateDirectory when the path already exists? I didn't see numbers for that in the PR nor the concerns about that regression addressed. I tried something similar on Windows, and opted against it as it ended up regressing that case by almost 2x, and I expect that's a reasonably common case.

@stephentoub
Copy link
Member Author

Method Toolchain Mean Error StdDev Ratio Allocated Alloc Ratio
Exists \main\corerun.exe 17.82 us 0.192 us 0.180 us 1.00 144 B 1.00
Exists \pr\corerun.exe 17.70 us 0.228 us 0.213 us 0.99 96 B 0.67
DoesntExist \main\corerun.exe 473.60 us 8.972 us 9.973 us 1.00 1056 B 1.00
DoesntExist \pr\corerun.exe 352.16 us 5.199 us 4.609 us 0.74 664 B 0.63
ParentDoesntExist \main\corerun.exe 766.22 us 7.961 us 6.647 us 1.00 1968 B 1.00
ParentDoesntExist \pr\corerun.exe 786.89 us 15.585 us 18.553 us 1.03 1560 B 0.79
private string _exists;
private string _doesntExist;
private string _doesntExistChild;

[GlobalSetup]
public void Setup()
{
    _exists = Path.GetTempFileName();
    File.Delete(_exists);
    Directory.CreateDirectory(_exists);

    _doesntExist = Path.GetRandomFileName();
    _doesntExistChild = Path.Combine(_doesntExist, Path.GetRandomFileName());
}

[GlobalCleanup]
public void Cleanup()
{
    Directory.Delete(_exists);
}

[Benchmark]
public void Exists()
{
    Directory.CreateDirectory(_exists);
}

[Benchmark]
public void DoesntExist()
{
    Directory.CreateDirectory(_doesntExist).Delete();
}

[Benchmark]
public void ParentDoesntExist()
{
    Directory.CreateDirectory(_doesntExistChild).Delete();
    Directory.Delete(_doesntExist);
}

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @stephentoub!


int lengthRoot = PathInternal.GetRootLength(fullPath.AsSpan());
bool somepathexists = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I know it was named like this before your changes

Suggested change
bool somepathexists = false;
bool somePathExists = false;

}

return name;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove whitespace

Suggested change

@stephentoub
Copy link
Member Author

This is hitting failures on pre-Windows 10 OSes due to some kind of device path validation issue. I'm going to close it for now. I separated out the FileInfo/DirectoryInfo.Name laziness part into #73983.

@tmds
Copy link
Member

tmds commented Aug 16, 2022

@tmds, in #58799, how much of a regression did you see on Linux for CreateDirectory when the path already exists? I didn't see numbers for that in the PR nor the concerns about that regression addressed. I tried something similar on Windows, and opted against it as it ended up regressing that case by almost 2x, and I expect that's a reasonably common case.

I had mentioned this regression in the initial comment. I expect it to be similar: 2x slower because it is making 2 syscalls instead of 1.

I opted to optimize the non-exists case at the cost of the exists case because that improves new directory creation from a loop. A loop should be written so it doesn't create the same directory over and over.

An Exists check can still be added by the user (and may already be part of the code).

@ghost ghost locked as resolved and limited conversation to collaborators Sep 15, 2022
@stephentoub stephentoub deleted the windowscreatedirectory branch June 29, 2023 14:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants