Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ProcessStartInfo.InheritHandles for Windows #107836

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

AustinWise
Copy link
Contributor

Since ShellExecuteEx and CreateProcessWithLogonW don't expose a way to disable handles, an exception is thrown when InheritHandles is set to false and either UseShellExecute is true or UserName is set on ProcessStartInfo.

Contributes to #13943 (API approval is here in that issue).

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Sep 15, 2024
@AustinWise AustinWise changed the title Implement ProcessStartInfo.InheritHandles Implement ProcessStartInfo.InheritHandlesfor Windows Sep 15, 2024
@AustinWise AustinWise changed the title Implement ProcessStartInfo.InheritHandlesfor Windows Implement ProcessStartInfo.InheritHandles for Windows Sep 15, 2024
Debug.Assert(attributeListSize > 0);
Debug.Assert(attributeListSize < 1024);

byte* newList = stackalloc byte[(int)attributeListSize];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: attributeListSize is 48 on my Windows 11 machine. Should I add a runtime check instead of just asserts here to make sure this not going to exhaust the stack?

@AustinWise AustinWise marked this pull request as ready for review September 16, 2024 04:48
@stephentoub
Copy link
Member

for Windows

Is the intent that the Unix implementation would come separately? I thought the plan there was to do what's called out in #13943 (comment) and explicitly walk through handles > 2 closing each?

Also set ownsHandle to false to prevent trying to close a potentially
invalid handle.
@AustinWise
Copy link
Contributor Author

for Windows

Is the intent that the Unix implementation would come separately? I thought the plan there was to do what's called out in #13943 (comment) and explicitly walk through handles > 2 closing each?

That was my intent: implement the approved Windows-only API and then follow up with the unix implementation. I moved a unit test out of the Windows-specific file to minimize the amount of churn this will cause.

@stephentoub
Copy link
Member

the approved Windows-only API

It wasn't approved as windows only, though, was it?

@AustinWise
Copy link
Contributor Author

It wasn't approved as windows only, though, was it?

The most recent API review I could find was from 2020, which has the setter labeled as Windows only:

#13943 (comment)

Jan Kotas said later in the thread:

It is not reasonably possible to implement this proposal on non-Windows. It is why the property is marked as Windows-specific.

Maybe I'm misunderstanding which proposal he is referring to (there are many ideas thrown around in that thread), but it seems clear that initially the new API is Windows specific.

@jkotas
Copy link
Member

jkotas commented Sep 17, 2024

IIRC, my concern was about the perf characteristics of the InheritHandles=false mode on non-Windows. I do not think there is a fast portable implementation possible. The fast implementation is possible only on a new-enough Linux kernel that has the close_range syscall or via other similar OS-specific mechanisms if they are available.

The idea was to use this API to also allow people to opt-in into a better perf/scalability of launching new processes on Windows: #31556 . Do you plan to implement this perf improvement as part of this PR?

If we were to implement this API on non-Windows, how are we going to document when this API helps perf and when it hurts perf?

@stephentoub
Copy link
Member

If we were to implement this API on non-Windows, how are we going to document when this API helps perf and when it hurts perf?

I've not thought of the API as being about perf, but rather about correctness, wanting to be able to launch child processes without worrying about what file descriptors the child might accidentally inherit and lead to problems in the parent. If it also helps with a performance issue on Windows around starting many processes, great, and if we can make it fast on newer versions of Linux, great. As long as we have a correct way to implement it on Unix in general, though, and I believe @tmds has suggested that the approach of manually closing each file descriptor > 2 is sound (Tom?), then that would seem to be appropriate.

@AustinWise
Copy link
Contributor Author

Do you plan to implement this perf improvement as part of this PR?

I think I see how the lock can be avoided without violating correctness or backwards compatibility. I would switch to using a reader-writer style lock. When InheritsHandles is true and streams are being redirected, the writer lock would be acquired so only one process can be created at a time. Otherwise, the reader side would be acquired so many threads could launch processes at once. I will try implementing this with ReaderWriterLockSlim and see what it does to performance.

@stephentoub
Copy link
Member

Do you plan to implement this perf improvement as part of this PR?

I think I see how the lock can be avoided without violating correctness or backwards compatibility. I would switch to using a reader-writer style lock. When InheritsHandles is true and streams are being redirected, the writer lock would be acquired so only one process can be created at a time. Otherwise, the reader side would be acquired so many threads could launch processes at once. I will try implementing this with ReaderWriterLockSlim and see what it does to performance.

For reference: #31556 (comment)

@AustinWise
Copy link
Contributor Author

I pushed an update that uses the reader writer lock. I have a benchmark in this repo, which I will duplicate below:

Code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Diagnostics;

BenchmarkRunner.Run<ProcessLaunchBenchmarks>(args: args);

public class ProcessLaunchBenchmarks
{
    const string ECHOED_OUTPUT = "hello world";

    [Params(1, 16, 256)]
    public int NumberOfProceses;

    [Benchmark]
    public void LaunchStandard()
    {
        Parallel.For(0, NumberOfProceses, i =>
        {
            var psi = new ProcessStartInfo("cmd.exe", "/c exit");

            using Process p = Process.Start(psi)!;
            p.WaitForExit();
        });
    }

    [Benchmark]
    public void LaunchWithRedirection()
    {
        Parallel.For(0, NumberOfProceses, i =>
        {
            var psi = new ProcessStartInfo("cmd.exe", $"/c echo {ECHOED_OUTPUT}");
            psi.RedirectStandardOutput = true;

            using Process p = Process.Start(psi)!;
            _ = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
        });
    }
}

When I ran this on my 16 core machine , you can see that using the reader-writer lock add some overhead when launch a single process repeatedly. When concurrently launching larger number of processes, the reader lock allows process creation to scale better. When the writer side is needed (redirecting output while InheritHandles is true), the reader-writer lock performs a little worse than the regular lock.

Method NumberOfProceses Version Mean Error StdDev
LaunchStandard 1 .NET RC 1 5.474 ms 0.0927 ms 0.0867 ms
LaunchStandard 1 Pull Request 5.766 ms 0.0796 ms 0.0745 ms
LaunchStandard 16 .NET RC 1 30.802 ms 0.3729 ms 0.3488 ms
LaunchStandard 16 Pull Request 14.422 ms 0.1775 ms 0.1661 ms
LaunchStandard 256 .NET RC 1 469.180 ms 5.7967 ms 5.1387 ms
LaunchStandard 256 Pull Request 146.740 ms 3.5220 ms 10.2738 ms
LaunchWithRedirection 1 .NET RC 1 5.593 ms 0.1062 ms 0.1090 ms
LaunchWithRedirection 1 Pull Request 5.656 ms 0.1119 ms 0.1289 ms
LaunchWithRedirection 16 .NET RC 1 31.057 ms 0.2395 ms 0.2000 ms
LaunchWithRedirection 16 Pull Request 32.850 ms 0.6493 ms 0.6377 ms
LaunchWithRedirection 256 .NET RC 1 474.595 ms 5.7172 ms 5.3479 ms
LaunchWithRedirection 256 Pull Request 486.269 ms 8.6506 ms 7.2236 ms

I still need to write a benchmark with redirection and InheritHandles = false to ensure this scenarios receives the expected performance increase. Are there other scenarios that I should benchmark?

@tmds
Copy link
Member

tmds commented Sep 18, 2024

As long as we have a correct way to implement it on Unix in general, though, and I believe @tmds has suggested that the approach of manually closing each file descriptor > 2 is sound (Tom?), then that would seem to be appropriate.

Some numbers:

On my laptop (with the default config) max fd is 524288 (returned by getdtablesize).
Closing fds in a loop with close(int) takes 0.24s. This is a long time ...
Closing using the close_range syscall (Linux 5.9, Aug 2020) takes 5.3µs.

Since the original issue was created, many problems were solved on Unix by adopting CLOEXEC.
I think the remaining problems that InheritHandles helps address are on Windows.
It may be an acceptable limitations to only have this work on Windows.

@stephentoub
Copy link
Member

I think the remaining problems that InheritHandles helps address are on Windows.
It may be an acceptable limitations to only have this work on Windows.

Ok.

@am11
Copy link
Member

am11 commented Sep 18, 2024

Closing using the close_range syscall (Linux 5.9, Aug 2020) takes 5.3µs.

@tmds, (unrelated to this PR, but) is this something we can use elsewhere in runtime with, e.g., a cached wrapper in some of the places calling close(fd) or not worth the complexity (there are known tradeoffs etc.)?

@AustinWise
Copy link
Contributor Author

Looking at the documentation for ReaderWriterLockSlim:

A thread that tries to enter read mode blocks if there are threads waiting to enter write mode or if there is a single thread in write mode.

This means as long as there are threads waiting on the writer side of the lock (in this case: launching processes with redirection and InheritHandles = true), threads on the read side (no redirection) of the lock will never acquire the lock. This is a potential starvation issue. Is this an unacceptably bad performance problem and if it is, are there any other reader-writer style syncronization primitives available that try to prevent starvation of the reader side? I know of ConcurrentExclusiveSchedulerPair, but it's for async tasks.

@jkotas
Copy link
Member

jkotas commented Sep 18, 2024

are there any other reader-writer style syncronization primitives available that try to prevent starvation of the reader side?

The non-slim ReaderWriterLock does not have the starvation problem.

@AustinWise
Copy link
Contributor Author

AustinWise commented Sep 19, 2024

I updated to using the non-Slim ReaderWriterLock and reran my benchmark. Performance results are similar.

Method NumberOfProceses Version Mean Error StdDev
LaunchStandard 1 .NET 9 RC 1 6.157 ms 0.0873 ms 0.0817 ms
LaunchStandard 1 Pull Request 6.057 ms 0.0874 ms 0.0818 ms
LaunchStandard 16 .NET 9 RC 1 34.612 ms 0.1935 ms 0.1715 ms
LaunchStandard 16 Pull Request 14.736 ms 0.1387 ms 0.1298 ms
LaunchStandard 256 .NET 9 RC 1 490.711 ms 3.3823 ms 3.1638 ms
LaunchStandard 256 Pull Request 136.704 ms 2.6484 ms 2.6010 ms
LaunchWithRedirection 1 .NET 9 RC 1 6.213 ms 0.0862 ms 0.0806 ms
LaunchWithRedirection 1 Pull Request 6.088 ms 0.0956 ms 0.0894 ms
LaunchWithRedirection 16 .NET 9 RC 1 34.820 ms 0.1758 ms 0.1559 ms
LaunchWithRedirection 16 Pull Request 37.486 ms 0.3947 ms 0.3499 ms
LaunchWithRedirection 256 .NET 9 RC 1 495.615 ms 4.5181 ms 4.2263 ms
LaunchWithRedirection 256 Pull Request 513.196 ms 9.7459 ms 10.8325 ms

(I'm not sure why the .NET 9 RC 1 numbers are slightly different than before; it's the same machine, but I'm connected with Remote Desktop instead logged in interactively...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants