Proposed API for canonical paths #23871

Open
Tracked by #64596
carlreinke opened this issue Oct 17, 2017 · 8 comments
Labels: api-suggestion (Early API idea and discussion, it is NOT ready for implementation), area-System.IO

@carlreinke
Contributor

Rationale

"It seems we need a new API that allows us to get the canonical path that hits the disk and follows links and fixes casing" —API Review 2015-08-27

Proposed API

public static class Path
{
    public static string GetCanonicalPath( string path );  // Resolves symbolic links
    public static string GetCanonicalPath( string path, bool preserveSymbolicLinks );
}

Details

A canonical path

  • is a fully qualified path,
  • uses only DirectorySeparatorChar as a directory separator,
  • has no trailing directory separator (unless it is a/the root),
  • contains no navigation elements (. and ..) and no empty path elements (ex. /foo//bar),
  • contains no symbolic links (unless otherwise specified),
  • has its root in a canonical form (ex. on Windows, drive letters will be upper case), and
  • adopts the actual casing of the file and directory names if the file and directory names are case-insensitive.
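
The purely lexical rules in this list (separator handling, no trailing separator, no navigation elements, no empty elements) can be illustrated without any I/O. The following is a minimal sketch, assuming absolute Unix-style paths only. `LexicalPathSketch.Normalize` is a hypothetical helper, not part of the proposed API, and it neither resolves symbolic links nor fixes casing, since both require hitting the disk. It removes `..` lexically, which is not how Linux treats `..` after a symbolic link.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the proposal.
// Handles just the lexical rules for absolute Unix-style paths.
public static class LexicalPathSketch
{
    public static string Normalize(string path)
    {
        if (string.IsNullOrEmpty(path) || path[0] != '/')
            throw new ArgumentException("Expected an absolute Unix-style path.", nameof(path));

        var segments = new List<string>();
        foreach (string segment in path.Split('/'))
        {
            // Skip empty elements (from "//" or a trailing "/") and ".".
            if (segment.Length == 0 || segment == ".")
                continue;

            if (segment == "..")
            {
                // Drop the previous element; ".." at the root stays at the root.
                if (segments.Count > 0)
                    segments.RemoveAt(segments.Count - 1);
                continue;
            }

            segments.Add(segment);
        }

        // Joining with a single "/" also guarantees no trailing separator,
        // except for the root itself.
        return "/" + string.Join("/", segments);
    }
}
```

With this sketch, `/foo//bar/./baz/../` normalizes to `/foo/bar`, and `/` stays `/`.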

The behavior of .. with respect to symbolic links is platform dependent. On Windows, it removes the symbolic link element. On Linux, it removes an element of the path that was resolved from the symbolic link.

Examples:

  • Windows: If C:\Users\All Users is a symbolic link to C:\ProgramData then the canonical form of C:\Users\All Users\..\foo is C:\Users\foo.
  • Linux: If /var/lock is a symbolic link to /run/lock then the canonical form of /var/lock/../foo is /run/foo.

On Linux it is not always possible to preserve symbolic links while simplifying .. elements. In these cases Path.GetCanonicalPath(string, bool) will resolve the symbolic link regardless of the value of preserveSymbolicLinks.

For the purposes of this API, NTFS Junction Points are considered to be like Linux bind mounts and are not considered to be symbolic links.

Since most of the members of Path do not do I/O there might be some objection to adding GetCanonicalPath to Path. There is at least one existing method that does do I/O — Path.GetTempFileName — so it would not be the first.

Open Questions

  • How should GetCanonicalPath handle paths in \\?\ and \\.\?
    • Maybe return a \\?\ path if and only if the input path was in \\?\.
    • Maybe reject paths in \\.\.
  • What should GetCanonicalPath do if the path does not exist?
    • Maybe canonicalize the existing part and append the rest as canonicalized as possible.
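
One way to read the "canonicalize the existing part and append the rest" idea is sketched below. This is only an illustration under stated assumptions: `PartialCanonicalizationSketch` and its `canonicalizeExisting` parameter are hypothetical (the delegate stands in for whatever `GetCanonicalPath` would do for a path that exists), and the input is assumed to already be fully qualified, for example the result of `Path.GetFullPath`. The non-existent remainder is appended as given, with no further canonicalization attempted.

```csharp
using System;
using System.IO;

// Hypothetical sketch; not part of the proposal.
public static class PartialCanonicalizationSketch
{
    // canonicalizeExisting is a stand-in for canonicalizing a path that exists.
    public static string Canonicalize(string fullPath, Func<string, string> canonicalizeExisting)
    {
        string existing = fullPath;
        string remainder = "";

        // Peel off trailing elements until a file or directory that exists is found.
        while (!Directory.Exists(existing) && !File.Exists(existing))
        {
            var parent = Path.GetDirectoryName(existing);
            if (string.IsNullOrEmpty(parent))
                return fullPath; // nothing along the path exists; return the input unchanged

            string name = Path.GetFileName(existing);
            remainder = remainder.Length == 0 ? name : Path.Combine(name, remainder);
            existing = parent;
        }

        // Canonicalize only the part that exists, then re-append the rest.
        string canonical = canonicalizeExisting(existing);
        return remainder.Length == 0 ? canonical : Path.Combine(canonical, remainder);
    }
}
```

The walk up the path costs one existence check per missing element, which is the same kind of per-segment I/O that casing normalization would need.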
  • Does Linux provide a mechanism for determining the actual casing of a file or directory name for files and directories that reside on a file system that is mounted in a case-insensitive mode? If it doesn't then the provided casing will have to be returned.

Related Issues

See Also

Updates

  • Replace SymbolicLinkOption with bool.
  • Split out dotnet/corefx#25569
@carlreinke
Contributor Author

Should I expect to get some feedback so that I can get this out of api-needs-work?

@carlreinke
Contributor Author

cc: @JeremyKuhne

@JeremyKuhne
Member

@carlreinke Sorry, missed this one. Can you please separate the canonical path API into a different issue? It will be hard to track the discussion otherwise. Both topics are of interest and I'll comment in detail once it is broken out.

@carlreinke changed the title from "Proposed API for canonical paths and symbolic links" to "Proposed API for canonical paths" on Nov 29, 2017
@carlreinke
Contributor Author

carlreinke commented Nov 29, 2017

@JeremyKuhne Done. (Though I split them opposite of how you suggested.)

@JeremyKuhne
Member

Thanks for your work here!

We need to call out explicit usage cases to make sure we've got the design right. What problem is this intended to solve?

For this API in particular it is super important we're as detailed as possible. A problematic API, for example, is Path.IsPathRooted(). Many took this to mean "not relative", which it isn't. (I introduced Path.IsPathFullyQualified() to deal with this one.) I fear people will take the results of this to compare equality of inputs- they'll ask "are these two strings the same file?", which this can't answer easily (if at all) for a number of reasons:

  1. File names are many-to-one lookups to entries on disk. When you have multiple entries (hard links), there is no "canonical" one to pick.
  2. Mapped drives make computing a "canonical" path difficult. UNC paths make this even harder.
  3. Volumes do not need to be "mounted" to a drive letter on Windows, and they can have any number of aliases in said DOS device namespace. We want to support that, but the canonical form of a volume is not easy to understand, and it is not easy to pick the "right" answer. (Should it be the guid alias, or the actual device path? Using the guid seems to make sense at first, but could have privacy implications.)

One thing we might want to consider is extending GetFullPath() with options. Things like:

  • FullPathOptions.NormalizeCasing
  • FullPathOptions.FollowSymbolicLinks
  • FullPathOptions.ExpandShortNames (Another backlog item of mine- I want to kill this as default behavior as it is slow and isn't 100% accurate. As an aside- Is it possible to get at 8.3 names from FAT/NTFS mounted on Unix?)
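
For illustration, the options above could take the shape of a flags enum passed to a `GetFullPath` overload. This is a hypothetical sketch only: the enum name and member names come from the list above, but the numeric values, the `None` member, and the `FullPathSketch.GetFullPath` overload are assumptions, and nothing like this exists in .NET today. The stub only handles the no-options case, because the other options would need disk access that this sketch does not attempt.

```csharp
using System;
using System.IO;

// Hypothetical shape; values and the None member are assumptions.
[Flags]
public enum FullPathOptions
{
    None = 0,
    NormalizeCasing = 1,
    FollowSymbolicLinks = 2,
    ExpandShortNames = 4,
}

// Hypothetical wrapper; not an existing .NET API.
public static class FullPathSketch
{
    public static string GetFullPath(string path, FullPathOptions options)
    {
        // Each option would require hitting the disk; the sketch stops here.
        if (options != FullPathOptions.None)
            throw new NotSupportedException("Disk-backed options are not implemented in this sketch.");

        // With no options this is just the existing lexical behavior.
        return Path.GetFullPath(path);
    }
}
```

A caller would opt in per call, for example `FullPathOptions.NormalizeCasing | FullPathOptions.FollowSymbolicLinks`, which keeps the default behavior free of I/O.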

Would that address most of the scenarios? It is easier to explain and doesn't imply "identity" like canonicalize does. This does, however, have some problems (which are not unique to using GetFullPath):

  1. What about path segments that don't exist? (as you've brought up)
  2. What about access denied for segments of the path? (haven't looked in depth at what rights are involved in grabbing filenames and following links)

How should GetCanonicalPath handle paths in \\?\ and \\.\?

For \\?\ and \\.\ we can leave them with either header or perhaps always give back \\?\.

Linux: If /var/lock is a symbolic link to /run/lock then the canonical form of /var/lock/../foo is /run/foo.

Is that true? What resolves an input like that? GetFullPath() would be wrong if this is the case.

Normalizing casing requires walking the input path segments afaik. Expensive, but I don't know of any way around that.

One Windows API to look at to help inform this work is GetFinalPathNameByHandle.

@carlreinke
Contributor Author

carlreinke commented Nov 30, 2017

Is that true? What resolves an input like that? GetFullPath() would be wrong if this is the case.

Sure! Fire up WSL and give it a try. :)

user@roxy:~$ head -n 1 /var/lock/../resolvconf/resolv.conf /run/resolvconf/resolv.conf /var/resolvconf/resolv.conf
==> /var/lock/../resolvconf/resolv.conf <==
# This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.

==> /run/resolvconf/resolv.conf <==
# This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.
head: cannot open '/var/resolvconf/resolv.conf' for reading: No such file or directory
user@roxy:~$

(Behavior is the same on Ubuntu, so it's not just a quirk of WSL.)

@JeremyKuhne
Member

Sure! Fire up WSL and give it a try. :)

Hmm- I need to set aside some time to fully understand this behavior. Windows claims to have implemented its symbolic links just like Unix; I can't wrap my head around how this would manifest in Win32. To be honest, though, given the prior need for elevation my usage of symlinks has been pretty basic. :)

@pjanotti pjanotti removed their assignment Oct 8, 2018
@JeremyKuhne JeremyKuhne removed their assignment Jan 22, 2020
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Sep 2, 2024
@jnm2
Contributor

jnm2 commented Sep 2, 2024

Please consider for .NET 10!

@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Sep 2, 2024
6 participants