Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.path.subpath.{isValid,normalise}: init #205190

Merged
merged 6 commits into from
Jan 3, 2023
Merged

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Dec 8, 2022

Description of changes

Creates a new lib.path library component, originally proposed in #200718. This commit adds two main parts of it:

  • The design document covering the main design decisions for this library
  • A lib.path.subpath.isValid function to check whether a value is a valid subpath according to the design document
  • A lib.path.subpath.normalise function implementing a safe normalisation of subpaths according to the design document.

In the future further library functions will be implemented upon these two functions, it's the working horse of this path library. The next functions from #200718 that can be implemented are subpath.join and append, see #210426 for progress.

This work is sponsored by Antithesis

Things done
  • Wrote documentation
  • Wrote code-covering tests
  • Wrote property tests

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix a crucial term.

lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
@infinisil
Copy link
Member Author

In #200718 we originally wanted to use path.relative.normalise, but I used path.relativeNormalise because the doc rendering didn't work with path.relative.normalise, it associated the docs with path.relative and not path.relative.normalise. I created a PR to fix this in nixdoc: nix-community/nixdoc#27, so I'll switch this PR back to relative.normalise as originally intended and I'll add this patch to the nixdoc build.

Copy link
Contributor

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the cleanest, most self-explaining code I have seen in months. Only lots of nitpicks about phrasing in the docs.

I added a few suggestions which add periods to the argument lines, but maybe this is a bad idea. Specifically it makes those lines which end in ./. and the like harder to read, and I don't think it makes sense to reword those artificially. Feel free to amend them and remove the periods if you don't agree, or make the change consistent otherwise.

Most importantly I think we should be careful not to overdo the subpath thing. We can keep it generic where we can be sure it will stay generic. That would allow for minimal changes in case we ever need to extend the library towards actual relative paths in a safe, backwards-compatible way.

lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path-design.md Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
Copy link
Contributor

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't even know if we need a relative normalisation yet. Personally I don't think we'll ever need it.

That's an argument in favor of not making that distinction in the name and go for path.relative.normalise as we originally intended. We can still say in lib.path.append that it takes a subpath as a second argument, as defined in the design document. If we should ever need it, relaxing relative paths and splitting out a subpath variant will not break existing code, and can be supported by warnings to consumers.

@roberth please review. Other than the last open naming question this look fine to merge to me.

lib/path.md Outdated Show resolved Hide resolved
lib/path.md Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated
Comment on lines 43 to 45
# Splits and normalises a subpath string into its components.
# Errors for ".." components and doesn't include "." components
splitSubpath = path: errorPrefix:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Splits and normalises a subpath string into its components.
# Errors for ".." components and doesn't include "." components
splitSubpath = path: errorPrefix:
# Split and normalise a relative path into its components, skipping "." components.
split = path:

There is nothing specific to subpaths left here. Since this function is internal we may as well use a short name.

Copy link
Member Author

@infinisil infinisil Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd run into infinite recursion with split though, because we use inherit (builtins) split and use it in this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitRelPath and joinRelPath?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL I can copy html and github will accept that as markdown.

rel m (plural rellen, diminutive relletje n)

  1. riot
  2. scandal, outrage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh neat. Also similarly: You can select text from a GitHub comment, then click "Quote Reply" and it will insert only the selected text as a quote.

This is an internal name anyways, we don't need to care about the name that much imo :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we use inherit (builtins) split and use it in this function

You can rename the import to splitString! Okay, this is getting into a level of detail that suggests the implementation is good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah "Quote Reply" is also nice.

internal

Is it a scandal if no-one knows?

I guess what I was going to say is that "relative" isn't quite right here.

because we use inherit (builtins) split

We don't have to.

I guess the risk is someone might want to expose the function and then suddenly it's not internal anymore. Scandalous!

Copy link
Member Author

@infinisil infinisil Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add you @roberth and @fricklerhandwerk as code owners of lib/path.{nix,md}? I'm also already a code owner of lib. This way we'll get notifications for future updates to this library

lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated Show resolved Hide resolved
lib/path.nix Outdated
Comment on lines 32 to 40
if ! isString value then
throw "${errorPrefix}:\n Not a string"
else if value == "" then
throw "${errorPrefix}:\n The string is empty"
else if substring 0 1 value == "/" then
throw "${errorPrefix}:\n The string is an absolute path because it starts with `/`"
else if match "(.*/)?\\.\\.(/.*)?" value != null then
# We don't support ".." components, see ./path.md
throw "${errorPrefix}:\n The string contains a `..` component, which is not allowed in subpaths"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ! isString value then
throw "${errorPrefix}:\n Not a string"
else if value == "" then
throw "${errorPrefix}:\n The string is empty"
else if substring 0 1 value == "/" then
throw "${errorPrefix}:\n The string is an absolute path because it starts with `/`"
else if match "(.*/)?\\.\\.(/.*)?" value != null then
# We don't support ".." components, see ./path.md
throw "${errorPrefix}:\n The string contains a `..` component, which is not allowed in subpaths"
if ! isString value then throw "${errorPrefix}:
Not a string"
else if value == "" then throw "${errorPrefix}:
The string is empty"
else if substring 0 1 value == "/" then throw "${errorPrefix}:
The string is an absolute path because it starts with `/`"
else if match "(.*/)?\\.\\.(/.*)?" value != null then throw "${errorPrefix}:
The string contains a `..` component, which is not allowed in subpaths"

If it's an ugly hack already, why not make it a creepy hack that leverages the code's own indentation! 😛

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha damn, I'd rather not because it's really not clear how much it indents like this, but this is interesting.

We could do this though:

    if ! isString value then throw ''
      ${errorPrefix}:
          Not a string''
    else if value == "" then throw ''
      ${errorPrefix}:
          The string is empty''
    else if substring 0 1 value == "/" then throw ''
      ${errorPrefix}:
          The string is an absolute path because it starts with `/`''
    # We don't support ".." components, see ./path.md
    else if match "(.*/)?\\.\\.(/.*)?" value != null then throw ''
      ${errorPrefix}:
          The string contains a `..` component, which is not allowed in subpaths''

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least this still expresses intent, rather than a potential misunderstanding that kinda worked out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add the value value to the error message when appropriate. Should make troubleshooting a lot quicker in many cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is included in the prefix:

nix-repl> path.subpath.normalise "/foo/bar"
error: lib.path.subpath.normalise: Argument "/foo/bar" is not a valid subpath string:
           The string is an absolute path because it starts with `/`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I would like to a see a streaming toPretty version that can also handle failures and early exit at some point though, see #98761 for a previous attempt of mine)

You can't catch aborts, and currently not even C++ really catches a stack overflow... and then there's non-termination of course.

Not entirely unsolvable for the purpose of error messages, in theory, but not something today's Nix will let us solve.

Perhaps a nice trick for attrsets specifically is to make use of the fact that at least the keys exist. If the attrNames are few enough, we could print them; otherwise show the "first" and "last" attribute name and perhaps make use of _type if there is one, if you're in a daring mood.

Sounds like we could have a safePrint function for such heuristics. It's a general problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@fricklerhandwerk fricklerhandwerk Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I try to do common-sense things in the Nix language it feels like writing a standard library from scratch. Even when writing a standard library from scratch.

I also think it should be good enough to account for the common errors at this point. The Nixpkgs Architecture Team can still open the topic of the standard library once the package reorg is done, and address the issue wholesale, with a targeted effort. Otherwise other targeted efforts like this one will constantly stall on side issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I try to do common-sense things in the Nix language it feels like writing a standard library from scratch. Even when writing a standard library from scratch.

That's an indication that such code shouldn't be written in Nix at all. It's not a general purpose language and attempts to use it as such will always be painful.

Copy link
Contributor

@fricklerhandwerk fricklerhandwerk Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact it's not all that painful, it just feels a bit like pioneering because the tools are not readily available.

The reason why "just don't use the Nix language for that purpose" doesn't always work is that the alternative is to use intermediate built steps. And you want to avoid those at almost all costs, because they are orders of magnitude slower than some clumsy string wrangling.

And no, I don't think building the tools into the language wholesale is the right answer to that issue. Because that's where you surely won't ever get rid of unfortunate design decisions.

@infinisil infinisil changed the title lib.path: init with relativeNormalise lib.path: init with subpath.normalise Dec 13, 2022
@edolstra
Copy link
Member

edolstra commented Dec 13, 2022

I'll repeat my earlier objection: Nix is not a suitable language for complex string manipulation, since it's way too slow for that. It's a DSL for describing compositions of packages/configurations. If we need operations for e.g. normalising paths, they should be added to the language (for instance, the CanonPath type introduced by the lazy-trees branch ensures a canonical representation of paths inside the evaluator).

Also, I would suggest doing this as a flake first. If it turns out to be sufficiently useful, it can always be added to the standard library. But once it's in the standard library, we're stuck with it forever.

lib/path.md Outdated Show resolved Hide resolved
@fricklerhandwerk
Copy link
Contributor

Nix is not a suitable language for complex string manipulation, since it's way too slow for that.

This is about first making it work. As @infinisil already said, if it turns out to be too slow, we can still make it fast by including it in the Nix language.

Also, I would suggest doing this as a flake first.

There is enough interest and capacity to add this in the standard library maintain it for a reasonable amount of time. It's much less useful and practically impossible to discover outside of Nixpkgs.

Also, the goal of this PR is to clear the ground for source combinators which also has multiple interested parties waiting for it to land.

But once it's in the standard library, we're stuck with it forever.

That's not completely true. We can have breaking changes with sufficient grace periods. Nixpkgs is easy to pin for consumers (as opposed to Nix), and can be refactored internally. The Nixpkgs Architecture Team discussed strategies for handling compatibility multiple times now.

Also, of all things, this design should rather be in the standard library as opposed to some other quite random-sounding functions which are already in there.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive test. I might have done something a bit more humble like in the source combinator tests, but this is better.

lib/tests/path.sh Outdated Show resolved Hide resolved
lib/tests/path.sh Outdated Show resolved Hide resolved
lib/tests/path.sh Outdated Show resolved Hide resolved
@infinisil
Copy link
Member Author

Rebased on master after merge conflicts. Also reorganized the commit history.

@infinisil infinisil changed the title lib.path: init with subpath.normalise lib.path: init with subpath.{isValid,normalise} Dec 23, 2022
@infinisil infinisil changed the title lib.path: init with subpath.{isValid,normalise} lib.path.subpath.{isValid,normalise}: init Dec 23, 2022
infinisil and others added 4 commits January 3, 2023 13:19
Adds initial work towards a `lib.path` library

Originally proposed in #200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-Authored-By: Robert Hensing <robert@roberthensing.nl>
The first path library function
@roberth roberth merged commit f61d4d3 into master Jan 3, 2023
@edolstra edolstra deleted the lib.path.relativeNormalise branch January 3, 2023 12:53
@edolstra
Copy link
Member

edolstra commented Jan 3, 2023

I think merging this is a mistake, and I don't think my objections have been addressed.

  • It's not obvious that we need this in the standard library.
  • Performance: Nix is not a suitable language for this kind of library - string manipulation is slow and inefficient. It would be very bad for evaluation speed / memory usage if subpath.normalise is used widely.
  • It imposes path semantics that may be different from Nix's. It's not helpful for Nix users if Nix and Nixpkgs have different notions of how paths behave. If there are problems with Nix's path type, it should be addressed in Nix.
  • It makes learning the language harder, since it introduces concepts like "subpaths" that don't appear in the Nix reference manual.
  • It should have been done as a flake initially, since by putting it in the standard library we're stuck with it for a long time.

@infinisil
Copy link
Member Author

@edolstra Most of these have been addressed before:

  • It's not obvious that we need this in the standard library.

Agreed, it's not obvious, but it is needed for things like source combinators which needs to handle subpaths. We plan to add lib.path.append in the next iteration, which is just like <path> + ("/" + <string>) but safer, this depends on lib.path.subpath.isValid. In another iteration we plan to have something like lib.path.relativeTo which removes an absolute prefix from a path, returning the relative subpath that's left, this depends on lib.path.subpath.normalise. Getting the semantics of the base functionality correct is very important.

  • Performance: Nix is not a suitable language for this kind of library - string manipulation is slow and inefficient. It would be very bad for evaluation speed / memory usage if subpath.normalise is used widely.

Was already addressed twice, in #200718 (comment) and #205190 (comment)

  • It imposes path semantics that may be different from Nix's. It's not helpful for Nix users if Nix and Nixpkgs have different notions of how paths behave. If there are problems with Nix's path type, it should be addressed in Nix.

This was partly addressed here, but is also addressed in the merged design document.

  • It makes learning the language harder, since it introduces concepts like "subpaths" that don't appear in the Nix reference manual.

Subpaths aren't part of the language, it's just a nixpkgs concept.

  • It should have been done as a flake initially, since by putting it in the standard library we're stuck with it for a long time.

Already addressed here and here

@vcunat
Copy link
Member

vcunat commented Jan 10, 2023

The tests fail with (at least) one particular random seed:

Using seed 13110, use `lib/path/tests/prop.sh 13110` to reproduce this result
lib/path/tests/prop.sh: line 82: ['']='': bad array subscript

(I could reproduce that on current master and on this merge commit introducing the test.)

@infinisil
Copy link
Member Author

@vcunat Oh wow, thanks for that, fixed in #210042!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants