Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source combinators: extend, trace, cutAt, focusAt and more #112083

Closed
wants to merge 21 commits into from

Conversation

roberth
Copy link
Member

@roberth roberth commented Feb 5, 2021

Motivation for this change

Improve usability of lib.sources, specifically

  • Extend the capabilities of lib.sources, primarily:
    • extend to construct sources from pre-existing sources in related directories
    • allow a source to point to a subpath inside the generated store path
  • Add documentation.
  • Add tests.

Refactor cleanSourceWith to operate on a simple internal representation, in preparation for new combinators to be added later.

I've discussed a union-like combinator here before.

extend combines naturally with the subDir feature in the haskell.nix library. This became subpath. includeSiblings was replaced by cutAt and/or extend.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • nix-build lib/tests/release.nix, also run by ofborg Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@infinisil
Copy link
Member

Sounds exciting! I guess you already have an idea for how to design combinators? I'm sure @Profpatsch is going to be interested too. Also pinging @nh2 as the author of #56985

@roberth
Copy link
Member Author

roberth commented Feb 6, 2021

@infinisil thanks. I've added my ideas to the description for context, but this PR can be reviewed independently from those.

@roberth roberth changed the title cleanSourceWith refactor and basic tests cleanSourceWith refactor and testing Feb 16, 2021
@roberth roberth changed the title cleanSourceWith refactor and testing source combinators: union, trace, reparent and more Feb 21, 2021
@roberth roberth marked this pull request as draft February 21, 2021 11:53
@roberth
Copy link
Member Author

roberth commented Feb 21, 2021

irrelevant comment

unionSource got a bit more complicated than I expected because of the "subpath invariant" as described in the inline comments.

@roberth
Copy link
Member Author

roberth commented Feb 21, 2021

considerations about the short names

I have an idea that may improve the feel of this interface. The new functions have a new style of interface that is a bit distinct from cleanSourceWith.

cleanSourceWith:

  • has Source in the name
  • params are an attribute set, doing many things at once. Too many?
  • users can't find the rename function because it's a name parameter in cleanSourceWith
  • looks unappealing when mixed with the other combinators like unionSource or reparentSource
cleanSourceWith { name = "myproject-src"; src = unionSource ./pom.xml (cleanSource ./src); }

sources.filter (idea):

  • uses source attrset for namespacing
  • params are minimalistic: just the filter function and original source. If you need something else, use a different function
  • matches the "style" of sources.union
  • can't pluralize for lists like unionSource -> unionSources. sources.unionAll?
  • can be used with inherit or with
with sources;
rename "myproject-src" (union ./pom.xml (clean ./src))
with souces;
rename "myproject-src" (
  union
    ./pom.xml
    (filter (path: _type: /* ... */) ./src)
)

@roberth roberth force-pushed the source-combinators branch 2 times, most recently from dc77fe1 to e66249d Compare March 2, 2021 14:06
@ofborg ofborg bot added the 6.topic: stdenv Standard environment label Mar 2, 2021
@roberth roberth changed the title source combinators: union, trace, reparent and more source combinators: extend, trace, cutAt, pointAt and more Mar 2, 2021
@roberth
Copy link
Member Author

roberth commented Mar 2, 2021

I went ahead and implemented the subpath feature. You can now write a derivation that uses files from parent directories. For example, you may want to write a Nix expression in a project's doc directory to generate html.

  • makelib/lib.make with makefile rules
  • README.md; another file we want to use
  • doc/Makefile that has include ../makelib/lib.make

It's a bit of a contrived example, but situations like these happen all the time with various auxiliary files aren't directly in the subpath where you need them. With this PR, we don't have to touch the existing makefiles and we don't have to base our source on the parent directory and filter away almost everything. doc/default.nix can look like:

{ stdenv, pandoc, lib }:
stdenv.mkDerivation {
  name = "README.html";
  nativeBuildInputs = [ pandoc ];
  src = with lib.sources; extend (cleanSource ./.) [(cleanSource ../makelib) ../README.md];
}

@roberth roberth marked this pull request as ready for review March 18, 2021 22:30
@roberth roberth requested a review from nh2 March 18, 2021 22:30
@roberth
Copy link
Member Author

roberth commented Mar 18, 2021

This is finally ready for review. Took a bunch of iterations to get it to this point, some of which I haven't rebased away, so I recommend to review the end result instead of the commits.
I've purposefully left one WIP commit, which is the one about the stdenv addition. I'd like to redo that one in a simpler way that doesn't mess with hooks, but doing so requires a mass-rebuild, which is annoying when you want to try it out. I'll make sure to remove it before merging and then create a follow-up PR for the stdenv addition.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-developer-role-antithesis-startup-on-site-in-dc-area/18234/1

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 21, 2022
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-generate-documentation-from-arbitrary-nix-code/22292/8

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 11, 2022
Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a closer look at the PR, the biggest problem is how hard it is to understand what the functions extend, empty, focusAt, cutAt, getOriginalFocusPath, getSubpath and reparent do. It's neither obvious from the name or their documentation. I'm also not understanding use cases for these, when would I want to use one or the other? I'm giving this a better try here:

An overview of the functions

The core concept of this PR is that it works with two anchor points of filtered sources:

  • root anchor: The root of the store path import, only files under this anchor can be included in the filter
  • subpath anchor: The subpath of the root anchor where the filtered source is pointing to. This is what the filtered source evaluates to with "${source}" or source.outPath

I'm going to use a new notation for a filtered source, where // indicates the root anchor:

/my/project//some/subpath/
            ^            ^
    root anchor      subpath anchor

# When imported into the store:
/nix/store/<hash>-<name>/some/subpath/

Using this format, we can explain better what individual functions do to these two anchors and the filtering. Note that these examples are only illustrative, this is not an exact description.

  • focusAt: Sets the subpath anchor without changing the filtering:

    focusAt ./some /my/project//some/subpath
    -> /my/project//some
    
    focusAt ./some/subpath/foo /my/project//some/subpath
    -> /my/project//some/subpath/foo
  • cutAt: Moves the root anchor down, preventing imports of files outside of the new root:

    cutAt ./some /my/project//some/subpath
    -> /my/project/some//subpath
  • reparent: Moves the root anchor up, without adding any new files:

    reparent /my /my/project//some/subpath
    -> /my//project/some/subpath
  • extend: Adds new files, moves the root anchor up if necessary:

    extend /my/project//some/subpath [ /my/otherproject/some/file ]
    -> /my//project/some/subpath
  • empty: Returns a filtered source with a specific root anchor, but with all files filtered out:

    empty /my/project
    -> /my/project//
  • getOriginalFocusPath: Returns the subpath anchor:

    getOriginalFocusPath /my/project//some/subpath
    -> /my/project/some/subpath
  • getSubpath: Returns the subpath anchor relative to the root anchor. This is what gets appended to the store path result of the underlying builtins.filterSource call:

    getSubpath /my/project//some/subpath
    -> some/subpath
  • .origSrc (not really exposed): Returns the root anchor. This is the src argument to the underlying builtins.filterSource call:

    (/my/project//some/subpath).origSrc
    -> /my/project

An alternative model

While I think above gives a nice overview of these functions and potentially makes it easier to give them better names, I'd also like to propose a hopefully simpler to understand alternative set of functions. Notably omitting:

  • A way to move the root anchor up (reparent, part of extend). Users should declare the root anchor manually. This is what everybody is used to already and should only be necessary once per project repository.

  • An identity source (empty), because it's not necessary if the API is designed to take lists of sources.

  • union :: Path -> ListOf SourceLike -> SourceLike: Filters a path to only include the directories/files from the given list. It throws an error if any items have a subpath set. This is a simpler extend.

  • setSubpath :: Path -> SourceLike -> SourceLike: Sets the subpath, like doing appending or removing /path/entries to the source path, essentially path manipulation. We just can't use dirOf and + because those don't propagate the filtering function. This is the same as the previous focusAt. Note that a getSubpath is needed too.

  • limitToSubpath :: SourceLike -> SourceLike: Limits the files that get imported into the store to the the ones under the subpath set with setSubpath, the result has an empty subpath. This is like reimporting the path the source is pointing at. This is a simpler cutAt.

  • Functions to get the `

Notably I think these are easier to separate: union can be introduced in one PR, setSubpath along with the mkDerivation change in another, and limitToSubpath in another one (though it depends on setSubpath)

if pre == "/" # root is the only path that already ends in "/"
then true
else
hasPrefix (pre + "/") (other + "/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit obscure why + "/" is needed. From what I can tell, it's so that /some/path doesn't match as a prefix of /some/pathpath. While hasPrefix (pre + "/") other would fix that, this would then break for when pre == other. So adding the "/" to other as well makes it work when the same path is a prefix of itself.

Comment on lines +147 to +152
# Function to memoize
f:
# What to return when a path does not exist, as a function of the path
missing:
# Filesystem location below which the returned function is defined. `/.` may be acceptable, but a path closer to the data of interest is better.
root:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a rather special function, 3 arguments and no clear order to them, I'd say this benefits from making it an attribute set argument

Suggested change
# Function to memoize
f:
# What to return when a path does not exist, as a function of the path
missing:
# Filesystem location below which the returned function is defined. `/.` may be acceptable, but a path closer to the data of interest is better.
root:
{
# Function to memoize
f,
# What to return when a path does not exist, as a function of the path
missing,
# Filesystem location below which the returned function is defined. `/.` may be acceptable, but a path closer to the data of interest is better.
root,
}

@roberth
Copy link
Member Author

roberth commented Oct 11, 2022

Thanks for the review.

The overview looks good.
It will be nice to cut it down a bit ✂️
cutAt 🙊

Anyway, these are my thoughts now. I'll make some of the changes later.

While I think above gives a nice overview of these functions and potentially makes it easier to give them better names, I'd also like to propose a hopefully simpler to understand alternative set of functions. Notably omitting:

  • A way to move the root anchor up (reparent, part of extend). Users should declare the root anchor manually. This is what everybody is used to already and should only be necessary once per project repository.
  • An identity source (empty), because it's not necessary if the API is designed to take lists of sources.

empty shouldn't be necessary, even with extend instead of union; same for reparent. I probably added them for completeness, but they're a distraction.

  • union :: Path -> ListOf SourceLike -> SourceLike: Filters a path to only include the directories/files from the given list. It throws an error if any items have a subpath set. This is a simpler extend.

Wait, is this still for omission? I think we're switching to addition now.

Note to self: I do still mention union in some doc.

Throwing is a usability and design smell. These were intended to be combinators, so functions to operate on any source input.

What is the Path argument for? Is it the source that provides the focus (subpath) like in my earlier iterations that had union?

  • setSubpath :: Path -> SourceLike -> SourceLike: Sets the subpath, like doing appending or removing /path/entries to the source path, essentially path manipulation. We just can't use dirOf and + because those don't propagate the filtering function. This is the same as the previous focusAt.

Correct. I'm ok with this name.

Note that a getSubpath is needed too.

Already added.

  • limitToSubpath :: SourceLike -> SourceLike: Limits the files that get imported into the store to the the ones under the subpath set with setSubpath, the result has an empty subpath. This is like reimporting the path the source is pointing at. This is a simpler cutAt.

👍

cutAt was more complicated than necessary. limitToSubpath is a simpler, equally capable replacement.

  • Functions to get the `

Looks like you accidentally your comment 😕

Notably I think these are easier to separate: union can be introduced in one PR, setSubpath along with the mkDerivation change in another, and limitToSubpath in another one (though it depends on setSubpath)

The subpath features don't rely on extend either, so doing those is still a good first step, regardless of "merge" function design.

@infinisil
Copy link
Member

  • union :: Path -> ListOf SourceLike -> SourceLike: Filters a path to only include the directories/files from the given list. It throws an error if any items have a subpath set. This is a simpler extend.

Wait, is this still for omission? I think we're switching to addition now.

I was thinking of removing extend and replacing it with union here, because ignoring the subpath of a, any extend a bs can be replaced with union commonPrefixPath ([ a ] ++ bs) (with the user manually figuring out commonPrefixPath).

Throwing is a usability and design smell. These were intended to be combinators, so functions to operate on any source input.

As far as I can see, extend currently just ignores any subpaths on its list argument, it only takes the subpath from the first argument right? My proposed union just changes that to a throw when the list argument has any subpaths, because ignoring them would leave people wondering.

If somebody needs a subpath of the result, they can do setSubpath ./some/path (union ./. [ ... ]).

What is the Path argument for? Is it the source that provides the focus (subpath) like in my earlier iterations that had union?

The path argument of union is the root of the store import. It's notably only a path, not a source filter, so no subpath or filtering, it only provides the root anchor.

  • Functions to get the `

Looks like you accidentally your comment confused

Ah whoops, was gonna say "Functions to get the subpath, origSrc, etc.", probably a good idea to have that

@roberth roberth mentioned this pull request Nov 11, 2022
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/advice-wanted-writing-a-build-system-in-nix/23460/4

@infinisil infinisil mentioned this pull request Jan 12, 2023
13 tasks
roberth added a commit to srid/haskell-flake that referenced this pull request Jan 18, 2023
This prevents unnecessary rebuilds of packages that haven't really
changed, except for a hash that included irrelevant sources.

By applying a trivial source filter, we create a new store path
that only contains the subdirectory that the build cares about.

This isn't 100% compatible with all projects, as some packages might
"legitimately" depend on files outside of their own directory. We
could support this by adding an option for sources to be unioned,
but a source union function isn't available, _yet_. For progress,
check NixOS/nixpkgs#112083

As a workaround we may allow the filtering to be disabled entirely.
This could be implemented as a `packages.<name>.filteredSource`
option with a default, but for now that seems over-engineered.
@infinisil infinisil mentioned this pull request Mar 25, 2023
7 tasks
@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@infinisil
Copy link
Member

infinisil commented Jun 19, 2023

Recently I worked on an improved set of combinators inspired by this PR. It's currently a draft Nixpkgs PR, but it needs more eyes! So if you're interested in this, please take a look, try it out, and give feedback! The best overview is here: https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117

@infinisil
Copy link
Member

infinisil commented Nov 9, 2023

The lib.fileset library is now fairly usable, see #266356 for the status and updates! As such I don't think we need this PR here anymore, I'll close it. If you use the file set library but run into problems, please report this in linked issue (ideally with an issue/PR).

@infinisil infinisil closed this Nov 9, 2023
infinisil added a commit to tweag/nixpkgs that referenced this pull request Nov 9, 2023
Make root, subpath internal

Adapted from NixOS#112083

Co-Authored-By: Robert Hensing <robert@roberthensing.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: developer experience 6.topic: stdenv Standard environment 10.rebuild-darwin: 0 10.rebuild-linux: 0 significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants