Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically generate cache for stackage projects #358

Closed

Conversation

balsoft
Copy link
Contributor

@balsoft balsoft commented Dec 10, 2019

I moved @yorickvP work here. This fixes #335 and is conflicting with #351 . The cache argument is deprecated. The changes are reflected in the changelog. Docs probably need updating too.

@balsoft balsoft changed the title Automatically generate cache for stackage projects [WIP] Automatically generate cache for stackage projects Dec 10, 2019
@balsoft
Copy link
Contributor Author

balsoft commented Dec 10, 2019

This requires nix with this patch . I'll try to make it work for most use cases without it.

@hamishmack
Copy link
Collaborator

Now that #348 has been merged I don't think we need to go to the trouble of calling builtins.fetchGit and calculating a sha256 value. I think now we can just have it output:

name = s2n.cabalPackageName "${pkgsrc}/${subdir}";
rev = dep.commit;
ref = "*";
url = dep.git;
is-private = true;

The changes made in #348 and setting is-private should cause it to call builtins.fetchGit directly.

@yorickvP
Copy link
Contributor

yorickvP commented Dec 10, 2019 via email

@yorickvP
Copy link
Contributor

We should also implement git dependencies for custom snapshots, the tool I got this code from has support for that. It's licensed under the nixpkgs license, meant for easy upstreamability, so maybe the cabalName code should also move here.

@hamishmack
Copy link
Collaborator

If the .yaml to .json converter can preserve comments perhaps we can add support for # ref: and # nix-sha256: comments in the .yaml files. We do something similar with comments in cabal.project files to avoid the need for a separate cache file.

Keeping the commit hash and the nix hash next to each other in one place in git means merging changes is more reliable (less likely to update one of the hashes and not the other).

@yorickvP
Copy link
Contributor

yorickvP commented Dec 10, 2019

I am not sure if ref necessarily has to exist, or be constant. Stack itself also fetches using *.

It would be nice to directly fetch the revision, but GitHub does not allow shallow clones of revisions that are not a head, since the reachability checking algorithm is nonlinear and could be used to DoS them, and people sometimes want to unpublish commits by making them nonreachable. GitHub also does not support archive downloads for private repos with just the ssh key.

The yaml-to-json parser does not preserve comments and was chosen to have minimal dependencies (only ruby). Finding one that does would indeed be an improvement, in particular to be able to pre-specify the package name.

builtins.fetchGit does not support nix hashes (yet), but it would be useful to add them to nonprivate repos that can be fetched with pkgs.fetchgit.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 10, 2019

@hamishmack while we can omit sha256, we can't omit the builtins.fetchGit because we need to get a cabal name.

@balsoft balsoft changed the title [WIP] Automatically generate cache for stackage projects Automatically generate cache for stackage projects Dec 10, 2019
@angerman
Copy link
Collaborator

Can someone confirm my understanding of this?

  • We can not download dependencies that do not have a cache entry, as we need fixed output derivations for those, and hence need the cache information.
  • We can not run nix-prefetch-git inside of nix
  • without the hash (which we don't know without the cache) we can't get the dependencies.
  • if we combine builtins.fetchGit and iterate over the packages in the stack file we can download all dependencies, and automatically generate the cache information?

I see how this would produce a fully automated workflow. I'm a bit concerned about the extra overhead this would have though. Hence the following reservations:

  • I'm not sure we really want to deprecate the cache argument. I'd rather have a trace message that says it's going to compute the cache information on the fly, and ideally dump the cache information via another trace such that one could speed this up by providing the cache for subsequent calles. Similar to how we have the materialization logic.
  • Adding an additional dependency always comes at a cost. Does adding s2n offer enough over inlining it?
  • Should the cache generation function be exposed as something one could easily hook into a git hook, such that teams that do not want to rely on the CI calculating the hashes and doing lots of network requests have an alternative method?

@cdepillabout
Copy link
Contributor

cdepillabout commented Dec 18, 2019

@angerman I implemented #348, so let me try to answer some of your questions.

We can not download dependencies that do not have a cache entry, as we need fixed output derivations for those, and hence need the cache information.

This is sort of correct.

We cannot download dependencies that do not have a cache entry (as we need fixed output derivations) if we use pkgs.fetchgit.

We can download dependencies in some instances without a cache entry by using builtins.fetchGit (as implemented in #348). builtins.fetchGit runs at eval time, and it doesn't require a sha256.

(However, there were some reports in
#309 that builtins.fetchGit doesn't work on Hydra, which is why we still have support for both builtins.fetchGit and pkgs.fetchgit.)

There are two instances that we can't download dependencies automatically without a cache entry:

  1. We need a ref that is not HEAD (or master or whatever). Normally the ref argument is a branch name.

    builtins.fetchGit requires a ref to be specified whenever you need to pull a commit from a branch that is not reachable from HEAD. As far as I know, there is no way to make builtins.fetchGit clone the full repository and all branches.

    It appears that some of the users above have tried to get this functionality into nix, but Eleco is against it in the threads linked.

    It also appears that this PR removes the ability added in Update mkCacheFile and mkCacheModule to support private repos with builtins.fetchGit #348 to pull from other branches other than HEAD without using a patched version of nix.

    We use this feature at work, and it was the reason I sent Update mkCacheFile and mkCacheModule to support private repos with builtins.fetchGit #348, so I would be sad if this PR was merged and we had to figure out some way to work around it again.

  2. Currently, cache entries require the name of the underlying Haskell package. stack.yaml files don't actually list the name of the underlying Haskell package.

    I believe this name can be figured out after you download the git repo, but it will require a little extra work to do. (It looks like this PR has the logic required to do this.)

We can not run nix-prefetch-git inside of nix without the hash (which we don't know without the cache) we can't get the dependencies.

I believe this is true, but I am not super familiar with nix-prefetch-git.

However, I don't think nix-prefetch-git is a solution for private git repos, since private git repos can only be fetched with builtins.fetchGit.

if we combine builtins.fetchGit and iterate over the packages in the stack file we can download all dependencies, and automatically generate the cache information?

See my answer above. This won't work for private git repos where the commit in question is on a branch not reachable from HEAD.

However, we should be able to generate most of the required information directly from the stack.yaml file.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 18, 2019

I'm not sure we really want to deprecate the cache argument. I'd rather have a trace message that says it's going to compute the cache information on the fly, and ideally dump the cache information via another trace such that one could speed this up by providing the cache for subsequent calles. Similar to how we have the materialization logic.

Yeah, now that I think about it I don't know why I decided to deprecate the cache.

Adding an additional dependency always comes at a cost. Does adding s2n offer enough over inlining it?

I'm not sure, I'm not a maintainer of haskell.nix :) If you believe it should be inlined -- let's do that, but it'll add quite a bit of code to maintain and update.

Should the cache generation function be exposed as something one could easily hook into a git hook, such that teams that do not want to rely on the CI calculating the hashes and doing lots of network requests have an alternative method?

It is exposed as genStackCache in overlay, so it should be pretty trivial to wrap around that.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 18, 2019

@cdepillabout

It also appears that this PR removes the ability added in #348 to pull from other branches other than HEAD without using a patched version of nix.

Can you check if it does really break that? It works on non-HEAD in my experience (I have packaged multiple packages with this PR added, and I believe in one case it's a non-head commit in one of the deps).

@yorickvP
Copy link
Contributor

yorickvP commented Dec 19, 2019 via email

@yorickvP
Copy link
Contributor

Inlining our stack2nix library was always the goal. The license is the same as the nixpkgs one.
It's true that the fetchGit adds some overhead, but it's not nearly as much as the rest of haskell.nix, especially since people tend to have 1-5 git dependencies at most, and git caches it rather decently. It also strikes me as inevitable to do this at eval time, because of private git repos.

The overhead could be reduced a bit if stack-to-nix could figure out the package names itself, somehow, and by using the fetchGit'd packages directly, instead of using the nix-hash trick. That way, there's only a single IFD.

@cdepillabout
Copy link
Contributor

@balsoft @yorickvP
Thanks for the follow-up here.

I tested this current PR with our nix stuff at work, and it doesn't work (but it is not because of the ref thing I was fearing).

I've tested using builtins.fetchGit with a ref of *, and it does appear to work for our stuff at work. So I am no longer worried about this. (However, I'm pretty sure I did test this yesterday, so I'm not sure why it wasn't working yesterday.)

However, the genStackCache function introduced in this PR doesn't output caches in the correct format for the mkCacheLine function introduced in #348.

The mkCacheLine function introduced in #348 uses the is-private attribute to decide whether to use builtins.fetchGit vs pkgs.fetchgit:

entireRepo =
if is-private
then
# It doesn't make sense to specify sha256 on a private repo
# because it is not used by buitins.fetchGit.
assert isNull sha256;
builtins.fetchGit
({ inherit url rev; } //
self.buildPackages.lib.optionalAttrs (ref != null) { inherit ref; }
)
else
# Non-private repos must have sha256 set.
assert sha256 != null;
# pkgs.fetchgit doesn't have any way of fetching from a given
# ref.
assert isNull ref;
self.buildPackages.pkgs.fetchgit {
url = url;
rev = rev;
sha256 = sha256;
};

However, it appears that this PR never sets the is-private attribute on the cache entries, which is causing them to always be fetched with pkgs.fetchgit. This doesn't work for private repos.


If the above problem is fixed, I'd be very happy to see this merged into haskell.nix.

Also, if possible, it would be nice to have tests for private repos so we can be sure this feature doesn't break at some point in the future.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 19, 2019

@cdepillabout Do the above commits solve your problem?

@cdepillabout
Copy link
Contributor

@balsoft Looking at the two commits above, it does appear that they should solve my problem, but when trying to actually use them, I'm still running into a problem where it appears that pkgs.fetchgit is being used.

I don't have any more time to look into this today, but I'll try to figure out what is going wrong on Monday.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 20, 2019

@cdepillabout they use fetchgit when repo address starts with https, and fetchGit otherwise.

@cdepillabout
Copy link
Contributor

@balsoft I had some time to look into this today.

It seems like everything now works, except that there is a small logic error in lib/stack-cache-generator.nix. Here's a diff that will fix it:

diff --git a/lib/stack-cache-generator.nix b/lib/stack-cache-generator.nix
index eb9dd80..aff325c 100644
--- a/lib/stack-cache-generator.nix
+++ b/lib/stack-cache-generator.nix
@@ -34,6 +34,6 @@ concatMap (dep:
                 rev = dep.commit;
                 url = dep.git;
                 is-private = private url;
-                sha256 = if is-private then hashPath pkgsrc else null;
+                sha256 = if !is-private then hashPath pkgsrc else null;
             } // (optionalAttrs (subdir != "") { inherit subdir; }))
         (dep.subdirs or [ "" ])) deps

However, this is only needed because of the following code in mkCacheLine I added in #348:

https://github.com/cdepillabout/haskell.nix/blob/f29409a21e014cc7c90f9ecfaeb6118252ef9793/overlays/haskell.nix#L239

It wouldn't necessarily hurt anything to remove that assertion (or even better, turn it into a warning log message that the sha256is being ignored.).

@balsoft
Copy link
Contributor Author

balsoft commented Dec 23, 2019

@cdepillabout Fixed the logic error. I think that changing behaviour of actual fetcher belongs in a different PR.

@cdepillabout
Copy link
Contributor

@balsoft I think this should be ready to be merged in?

Although it might be nice to have one final review from @angerman and @hamishmack.

@yorickvP
Copy link
Contributor

I would maybe inline the serokell github dependency.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 23, 2019

@yorickvP How do you think this should be inlined? Just copy-paste, subtree or submodule? I'm thinking subtree ATM.

@balsoft
Copy link
Contributor Author

balsoft commented Dec 25, 2019

cc @angerman @hamishmack

@angerman
Copy link
Collaborator

angerman commented Jan 6, 2020

We are looking into this, and Ihope we'll have this finally merged by end of this week. Sorry for the delay. Conceptually I'm not opposed, I'd just like to figure out if we can potentially make nix-tools more intelligent to reduce the size of this in haskell.nix.

@angerman
Copy link
Collaborator

Was this resolved with #397 to everyones content? If so, I'd like to close this one. If not, please let me know!

@balsoft
Copy link
Contributor Author

balsoft commented Jan 16, 2020

@angerman #397 works for us.

@angerman angerman closed this Jan 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't download uncached dependencies
5 participants