nix-store --delete doesn't work on binary caches #1968

Open
LisannaAtHome opened this issue Mar 12, 2018 · 14 comments
Labels: cli (The old and/or new command line interface), feature (Feature request or proposal)
@LisannaAtHome

Currently there is no real way to manage the size of a binary cache. The functionality required to enable this use case is NIX_REMOTE=file:///path/to/binary/cache nix-store --delete <path>, which currently fails with: error: requested operation is not supported by store.

Not all of us who host nix channels and binary caches have infinite storage.

To implement this effectively, support for the operation nix-store --query --referrers will also have to be implemented for binary caches, since nix-store --delete needs to fail if you attempt to delete something that is referred to by another path in the store.

It seems like this should be really easy to implement: for --query --referrers, search for the path reference in all of the .narinfo files and return the ones that contain it; for --delete, first check that there are no referrers, and delete the .narinfo and the .nar only if there are none.
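A minimal sketch of the referrers check described above, assuming a local (file://) binary cache and the standard .narinfo layout where the References field lists store-path basenames; the file names and paths below are made up for illustration:

```python
import os
import tempfile

def find_referrers(cache_dir, target):
    """Scan every .narinfo in a local binary cache directory and return
    the store paths whose References field mentions `target`
    (a hash-name basename such as 'aaa-glibc')."""
    referrers = []
    for fname in os.listdir(cache_dir):
        if not fname.endswith(".narinfo"):
            continue
        store_path = None
        refs = []
        with open(os.path.join(cache_dir, fname)) as f:
            for line in f:
                if line.startswith("StorePath:"):
                    store_path = line.split(":", 1)[1].strip()
                elif line.startswith("References:"):
                    refs = line.split(":", 1)[1].split()
        # A path listing itself in References is a self-reference,
        # not a referrer, so it must not block deletion.
        if target in refs and store_path and not store_path.endswith(target):
            referrers.append(store_path)
    return referrers

# Demo with two synthetic narinfo files (hashes shortened for readability).
d = tempfile.mkdtemp()
with open(os.path.join(d, "aaa.narinfo"), "w") as f:
    f.write("StorePath: /nix/store/aaa-glibc\nReferences: aaa-glibc\n")
with open(os.path.join(d, "bbb.narinfo"), "w") as f:
    f.write("StorePath: /nix/store/bbb-hello\nReferences: aaa-glibc bbb-hello\n")

print(find_referrers(d, "aaa-glibc"))  # hello refers to glibc, so glibc can't be deleted
```

As the next comment points out, this full scan is the crux of the problem: it is O(number of store paths) per query, which is fine for a small private cache but prohibitive at cache.nixos.org scale.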

I would implement this but I suck at writing C++.

@edolstra
Member

There are some scripts for doing binary cache GC in https://github.com/NixOS/nixos-channel-scripts. However, proper GC is far from trivial:

  • Detecting referrers by reading all .narinfo files is very expensive, since this may require doing millions of read requests.

  • There are no GC roots. The find-binary-cache-garbage.pl script used the contents of MANIFEST files, but we got rid of those. They could use the store-paths.xz file of each release, though.

  • There are no temporary GC roots. So there is nothing preventing the garbage collector from deleting parts of closures while they are being uploaded by another process. find-binary-cache-garbage.pl dealt with this by considering all files younger than 180 days as live.

@dtzWill
Member

dtzWill commented Mar 13, 2018

I'll add that listing all paths (needed to check referrers) is not only extremely expensive but also undesirable security-wise: presently one can store private paths in a cache, and there is no way to "discover" them without knowing the corresponding hash[1].

This isn't advertised, but it's rather useful IMHO. That said, I'm also overdue to put together scripts to delete old paths (hopefully largely based on the ones linked above, and on those in Hydra, IIRC).

[1] Unless you run cache.nixos.org and log queries, haha. Doesn't bother me (I trust Nix people and infra) but worth keeping in mind when managing a private cache while still using public cache.

@shlevy shlevy added the backlog label Apr 1, 2018
@shlevy shlevy self-assigned this Apr 1, 2018
@dhruvio

dhruvio commented Jul 21, 2020

Wondering if there was any movement on this lately? @shlevy

@Ericson2314
Member

Ericson2314 commented Jul 21, 2020

LocalBinaryCacheStore could at least support this, and it would cover the common case where someone has a directory, mirrored over HTTP, that they can operate on directly.

@edolstra
Member

> LocalBinaryCacheStore could at least support this

Not efficiently, because we don't have a way to see whether a path has any referrers.

@Ericson2314
Member

Ericson2314 commented Jul 21, 2020

True. Nix could build a temporary SQLite database, but then this isn't such a trivial code change.

Maybe the pinning in IPFS will be the real solution here :D.
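To make the "temporary SQLite database" idea above concrete, here is a rough sketch: after a one-time scan of all .narinfo files, the References edges go into an indexed table, and the referrers query becomes a cheap lookup. The store paths below are invented sample data:

```python
import sqlite3

# A throwaway in-memory index of the References graph, as one might build
# from a single pass over all .narinfo files in the cache.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE refs (referrer TEXT, reference TEXT)")
db.executemany("INSERT INTO refs VALUES (?, ?)", [
    ("/nix/store/bbb-hello", "/nix/store/aaa-glibc"),
    ("/nix/store/ccc-curl",  "/nix/store/aaa-glibc"),
])
# Index the reference column so the referrers lookup avoids a full scan.
db.execute("CREATE INDEX idx_ref ON refs(reference)")

# Who refers to glibc? If this is non-empty, --delete must refuse.
rows = db.execute(
    "SELECT referrer FROM refs WHERE reference = ?",
    ("/nix/store/aaa-glibc",),
).fetchall()
print(sorted(r[0] for r in rows))
```

The expensive part is still the initial ingest (one read per narinfo), but it is paid once per GC run instead of once per query, and the database can be discarded afterwards.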

@dhruvio

dhruvio commented Jul 22, 2020

Is it possible to create a generation manually? I am using nix copy and SSH to communicate with my binary cache. It would be nice to be able to copy a bunch of paths over to the cache server, then create a generation that includes only the paths that were copied. Then, we could use the standard garbage collection tools to only delete generations older than X days (for example). Would this solution be feasible?

@domenkozar
Member

I'd note that this was one of the main reasons I built https://cachix.org/: it trades off between keeping all history (expensive and rarely needed) and real-world situations where you can say "I want to keep the last 6 months of actively accessed entries" and adjust storage accordingly by GC-ing least recently used entries.

Besides being careful that you don't delete a narinfo inside a live closure (which makes nix build loop), you also need per-nar access information to be able to collect least recently used entries.

@dhruvio

dhruvio commented Jul 31, 2020

Cachix does have nice functionality in this regard. I'm wondering if there is an achievable way to integrate similar functionality into the existing open source Nix ecosystem. I think my suggestion to manually create a generation at will is one way to do this. I am keen to hear feedback on whether this is feasible, and of any other approaches that could be taken.

@stale

stale bot commented Feb 12, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Feb 12, 2021
@nrdxp
Contributor

nrdxp commented Jul 21, 2022

For the record, I think all that's really needed is a flag to delete based on age (e.g. everything older than 6 months) and a flag to delete based on popularity (an LRU system, probably the harder of the two to implement), and you should be able to combine the two: say, delete everything older than a month unless it has more than 150 pulls, etc.

I was pretty astonished to find recently that there doesn't seem to be any good way to achieve an LRU policy on an AWS s3 bucket directly, so I basically just have to go manually from time to time and clean up old artifacts.
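The combined age-plus-popularity policy described above can be sketched as a pure predicate; the parameter names and thresholds here are hypothetical, and in practice the pull count would come from access logs the cache operator maintains:

```python
from datetime import datetime, timedelta

def should_delete(last_modified, pull_count,
                  max_age_days=30, keep_if_pulls_over=150):
    """Hypothetical GC policy: delete entries older than max_age_days
    unless they are popular (pull count above keep_if_pulls_over)."""
    age = datetime.now() - last_modified
    return age > timedelta(days=max_age_days) and pull_count <= keep_if_pulls_over

old = datetime.now() - timedelta(days=90)
fresh = datetime.now() - timedelta(days=2)
print(should_delete(old, 10))    # old and unpopular -> delete
print(should_delete(old, 500))   # old but popular   -> keep
print(should_delete(fresh, 0))   # fresh             -> keep
```

Note that even with such a policy, the closure-safety problem from earlier in the thread remains: a path passing this predicate must still be kept if a live referrer exists.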

@stale stale bot removed the stale label Jul 21, 2022
@fricklerhandwerk fricklerhandwerk added feature Feature request or proposal cli The old and/or new command line interface labels Oct 6, 2022
@arianvp
Member

arianvp commented Dec 14, 2023

I had a shower thought today:

Why not store what we're storing in the SQLite database in DynamoDB, or even a proper remote SQL database?

Then we could efficiently query metadata for garbage collection without having to read millions of .narinfo files.

@Ericson2314
Member

Ericson2314 commented Dec 14, 2023

Sure, that is definitely an option. In fact, anyone can ingest all the narinfos once, build their own DB, do some calculations, and then modify the binary cache accordingly.

@edolstra
Member

DynamoDB is not that great for this because it doesn't provide an easy way to maintain the referrers relation (which is what we need for GC) in a transactional way.

Projects: None yet
Development: No branches or pull requests
10 participants