Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC should skip over raw blocks #7160

Closed
Stebalien opened this issue Apr 14, 2020 · 7 comments
Closed

GC should skip over raw blocks #7160

Stebalien opened this issue Apr 14, 2020 · 7 comments
Labels
effort/hours Estimated to take one or several hours exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now

Comments

@Stebalien
Copy link
Member

Stebalien commented Apr 14, 2020

Currently, when GC traverses the pin set, it loads all blocks, even raw blocks. This means:

  1. We need to perform unnecessary work (fetching raw blocks from disk).
  2. When using the filestore, if the backing data disappears, the associated leaf blocks will return errors on read. At the moment, this will completely break GC.

We should simply avoid loading raw blocks when garbage collecting (or at least just check for existence instead of reading the block). This will:

  1. Improve GC performance significantly when working with large amounts of data with raw leaves.
  2. Make GC continue to work even if the filestore gets corrupted.
@Stebalien Stebalien added the kind/enhancement A net-new feature or improvement to an existing feature label Apr 14, 2020
@Stebalien Stebalien mentioned this issue Apr 14, 2020
3 tasks
@RubenKelevra
Copy link
Contributor

RubenKelevra commented Apr 23, 2020

How about adding three flags to the GC operation to determine how the GC should handle filestore and urlstore?

  1. --filestore[/urlstore]=ignore

    Just skip all filestore/URL store entries completely and don't check if they are accessible.

  2. --filestore[/urlstore]=keep

    Keep all entries which are still accessible, regardless if they are pinned or not.

    Warn about non-accessible files/urls which are part of a pin/file.

  3. --filestore[/urlstore]=drop - default

    Drop entries from IPFS if they are not pinned/in the MFS

@hsanjuan hsanjuan added the P3 Low: Not priority right now label Apr 28, 2020
@hsanjuan
Copy link
Contributor

Is this work meant to be scheduled soon or would we rather wait to doing a larger GC revamp (possibly after raw-multihashes and pinning system changes)? I have tentatively tagged as P3.

@Stebalien
Copy link
Member Author

P3 seems right given that it won't help with performance in many cases. My main interest is actually allowing GC to progress even when filestore blocks are missing.

@poonai
Copy link
Contributor

poonai commented May 2, 2020

The GC should avoid fetching raw blocks if the given CID's codec is raw. Is this right?

If it's right. I would like to raise a PR for this.

@Stebalien Stebalien added exp/intermediate Prior experience is likely helpful effort/hours Estimated to take one or several hours help wanted Seeking public contribution on this issue labels May 4, 2020
@Stebalien
Copy link
Member Author

Exactly. PRs welcome!

@Stebalien
Copy link
Member Author

This idiot (@Stebalien <-- this one) made bad assumptions and filed an issue without actually verifying the issue. Apparently, we don't load raw blocks when garbage collecting.

@Stebalien
Copy link
Member Author

@balajijinnah thank you for implementing this, and again, I apologize for not verifying the issue first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/hours Estimated to take one or several hours exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now
Projects
None yet
Development

No branches or pull requests

4 participants