Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAR import should not pin roots by default (ipfs dag import --pin-roots=false) #9765

Closed
lidel opened this issue Mar 29, 2023 · 5 comments · Fixed by #9926 or #9966
Closed

CAR import should not pin roots by default (ipfs dag import --pin-roots=false) #9765

lidel opened this issue Mar 29, 2023 · 5 comments · Fixed by #9926 or #9966
Labels
dif/easy kind/bug A bug in existing code (including security flaws) need/maintainers-input Needs input from the current maintainer(s) P1 High: Likely tackled by core team if no one steps up topic/commands Topic commands topic/pinning Topic Pinning (local and remote)

Comments

@lidel
Copy link
Member

lidel commented Mar 29, 2023

Problem

We are working on Graph API for HTTP Gateways to enable ?format=car that returns a subset of a DAG and/or parent nodes of a sub-path.
As part of this work, we have to decide, if returned CAR stream includes root CIDs in CAR header, and anticipate what happens, when Kubo user tries to import such "bag of blocks" into local Kubo node using `ipfs dag import.

Ref. ipfs-inactive/bifrost-gateway#61 (comment) for more details.

Current state

The current default behavior of Kubo's ipfs dag import defaults to --pin-roots=true, which will RECURSIVELY pin every CID from the CAR header.

If any DAG blocks are not in the CAR, it will either fail due to pinning being in offline mode (this was the behavior last time I checked) OR if someone refactors it at some point, a network fetch may be triggered, overfetching often absurd amount of data (export single wiki article, overfetch 350GiB).

This is an unfortunate legacy behavior, but we can't ignore it. Kubo is used by the majority of IPFS ecosystem outside PL, and the majority of "partial/path CARs" fetched from Rhea/Saturn will be consumed by Kubo's dag import at some point.

Also, pinning by default is inconsistent with ipfs dag put (which does not pin by default).

Proposed change

Change dag import default to not pin (--pin-roots=false)

This will make dag import compatible with the vision of "CAR being just a bag of blocks",
and make graph operations such as pinning DAG root recursively an opt-in.

cc @rvagg @ribasushi @aschmahmann – does this sound sensible?

@lidel lidel added kind/bug A bug in existing code (including security flaws) topic/commands Topic commands P1 High: Likely tackled by core team if no one steps up need/maintainers-input Needs input from the current maintainer(s) topic/pinning Topic Pinning (local and remote) dif/easy labels Mar 29, 2023
@rvagg
Copy link
Member

rvagg commented Mar 29, 2023

👌 reporting lack of pinning to the user might be appropriate too since it'll be a downgrade in behavior for some. But otherwise I think this is a really good change and puts more control in the users' hands (well it should, maybe we'll see if users expect to not have to think about this stuff).

@BigLep
Copy link
Contributor

BigLep commented May 18, 2023

@lidel : I put this in 0.22 best effort track. Let me know if you disagree.

rvagg added a commit that referenced this issue Jun 9, 2023
rvagg added a commit that referenced this issue Jun 9, 2023
rvagg added a commit that referenced this issue Jun 9, 2023
rvagg added a commit that referenced this issue Jun 9, 2023
rvagg added a commit that referenced this issue Jun 9, 2023
rvagg added a commit that referenced this issue Jun 13, 2023
rvagg added a commit that referenced this issue Jun 13, 2023
@aschmahmann
Copy link
Contributor

aschmahmann commented Jun 13, 2023

I'm not particularly attached to the outcome here, but are we happy with the advantage of not pinning roots by default vs running in "offline" mode by default (as it is today)?

Pinning takes place in offline-mode exclusively, one root at a time.
If the combination of blocks from the imported CAR files and what is
currently present in the blockstore does not represent a complete DAG,
pinning of that individual root will fail.

OR if someone refactors it at some point, a network fetch may be triggered, overfetching often absurd amount of data (export single wiki article, overfetch 350GiB).

We could write tests to defend against this if we're concerned (they might even be there already).

AFAICT then the UX tradeoff for a CAR with a root for a DAG that is bigger than what's in the CAR is would you rather:

  1. The user has to import their data twice since the first time they didn't realize they needed to do --pin-roots=false
  2. The user doesn't realize that dag import doesn't protect the data from GC and the next time the user runs GC (or it happens automatically) the data disappears. Yes, I know some commands like block put and dag put work this way already so this isn't totally out of nowhere.

Option 1 sounds nicer to me since the user is more protected from accidentally bad decisions (particularly in light of defaults changing). However, that's just one person's opinion 😄.

@lidel
Copy link
Member Author

lidel commented Jun 13, 2023

I've been digging a bit more and it seems we actually force offline pinning during dag import (but not dag export).

  • dag import: Offline mode for import is hard-coded here
  • dag export: Export on the other hand, respects --offline flag: here

This means the problem of overfetching during dag import is mitigated already.

What remains are UX problems around --pin-roots=true being the default:

  • Consistency with dag put (which does not pin by default) removes surprises and unifies behavior of dag commands ("this is a low-level API, i need to explicitly say when I want to pin something")
  • Pinning should be explicit. Implicitly created low level pins are difficult to track and manage.
    Users doing curl | ipfs dag import end up with a lot of dangling DAGs they may not care about.
  • We will have more and more partial CARs flying around, and pinning by default will produce below error:
    $ curl -v 'http://127.0.0.1:8080/ipns/docs.ipfs.tech?format=car&dag-scope=entity' > ./partial-entity.car # Kubo 0.21.0-rc1 with IPIP-402
    $ ipfs dag import --stats --pin-roots=true ./partial-entity.car && echo "👉️ fake success lol 🙃"
    Error pinning	QmPDC11yLAbVw3dX5jMeEuSdk4BiVjSd9X87zaYRdVjzW3	FAILED: block was not found locally (offline): ipld: could not find QmPDvrDAz2aHeLjPVQ4uh1neyknUmDpf1GsBzAbpFhS8ro
    Imported 1 blocks (1618 bytes)
    👉️ fake success lol 🙃
    • 👉 Turns out, these pin errors produce false-positives on CLI — Kubo only prints error, but dag import command exit code is 0 (success), so users already are in the blind when pinning of root failed, unless they manually inspect stdout or JSON payload sent on the wire 🙃 🤷
    • I fixed exit code in fix(cmd): useful errors in dag import #9945 – can be reviewed on its own.

PR switching default to --pin-roots=false proposed in #9926 but I wonder if we should make breaking change there, or play it safe and add separate ipfs block import (see #9926 (comment))

lidel added a commit that referenced this issue Jun 14, 2023
* feat!: dag import - don't pin roots by default

Fixes: #9765

* test(ipip-402): dag import

this adds basic regression test that guards behavior
around partial cars with or without pinning

* docs(ipip-402): ipip and dag import changelog

---------

Co-authored-by: Marcin Rataj <lidel@lidel.org>
@lidel
Copy link
Member Author

lidel commented Jun 15, 2023

@Jorropo raised valid concerns around changing the default silently in #9926, which may lead to data loss (if someone assumed dag import always pins by default AND had GC enabled)

After my fixes the UX from #9926 (review) is good enough (correct exit code and human-readable error if recursive pinning failed), so we could flip the default back to --pin-roots=true.

So my ask would be to not revert that PR blindly (it adds important regression tests), but

@lidel lidel reopened this Jun 15, 2023
lidel pushed a commit that referenced this issue Jun 15, 2023
This is a partial revert of b685355.
Closes #9765 with compromise agreed in #9765 (comment)
hacdias pushed a commit that referenced this issue Jun 20, 2023
* feat!: dag import - don't pin roots by default

Fixes: #9765

* test(ipip-402): dag import

this adds basic regression test that guards behavior
around partial cars with or without pinning

* docs(ipip-402): ipip and dag import changelog

---------

Co-authored-by: Marcin Rataj <lidel@lidel.org>
hacdias pushed a commit that referenced this issue Jun 20, 2023
This is a partial revert of b685355.
Closes #9765 with compromise agreed in #9765 (comment)
hacdias pushed a commit that referenced this issue Jun 20, 2023
* feat!: dag import - don't pin roots by default

Fixes: #9765

* test(ipip-402): dag import

this adds basic regression test that guards behavior
around partial cars with or without pinning

* docs(ipip-402): ipip and dag import changelog

---------

Co-authored-by: Marcin Rataj <lidel@lidel.org>
hacdias pushed a commit that referenced this issue Jun 20, 2023
This is a partial revert of b685355.
Closes #9765 with compromise agreed in #9765 (comment)
@BigLep BigLep mentioned this issue Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dif/easy kind/bug A bug in existing code (including security flaws) need/maintainers-input Needs input from the current maintainer(s) P1 High: Likely tackled by core team if no one steps up topic/commands Topic commands topic/pinning Topic Pinning (local and remote)
Projects
No open projects
Archived in project
Status: Done
4 participants