Retrieve large data that was split across multiple deals #77

Open
bajtos opened this issue Jun 19, 2024 · 0 comments

bajtos commented Jun 19, 2024

Quoting from https://filecoinproject.slack.com/archives/C02T827T9N0/p1718714837853519?thread_ts=1718710948.121159&cid=C02T827T9N0

So the way the pipeline works is as follows:

  • Solana validator nodes write out a large CAR file (~550 GB) with a root payload CID.
  • This CAR file is then split into smaller CAR files of 32 GB each. The split does not respect the DAG structure; it just takes one CID at a time.
  • Each of the smaller CAR files is sent to SPs to make deals.

This means that to capture the full DAG, all of the CAR files must be onboarded to SPs that will serve retrievals; otherwise retrieval of the full DAG breaks.
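
To illustrate why every piece is needed, here is a minimal toy sketch of a split that packs blocks in file order and ignores DAG structure. It is not the actual splitting tool: `Block`, `split_sequentially`, `missing_blocks`, the string identifiers, and the sizes are all hypothetical stand-ins for real CAR blocks and CIDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    cid: str            # toy identifier, not a real CID
    size: int           # block size in arbitrary units
    links: tuple = ()   # identifiers of child blocks


def split_sequentially(blocks, max_size):
    """Pack blocks into chunks in file order, ignoring DAG structure."""
    chunks, current, used = [], [], 0
    for block in blocks:
        # Start a new chunk when the next block would exceed the limit.
        if current and used + block.size > max_size:
            chunks.append(current)
            current, used = [], 0
        current.append(block)
        used += block.size
    if current:
        chunks.append(current)
    return chunks


def missing_blocks(root, available_chunks):
    """Walk the DAG from root; return identifiers no available chunk holds."""
    store = {b.cid: b for chunk in available_chunks for b in chunk}
    missing, seen, stack = set(), set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        block = store.get(cid)
        if block is None:
            missing.add(cid)
            continue
        stack.extend(block.links)
    return missing


# A small DAG written out in traversal order, as in one large CAR file.
blocks = [
    Block("root", 10, ("a", "b")),
    Block("a", 30, ("a1",)),
    Block("a1", 30),
    Block("b", 30, ("b1",)),
    Block("b1", 30),
]
chunks = split_sequentially(blocks, max_size=64)

print(len(chunks))                          # number of pieces produced
print(missing_blocks("root", chunks))       # all pieces available
print(missing_blocks("root", chunks[:1]))   # only the first piece available
```

In this toy model, a subtree such as `a -> a1` ends up straddling two chunks, so a traversal from the root payload CID can only complete when every chunk is retrievable.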

Also:

This is why it works:
https://github.com/filecoin-project/data-prep-tools/blob/main/docs/best-practices/car-first-then-split.md

Example CIDs from that thread:

  • PieceCID baga6ea4seaqc7zmjswf4esktbdapqnzwanphet3g2ekppezzypodimfga6xiila
  • Payload CIDs:
    • bafyreih4wq2ljuzhnn6pzl7tny7khzekqjx7yp6h5rvfbx2hrwrtp6mpcq
    • bafyreiedsc7cjq7ypu4fwmnrrttdquewlpgnpirjaubf5rrqexl4fnbeem
    • bafyreib3ik6guatvlcpy2oq7fgiavs6nbsc4i7ormq3gex4www53sdb364
    • bafyreihp4ocqnpbdeywmcpbukeutbrj3x5v2truzh4fkmyelninv6abfoa
    • bafyreihjk6o5jelb2nyvaltgor6rql5fac7lfucxnczegjibxzlubsfj5i
    • bafyreifazskco4pkbopnxeouanepbq26suth2gml6ufepcx4tcqq3j4deq

bajtos changed the title from "Retrieve large data split across multiple deals" to "Retrieve large data that was split across multiple deals" on Jun 19, 2024