This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

Making IPFS accessible for distributed archival. #210

Open
meyerzinn opened this issue Jan 2, 2017 · 11 comments

Comments

@meyerzinn

At the Climate Mirror project, we're looking to use IPFS for distributing and archiving climate data. However, the computing power we have is not enough to ensure the availability of the data. I've had the great pleasure of talking with @flyingzumwalt about how IPFS applies to the Climate Mirror project, and one of the key things I need is to make it easy for everybody and their dog to help. The main thing we need for our use case:

  1. Because most people don't have 2TB drives casually lying around, we need to allow people to host subsets of the overall data. We want to use ipfs-ringpin to offer a "climate" pin list, but if someone has 2GB available for the project, it should determine the rarest blocks out of all the pins and fetch those up to capacity. This "if not enough space, get the rarest" behavior is part of what I'd consider an "archival" mode in the IPFS client: p2p for the purpose of archiving, not for streaming a movie.
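
A minimal sketch of that "if not enough space, get the rarest" policy, written in Python for illustration. It assumes each block's size and its current provider count are already known (how to obtain those counts is exactly the open DHT question in this thread). `Block`, `select_rarest`, and the sample CIDs are hypothetical and are not part of ipfs-ringpin or ipfs-cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    cid: str        # content identifier (sample values below are made up)
    size: int       # block size in bytes
    providers: int  # ASSUMED known: how many peers already store this block


def select_rarest(blocks: List[Block], capacity: int) -> List[Block]:
    """Pick the rarest blocks first until the byte budget runs out.

    A block that does not fit in the remaining space is skipped, and the
    loop keeps looking for smaller blocks that still fit.
    """
    chosen: List[Block] = []
    remaining = capacity
    # Fewest providers first; ties go to the smaller block so more blocks fit.
    for block in sorted(blocks, key=lambda b: (b.providers, b.size)):
        if block.size <= remaining:
            chosen.append(block)
            remaining -= block.size
    return chosen


# Hypothetical pin list with made-up sizes and provider counts.
pins = [
    Block("QmA", 700, providers=5),
    Block("QmB", 900, providers=1),
    Block("QmC", 600, providers=2),
    Block("QmD", 300, providers=1),
]
print([b.cid for b in select_rarest(pins, capacity=2000)])
# → ['QmD', 'QmB', 'QmC']
```

Greedy rarest-first is a heuristic: it fills the budget with rare blocks, but it does not guarantee the tightest possible packing of the available space.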

Thank you for building the internet of the future, now let's store some climate data :)

@flyingzumwalt

cc @hsanjuan @jbenet @whyrusleeping: this is an interesting use case for ipfs-cluster. Does anyone know of an easy way to inspect the DHT in order to identify the "rarest" blocks from a list of hashes? Is it feasible?

@meyerzinn
Author

@flyingzumwalt I think this fits ipfs-ringpin better, but cluster is useful for the team's own resources.

@flyingzumwalt

Also, @20zinnm makes a good point that we should figure out the relationship between ipfs-ringpin and ipfs-cluster.

@meyerzinn
Author

meyerzinn commented Jan 2, 2017

My understanding thus far is that ipfs-cluster distributes a single pin list across the nodes in a cluster (one pin list that multiple machines coordinate to fulfill), whereas ringpin lets you copy someone else's pin list for yourself, like some sort of IPFS celebrity social media.

EDIT: Accidentally closed.
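
To make the distinction above concrete, here is a toy model of the two behaviors. It is a sketch under stated assumptions, not how either tool is implemented: `ringpin_follow` and `cluster_allocate` are hypothetical names, and the round-robin placement with a fixed replication factor is an assumption chosen only for illustration.

```python
from typing import Dict, List


def ringpin_follow(published: List[str]) -> List[str]:
    """Ringpin-style (toy model): a follower copies someone's whole pin list."""
    return list(published)


def cluster_allocate(
    pins: List[str], peers: List[str], replication: int
) -> Dict[str, List[str]]:
    """Cluster-style (toy model): one shared pin list split across peers.

    Each pin is placed on `replication` distinct peers. Round-robin placement
    is an assumption for illustration, not ipfs-cluster's real allocator.
    """
    if not 1 <= replication <= len(peers):
        raise ValueError("replication must be between 1 and the number of peers")
    allocation: Dict[str, List[str]] = {peer: [] for peer in peers}
    for i, pin in enumerate(pins):
        # Consecutive peers starting at offset i are distinct while r < len(peers).
        for r in range(replication):
            allocation[peers[(i + r) % len(peers)]].append(pin)
    return allocation


pins = ["QmA", "QmB", "QmC"]  # hypothetical shared pin list
peers = ["p1", "p2", "p3"]    # hypothetical peer IDs

print(ringpin_follow(pins))   # every follower stores all three pins
# → ['QmA', 'QmB', 'QmC']
print(cluster_allocate(pins, peers, replication=2))  # each peer stores two
# → {'p1': ['QmA', 'QmC'], 'p2': ['QmA', 'QmB'], 'p3': ['QmB', 'QmC']}
```

With three peers and a replication factor of 2, each cluster peer stores two of the three pins, while every ringpin follower stores all of them; that difference is why pooled storage maps more naturally onto cluster.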

@meyerzinn meyerzinn reopened this Jan 2, 2017
@hsanjuan
Contributor

hsanjuan commented Jan 3, 2017

> if someone has 2GB available for the project, it should determine the rarest blocks out of all the pins and fetch those up to capacity

Currently this is not a use case supported by ipfs-ringpin or ipfs-cluster, but ipfs-cluster aims to support that sort of pinning. We have a specific user story on pin rings too: ipfs-cluster/ipfs-cluster#7. It would be super useful if you could add more comments that help us shape this feature.

I should add that you could probably use ipfs-ringpin publish (using IPNS to publish a list of published content) together with cluster. The main advantage is that cluster would pin new content automatically, rather than requiring every node to refresh its list.

@meyerzinn
Author

@hsanjuan can people join and leave the cluster, with the pins rebalanced accordingly? And how closely connected do cluster peers need to be (same network? same country?)

@hsanjuan
Contributor

hsanjuan commented Jan 3, 2017

@20zinnm cluster should implement rebalancing at some point, yes.

Connected peers can be anywhere IPFS peers can be.

@meyerzinn
Author

meyerzinn commented Jan 3, 2017

@hsanjuan so is it possible to make an "open" cluster where people can "join" the cluster and help redundantly store the pins? If we can develop a mechanism to determine the rarity of blocks, could cluster nodes be assigned blocks based on how rare the blocks are and how reliable each node has been? I was thinking of a cluster ledger via a blockchain, but it could also be authoritative (whichever node is oldest is the leader).

By "open" I mean anyone on the internet.
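
The rarity/reliability idea and the "oldest node is the leader" option could be sketched like this. Everything here is hypothetical: the `Peer` fields, the reliability score, and the pairing rule (rarest block goes to the most reliable peer, cycling through peers) are assumptions. Picking the oldest peer is also not a consensus protocol, since it does not handle the oldest peer going offline or peers disagreeing about membership.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Peer:
    peer_id: str
    joined_at: float    # ASSUMED: join timestamp, smaller means older
    reliability: float  # ASSUMED: share of past availability checks passed, 0..1


def elect_leader(peers: List[Peer]) -> Peer:
    """Naive 'oldest peer leads' rule; ties broken by peer ID.

    Not a consensus protocol: it ignores the leader going offline and peers
    disagreeing about who is a member.
    """
    if not peers:
        raise ValueError("no peers")
    return min(peers, key=lambda p: (p.joined_at, p.peer_id))


def assign_by_rarity(replicas: Dict[str, int], peers: List[Peer]) -> Dict[str, str]:
    """Map each block to one peer: rarest blocks go to the most reliable peers.

    `replicas` maps a CID to its current replica count (assumed to be known).
    Peers are reused in order once every peer has received a block.
    """
    if not peers:
        raise ValueError("no peers")
    rarest_first = sorted(replicas, key=lambda cid: (replicas[cid], cid))
    most_reliable = sorted(peers, key=lambda p: (-p.reliability, p.peer_id))
    return {
        cid: most_reliable[i % len(most_reliable)].peer_id
        for i, cid in enumerate(rarest_first)
    }


# Hypothetical peers and replica counts.
peers = [
    Peer("alice", joined_at=100.0, reliability=0.99),
    Peer("bob", joined_at=50.0, reliability=0.80),
    Peer("carol", joined_at=200.0, reliability=0.95),
]
replicas = {"QmA": 4, "QmB": 0, "QmC": 1, "QmD": 2}

print(elect_leader(peers).peer_id)  # → bob
print(assign_by_rarity(replicas, peers))
# → {'QmB': 'alice', 'QmC': 'carol', 'QmD': 'bob', 'QmA': 'alice'}
```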

@flyingzumwalt

Another issue relevant to this discussion: @jbenet's proposal of an ipfs-pack format and associated tooling for datasets on IPFS (#205).

@flyingzumwalt

Also see ipfs-cluster/ipfs-cluster#7

@hsanjuan
Copy link
Contributor

hsanjuan commented Jan 8, 2017

@20zinnm It would be great if you could open an issue in ipfs-cluster (or use ipfs-cluster/ipfs-cluster#7) describing what your dream tool would look like, so we can come up with a list of implementable features. Cluster uses a consensus algorithm (Raft) to keep a consistent view of what is pinned across all nodes. There are challenges in making the set of participating nodes dynamic, and in discovering and rebalancing content that is not pinned by enough nodes (especially if those nodes come and go randomly). But we can build that functionality little by little, and your ideas and feedback would be appreciated.
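
The rebalancing challenge described above can be illustrated with a small sketch that finds pins held by fewer than `replication` online peers and proposes new holders. This is an assumption-laden illustration, not ipfs-cluster's actual allocator: `rebalance`, the least-loaded placement rule, and the sample data are hypothetical, and in a real cluster the resulting decisions would still have to be agreed through the consensus layer (Raft) rather than computed by one node alone.

```python
from typing import Dict, List, Set


def rebalance(
    pin_locations: Dict[str, Set[str]], online: Set[str], replication: int
) -> Dict[str, List[str]]:
    """Propose new holders for pins with fewer than `replication` online copies.

    Only replicas on currently online peers count. New holders are the online
    peers with the fewest pins (ties broken by peer ID). If there are not
    enough online peers, the pin simply stays under-replicated.
    """
    # Current number of pins held by each online peer.
    load: Dict[str, int] = {peer: 0 for peer in online}
    for holders in pin_locations.values():
        for peer in holders & online:
            load[peer] += 1

    additions: Dict[str, List[str]] = {}
    for cid in sorted(pin_locations):
        live = pin_locations[cid] & online
        missing = replication - len(live)
        if missing <= 0:
            continue
        candidates = sorted(online - live, key=lambda p: (load[p], p))
        chosen = candidates[:missing]
        for peer in chosen:
            load[peer] += 1  # account for the new pin before the next decision
        if chosen:
            additions[cid] = chosen
    return additions


# Hypothetical state: p3 has left, p4 has just joined.
pin_locations = {"QmA": {"p1", "p2"}, "QmB": {"p2", "p3"}, "QmC": {"p3"}}
online = {"p1", "p2", "p4"}

print(rebalance(pin_locations, online, replication=2))
# → {'QmB': ['p4'], 'QmC': ['p1', 'p4']}
```

Counting only online replicas is what makes churn visible: QmC looks pinned on paper, but with p3 gone it has no live copy and needs two new holders.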


4 participants