payments for file data service #4

Open
neriumrevolta opened this issue Oct 25, 2023 · 5 comments
Assignees
Labels
meta:triaged This issue has been triaged (has a good description, as well as labels for priority, size and type) type:tracking Tracking issues with related scope

Comments

@neriumrevolta

No description provided.

@neriumrevolta neriumrevolta changed the title (spike) payments for file data service payments for file data service Oct 25, 2023
@hopeyen hopeyen self-assigned this Oct 26, 2023
@hopeyen
Collaborator

hopeyen commented Oct 30, 2023

Goal

Imagine a decentralized file-sharing market that supports

  • off-chain micropayments for verifiable data transfers,
  • files ranging from gigabytes to terabytes in size,
  • parallel chunked/partial downloads of requested files,
  • on-chain escrows and dispute processes.

Our primary contenders for file transfer protocols are 1) HTTP direct transfer, 2) Torrents, and 3) IPFS.

TL;DR comparison table

| Aspect | Decentralized HTTPS with TLS in Rust | Torrent with Micropayments in Rust | IPFS with Payments in Rust | Best Option |
| --- | --- | --- | --- | --- |
| Protocol Architecture | Client-server model, extended to decentralized servers for partial downloads. | Peer-to-peer architecture, ideal for distributed file sharing with chunked data. | Decentralized, content-addressable network, suitable for distributed sharing. | Torrent for its inherent P2P nature. |
| Speed and Performance | High for small to medium files, moderate for very large files due to HTTP overhead. | High, optimized for large files, efficient in distributing bandwidth. | Moderate, dependent on network state and data availability. | Torrent for its efficiency with large files and bandwidth distribution. |
| Interleaved Micropayments | Requires a payment verification system for range download requests. | Requires a payment verification system for chunked data transfers. | Challenging due to the need for integrating payments into a decentralized system with partial file access. | HTTPS for middleware flexibility. |
| Escrow (collateralization) and Trust | Requires robust external mechanisms for trust and escrow. | Requires robust external mechanisms for trust and escrow. | Requires robust external mechanisms for trust and escrow. | All three are similar in needing a blockchain/subgraph client. |
| Data Integrity and Security | High with TLS, but dependent on server integrity. | High, with built-in mechanisms for data verification. | Moderate, relies on network integrity and node trustworthiness. | Torrent for its robust data verification mechanisms; HTTPS needs verification implemented separately. |
| Verification Processes | Standard HTTPS verification, extended for decentralized servers; requires verification of partial data. | Inherent in the protocol, with additional layers for payment verification. | Requires additional verification beyond the content-addressable ID. | Torrent for its inherent and efficient verification processes. |
| User Experience (Servers/Clients) | Familiar to users; the decentralized network aspect adds complexity. | Familiar in a P2P context; micropayments add a layer of complexity. | Less familiar; requires understanding of decentralized systems and payments. | HTTPS for its familiarity and ease of use. |
| Matching Algorithms | Requires development of algorithms for server-client matching based on QoS and price. | Inherent in the protocol, but needs extension for price-based matching. | Complex; requires innovative matching algorithms in a decentralized market. | Similar across all three. |
| Library Maturity (Rust) | High. Established libraries for HTTP and TLS. | Low. Leecher clients available; seeder clients and micropayments integration less mature. | Low to moderate. Growing but less mature than HTTP libraries. | HTTPS for its mature and robust libraries in Rust. |
| Implementation Complexity (Rust) | Moderate. Leverages existing HTTPS libraries; payments and the matching algorithm add complexity. | High. Integration of the Torrent protocol with micropayments is complex. | High. Combining IPFS with payment systems is innovative but complex. | HTTPS for simpler implementation in Rust. |
| Community Support (Rust) | Strong. Active development of HTTP/TLS within the Rust community. | Low. Limited support for Torrents with micropayments. | Growing. Increasing interest in decentralized systems like IPFS. | HTTPS for strong community support in Rust. |
| Innovation Potential (Rust) | Moderate. Extension of existing protocols. | High. Novel approach integrating micropayments with Torrents. | Very high. Cutting-edge concept in file sharing. | IPFS for highest innovation potential. |

Decentralized HTTPS with TLS is the most practical and reliable choice, excelling in Rust library maturity, flexibility, and ease of development and use. Torrent with Micropayments offers a balance of innovation and complexity and provides a novel approach to file sharing, while IPFS with Payments is the most innovative and flexible option, but comes with higher complexity and a less mature ecosystem.

More details

HTTPS File Transfer with TLS

A secure method of transferring files over the internet, utilizing the standard HTTP protocol combined with TLS encryption. This approach ensures that data transferred between the client and server is encrypted, safeguarding against interception and unauthorized access. It supports partial downloads, allowing clients to request specific ranges of a file, which is particularly useful for large files. This method is widely used due to its robust security features, compatibility with existing web infrastructure, and ease of implementation in a variety of contexts, including web browsers and standalone applications.
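
For instance, a client can fetch a single chunk of a large file with a standard `Range` request. A minimal sketch using `reqwest` (the URL and byte bounds are placeholders, and the server must support range requests):

```rust
use reqwest::header::{CONTENT_RANGE, RANGE};

/// Download bytes [start, end] of a remote file over HTTPS.
async fn fetch_range(url: &str, start: u64, end: u64) -> Result<Vec<u8>, reqwest::Error> {
    let resp = reqwest::Client::new()
        .get(url)
        .header(RANGE, format!("bytes={start}-{end}"))
        .send()
        .await?
        .error_for_status()?;

    // A server that honors the range replies `206 Partial Content` with a Content-Range header.
    if let Some(cr) = resp.headers().get(CONTENT_RANGE) {
        println!("served range: {:?}", cr);
    }
    Ok(resp.bytes().await?.to_vec())
}
```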

Architecture

  • Decentralized HTTPS Servers: Multiple independent entities host HTTPS servers, each storing a combination of files or parts of files.
  • Client-Driven File Assembly: Clients request file chunks from various servers, possibly using a directory or indexing service to locate these chunks.
  • No Central Control: The absence of a central server or authority managing file distribution leads to a resilient and censorship-resistant system. Services could be routed through gateways, DHT, or direct connections.
  • On-chain escrow and availability, off-chain micropayments: Utilize staking and dispute mechanisms to protect user assets. Micropayments reduce trust requirements.

Data Transfer and Integrity

  • Secure Partial Downloads: HTTPS supports partial downloads securely, allowing clients to request specific ranges of a file.
  • Chunk Verification: Downloaded chunks can be verified for integrity using checksum hashes, ensuring data reliability (see the sketch after this list).
  • Adaptive Server Selection: Requires an automated service for clients to dynamically request downloads from servers based on factors like response time, chunk availability, and server security credentials.
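
As a concrete example of chunk verification, the client could check every downloaded chunk against a per-chunk hash published in the file's manifest. A minimal sketch using the `sha2` crate (the manifest format is an assumption):

```rust
use sha2::{Digest, Sha256};

/// Verify a downloaded chunk against the SHA-256 digest listed for it in the file manifest.
fn verify_chunk(chunk: &[u8], expected: &[u8; 32]) -> bool {
    Sha256::digest(chunk).as_slice() == expected.as_slice()
}
```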

Scalability

  • Efficient Large File Handling: Parallel downloads from multiple servers enhance scalability for large files.
  • Distributed Network Load: Given an intelligent matching algorithm (a simple scoring heuristic is sketched after this list), the network scales well, since adding more servers does not significantly increase complexity.
  • Redundancy and Availability: Multiple copies of file chunks across different servers improve availability and resilience.
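
A toy version of the matching idea referenced above: score candidate servers on observed latency and quoted price, and download each chunk from the best-scoring server that holds it. The fields and weights are made up for illustration:

```rust
use std::cmp::Ordering;

/// Hypothetical per-server statistics a client could track while downloading.
struct ServerStats {
    url: String,
    avg_latency_ms: f64,
    price_per_byte: f64, // quoted price, in tokens per byte
    has_chunk: bool,
}

/// Pick the best server that actually holds the chunk; lower score is better.
/// The 0.7/0.3 weighting is arbitrary and would be tuned (or made configurable) in practice.
fn pick_server(candidates: &[ServerStats]) -> Option<&ServerStats> {
    let score = |s: &ServerStats| 0.7 * s.avg_latency_ms + 0.3 * s.price_per_byte * 1e9;
    candidates
        .iter()
        .filter(|s| s.has_chunk)
        .min_by(|a, b| score(a).partial_cmp(&score(b)).unwrap_or(Ordering::Equal))
}
```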

Security

  • TLS Encryption: HTTPS with TLS provides robust encryption for data transfers, protecting against eavesdropping and man-in-the-middle attacks.
  • Server Authentication: TLS also involves server authentication, ensuring that clients are communicating with legitimate servers.
  • Trust and Reputation System: A system to evaluate the trustworthiness of servers can help maintain a secure and reliable network.

User Experience

  • Simplified User Interface: Despite the underlying complexity, the system should offer an easy-to-use interface for posting and downloading files.
  • Consistent Reliability: Users should be able to download files reliably, as long as at least 1 server is available.
  • Optimized Speed: Parallel downloads from multiple servers can offer faster download speeds compared to single-source downloads.
  • Legal and Regulatory Considerations: The system must ensure that it does not facilitate the unauthorized distribution of copyrighted material.

Summary

  • Uses a client-server model, typically centralized, but partial downloads, verifiable chunks, and an automated matching/selection algorithm can be used to implement a decentralized market.
  • TLS brings robust encryption and server authentication to provide data protection for users.
  • The decentralized nature of the system promotes resilience and censorship resistance.
  • Critical to maintain data integrity and ensure legal compliance.
  • Straightforward for users; middleware for payments and verifications is intuitive to implement (a sketch follows this list).
  • Advantages: relatively straightforward to implement middleware for on-chain verification and off-chain payment and data verification; native partial download and encryption support; commonly used across the web.
  • Concerns: requires a reasonable algorithm for matching chunks across servers.
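
As a sketch of the payment middleware mentioned above: before serving a byte range, the server checks that the request carries a receipt whose value covers the requested bytes. The header name, receipt format, and flat price-per-byte model are assumptions for illustration; a real implementation would verify a signed TAP receipt here.

```rust
use http::{HeaderMap, StatusCode};

/// Hypothetical receipt carried in a request header; real receipts would be signed and richer.
struct Receipt {
    value: u128,
}

fn parse_receipt(headers: &HeaderMap) -> Option<Receipt> {
    // Assumed header name; signature verification would also happen here.
    let raw = headers.get("file-service-receipt")?.to_str().ok()?;
    raw.parse::<u128>().ok().map(|value| Receipt { value })
}

/// Gate a range request: reply 402 if no receipt is present or it doesn't cover the requested bytes.
fn check_payment(headers: &HeaderMap, range_len: u64, price_per_byte: u128) -> Result<(), StatusCode> {
    let receipt = parse_receipt(headers).ok_or(StatusCode::PAYMENT_REQUIRED)?;
    if receipt.value < price_per_byte * range_len as u128 {
        return Err(StatusCode::PAYMENT_REQUIRED);
    }
    Ok(())
}
```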

Torrent Protocol

Inherently peer-to-peer and decentralized, the Torrent protocol stands out as a strong candidate. Its efficiency in distributing large files through chunked downloads aligns well with the requirement of handling terabyte-sized data. It requires adaptation to include micropayments for each chunk transfer, a matching algorithm that pairs peers for serving and requesting files, and on-chain escrow mechanisms. This approach aims to enhance the efficiency and availability of file sharing by financially rewarding seeders, potentially leading to a scalable and censorship-resistant P2P network.

Architecture

  • Peer-to-Peer Network: The Torrent protocol operates on a peer-to-peer (P2P) network where each participant (peer) can act as both a downloader (leecher) and uploader (seeder).
  • Client modifications: Clients must be able to send receipts for data chunks, and to receive and verify receipts before sending a requested data chunk (an illustrative message flow is sketched after this list).
  • Torrent Files and Trackers: Torrent files contain metadata about the files to be shared. Trackers help peers find each other with a matching algorithm considering prices and QoS.
  • Distributed File Sharing: There's no central server hosting the entire file; instead, files are divided into chunks and distributed across multiple peers.
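
One way to picture the client modifications above: the piece exchange is extended so that every piece request carries a receipt, and the seeder only releases the piece after the receipt checks out. The message shapes below are illustrative, not an existing BitTorrent extension:

```rust
/// Illustrative wire messages for a paid piece exchange.
enum Message {
    RequestPiece { index: u32, receipt: Vec<u8> }, // leecher -> seeder; receipt covers this piece
    Piece { index: u32, data: Vec<u8> },           // seeder -> leecher; sent only after verification
    Reject { index: u32, reason: String },         // e.g. missing, invalid, or underpriced receipt
}

/// Seeder side: verify the receipt before releasing the piece.
fn handle_request(
    index: u32,
    receipt: &[u8],
    verify_receipt: impl Fn(&[u8], u32) -> bool,
    piece: Vec<u8>,
) -> Message {
    if verify_receipt(receipt, index) {
        Message::Piece { index, data: piece }
    } else {
        Message::Reject { index, reason: "invalid receipt".into() }
    }
}
```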

Data Transfer and Integrity

  • Chunk-Based Transfer: Files are divided into smaller chunks, which are downloaded independently from different peers, allowing for efficient data transfer.
  • Data Verification: Each chunk is verified using cryptographic hashes, ensuring the integrity and reliability of the data.
  • Swarm Dynamics: The health of a file's swarm (the number of seeders and leechers) can significantly impact download speeds and availability.

Scalability

  • Handling Large Files: Torrents are well-suited for distributing large files, as the burden of data transfer is distributed across many peers.
  • Network Resilience: Depending on the number of participants, the decentralized nature of torrents can make the network resilient to single points of failure.
  • Dynamic Participation: The network can scale dynamically as peers join and leave, with minimal coordination overhead.

Security

  • Encryption: Many Torrent clients support optional encryption to obfuscate traffic, although this is not a built-in feature of the protocol itself.
  • Trust Mechanisms: Trust is often managed through community-driven approaches, here we consider minimal trust requirement on chunk data transfers, data verifiability for certain file types, and on-chain escrow contracts.
  • Vulnerability to Malicious Activity: Torrents are susceptible to risks like fake or malicious files, requiring users to be cautious about the file hash they request.

User Experience

  • Client Software: A specialized Torrent client must be implemented to download and upload files.
  • Variable Download Speeds: Speeds can vary greatly depending on the swarm's health and the user's network configuration.
  • Community Engagement: Many Torrent communities have active forums and guides, which can lead to a rich user experience.
  • Legal issues: Torrents are often associated with the sharing of copyrighted material without authorization, leading to legal challenges. Some regions specifically monitor torrent activities.

Summary

The Torrent protocol offers a robust and scalable solution for decentralized file sharing, particularly effective for large files due to its distributed nature and chunk-based transfer system. While it excels in data transfer efficiency and network resilience, it faces challenges in security, trust management, and legal compliance. The user experience can vary widely based on the health of the torrent swarm and the community around it.

IPFS with paid access

Adapting from IPFS as a content-addressable P2P storage network, we consider a version incorporating a payment system for accessing files, where users pay to retrieve data from other nodes in the network. The integration of payments aims to incentivize the hosting and distribution of files, potentially improving the availability and reliability of data within the IPFS network. IPFS does not natively support micropayments or file transfer payments. Integrating such a system into IPFS would be complex, requiring external layers or applications to handle payments and collateralization. Additionally, IPFS's typical use case involves public files, and adapting it for private, paid transfers of partial files would be challenging. The protocol may struggle with very large files, which is a critical requirement in this scenario.

Architecture

  • Decentralized Storage: IPFS operates on a decentralized network where each node stores a portion of the overall data, eliminating reliance on central servers.
  • Content Addressing: Files in IPFS are accessed via content-based addressing rather than location-based addressing, using unique hashes for each file or chunk (a small sketch follows this list).
  • Payment-Integrated Protocol: IPFS doesn't natively support payments to access files. The system should integrate a payment mechanism for accessing file chunks to incentivize nodes to store and serve data.
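
For a flavor of content addressing, the sketch below computes a raw-codec CIDv1 over a single block and re-checks it on retrieval. It assumes the `cid` and `multihash` crates (APIs vary across versions); note that real IPFS files are chunked into a UnixFS DAG, so the top-level CID of a large file is not simply the hash of its bytes:

```rust
use cid::Cid;
use multihash::{Code, MultihashDigest};

/// Content address for a single raw block: CIDv1, `raw` codec (0x55), SHA-256 multihash.
fn raw_block_cid(block: &[u8]) -> Cid {
    Cid::new_v1(0x55, Code::Sha2_256.digest(block))
}

/// Retrieval-side check: re-hash the received block and compare against the requested CID.
/// (Assumes the requested CID also uses the raw codec and SHA-256.)
fn verify_block(requested: &Cid, block: &[u8]) -> bool {
    raw_block_cid(block) == *requested
}
```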

Data Transfer and Integrity

  • Chunk-Based Access and Payment: Users pay for each chunk of data they access, aligning costs directly with usage.
  • Data Verification: The integrity of each chunk is ensured through cryptographic hashing, guaranteeing that the data retrieved is accurate and unaltered.
  • Efficient Data Retrieval: IPFS's use of a distributed hash table (DHT) facilitates efficient retrieval of data chunks from the nearest or most convenient nodes.

Scalability

  • Distributed Nature: The decentralized, peer-to-peer architecture of IPFS naturally supports scalability, as adding more nodes enhances the network's capacity.
  • Load Distribution: Data is distributed across numerous nodes, preventing any single point of overload and ensuring balanced load distribution.
  • Dynamic Network Adaptation: The network can adapt to changing conditions, such as varying node availability and network demand.

Security

  • Data Encryption: While IPFS itself doesn't inherently encrypt data, additional layers of encryption can be implemented for secure data transfer.
  • Secure Payments: Integrating TAP and blockchain clients for payments ensures secure and transparent financial transactions.
  • Access Control: Mechanisms can be developed to control who can access and pay for specific data chunks, enhancing privacy and security.

User Experience

  • Ease of Access: Users can easily retrieve data from the nearest nodes, potentially improving access speed and reliability.
  • Transparent Payment System: The payment system for accessing data chunks should be straightforward and user-friendly, with clear pricing and transaction processes.
  • Community and Support: A strong community and good documentation are essential for user support and engagement.
  • Legal issues: Ensuring that IPFS does not facilitate unauthorized sharing of copyrighted material is a significant challenge.

Potential next step

Get hands-on with a PoC of HTTP file transfers and explore whether there are unforeseen difficulties.
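
A minimal sketch of what such a PoC server could look like, assuming axum 0.7 and tower-http with the `fs` feature; `ServeDir` already answers `Range` requests, so chunked/partial downloads come for free, and payment/verification middleware would layer on top:

```rust
use axum::Router;
use tower_http::services::ServeDir;

#[tokio::main]
async fn main() {
    // Serve everything under ./files; ServeDir honors `Range` headers,
    // so clients can request arbitrary byte ranges of large files.
    let app = Router::new().nest_service("/files", ServeDir::new("files"));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:5678").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```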

@chriswessels
Member

Great job here @hopeyen! Agreed with the comparison of the options and the relative strengths/weaknesses. Also agree that we should implement an HTTP PoC next.

I have left some questions that we should be considering here: https://www.notion.so/graphops/Subfile-Service-a59a801b27094e4589cebd52a081ca5f?pvs=4

@hopeyen
Collaborator

hopeyen commented Nov 21, 2023

Next step:

  • Server manages a cost model, starting with a single price per byte. Later iterations can be subfile-specific with more complex packaged pricing.

Reach out to the TAP team:

  • Clients build receipts
  • Server parses receipts
  • Server runs the TAP manager to verify and store receipts
  • Server event triggers RAV redemption (a rough sketch of this pipeline follows)
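
A rough sketch of the server-side shape this could take, assuming a flat price-per-byte cost model. The types and trait below are illustrative stand-ins, not the actual TAP library API:

```rust
/// Illustrative stand-in for a signed TAP receipt attached to a chunk request.
struct SignedReceipt {
    allocation_id: [u8; 20],
    value: u128, // payment attached to this request
    nonce: u64,
    signature: Vec<u8>,
}

/// Illustrative stand-in for a signed Receipt Aggregate Voucher (RAV).
struct SignedRav {
    allocation_id: [u8; 20],
    value_aggregate: u128,
    signature: Vec<u8>,
}

/// v1 cost model: flat price per byte; later versions could price per subfile or package.
fn chunk_price(price_per_byte: u128, range_len: u64) -> u128 {
    price_per_byte * range_len as u128
}

/// The receipt pipeline described above:
/// parse -> verify -> store -> periodically aggregate into a RAV -> redeem on-chain.
trait ReceiptPipeline {
    fn verify_and_store(&mut self, receipt: SignedReceipt) -> Result<(), String>;
    fn should_request_rav(&self) -> bool; // e.g. value threshold or timer-based trigger
    fn request_rav(&mut self) -> Result<SignedRav, String>;
    fn redeem(&self, rav: SignedRav) -> Result<(), String>;
}
```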

@hopeyen
Collaborator

hopeyen commented Dec 18, 2023

More concrete plan:

Subfile server

  1. Create a file to track (indexer address and indexer url).
  2. Upload the file to IPFS.
  3. Create an allocation signer.
  4. Compute the allocation id over the IPFS file containing the indexer url: `uniqueAllocationID(indexerMnemonic: string, epoch: number, deployment: SubgraphDeploymentID, existingIDs: Address[])`.
  5. Generate `allocationIdProof` using the allocation signer, allocation id, and indexer address.
  6. Allocate on the staking contract:
     ```
     this.network.contracts.staking.populateTransaction.allocateFrom(
       indexer: this.network.specification.indexerOptions.address,
       subgraphDeploymentID: deployment.bytes32,
       tokens: amount,
       allocationID: allocationId,
       metadata: utils.hexlify(Array(32).fill(0)),
       proof,
     )
     ```
  7. All subfiles served at the indexer url should now be available for discovery and queries. Indexers can update the subfiles list without closing/reallocating. Indexers must renew allocations by the max lifetime or to update the indexer url.
  8. Close the allocation:
     ```
     await this.network.contracts.staking.populateTransaction.closeAllocation(
       allocationID: string,
       poi: BytesLike,
     )
     ```
  9. Reallocate with:
     ```
     ReallocateTransactionParams {
       closingAllocationID: string
       poi: BytesLike
       indexer: string
       subgraphDeploymentID: BytesLike
       tokens: BigNumberish
       newAllocationID: string
       metadata: BytesLike
       proof: BytesLike
     }
     ```

Subfile clients

  • Read allocations through the network subgraph
  • Resolve allocations to the SubgraphDeployment IPFS file for a list of indexer urls
  • Find the indexer address
  • Receipt signer - escrow contract: deposit(address receiver, uint256 amount) (a sketch follows this list)
  • Create receipts
  • Include them as part of the query range request
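
For the escrow deposit step, a hedged sketch with ethers-rs (`abigen` feature), assuming the escrow contract exposes `deposit(address receiver, uint256 amount)` as listed above; the contract address, RPC URL, and key handling are placeholders:

```rust
use std::sync::Arc;

use ethers::prelude::*;

// Bindings generated from just the function we need; the real ABI comes from the escrow contract.
abigen!(Escrow, r#"[ function deposit(address receiver, uint256 amount) ]"#);

async fn deposit_to_escrow(
    rpc_url: &str,
    escrow_address: Address,
    wallet: LocalWallet, // should be configured with the correct chain id
    receiver: Address,
    amount: U256,
) -> Result<(), Box<dyn std::error::Error>> {
    let provider = Provider::<Http>::try_from(rpc_url)?;
    let client = Arc::new(SignerMiddleware::new(provider, wallet));

    let escrow = Escrow::new(escrow_address, client);
    // Send the deposit transaction and wait for it to be mined.
    escrow.deposit(receiver, amount).send().await?.await?;
    Ok(())
}
```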

@hopeyen
Collaborator

hopeyen commented Jan 29, 2024

Protocol V1 vs Horizon

Today during system architect office hours, I asked how data services will be incorporated in Horizon, and what the new design for the service registry and staking contracts will be.

I understood that (though nothing is set in stone yet)

  • Service registry can potentially take on an additional parameter for service types so an indexer can have a separate endpoint for each service if they choose.
  • Staking contract will likely still track allocations, for all types of data services and per deployment. An allocation is strictly limited to indicating query services and collecting query fees, and not indexing rewards.
  • Arbitration is up in the air. For the file hosting service, it is relatively easy to create a dispute against an indexer's response for a chunk, even though it shouldn't be necessary to do so: the consumer can submit the chunk data and the signature from the indexer; an arbitrator can recover the indexer address, hash the chunk, and verify it against the file manifest (a sketch of this check follows this list). This arbitration process can be easily automated. However, we are aiming to utilize micropayments for individual chunks with incremental verification, such that a conflict is discovered as soon as a chunk is received and the transfer with the sender is halted; a dispute wouldn't be economically efficient for a single chunk. We can potentially explore disputes for incorrect metadata in the manifests against the file content.
  • We must assume some cost for the indexer to register their endpoint, in particular a cost higher than the value of a single chunk payment (or better, a few orders of magnitude greater), such that an indexer cannot easily recover their registration cost if they are not capable of serving the data when receiving a paid chunk request.
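
A sketch of the automated arbitration check described above, assuming the indexer signs the chunk's keccak256 hash as an EIP-191 personal message (ethers-rs types; the signing scheme is an assumption):

```rust
use ethers::types::{Address, RecoveryMessage, Signature};
use ethers::utils::keccak256;

/// Decide a chunk dispute: the served bytes must match the manifest entry,
/// and the attached signature must recover to the indexer that served them.
fn verify_chunk_response(
    chunk: &[u8],
    signature: &Signature,
    expected_hash: [u8; 32], // per-chunk hash from the file manifest
    indexer: Address,
) -> bool {
    let chunk_hash = keccak256(chunk);
    if chunk_hash != expected_hash {
        return false;
    }
    // `recover` applies the EIP-191 personal-message prefix before recovering the signer.
    match signature.recover(RecoveryMessage::Data(chunk_hash.to_vec())) {
        Ok(addr) => addr == indexer,
        Err(_) => false,
    }
}
```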

Approach 1
Indexers register their public_url against the service registry. If it is a dedicated registry, the candidates a client needs to try are limited to indexers who specifically opted in to the service registry; otherwise there must be a filter for indexers serving file endpoints. Indexers open allocations for a particular file or a bundle of files.

From the network subgraph, a client can
-> query for all active allocations against a deployment hash representing their target file,
-> read all available indexer endpoints,
-> read active allocations from all indexers, grab the underlying deployment hash, and filter for the target file (a minimal query sketch follows).
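
A minimal sketch of that client-side lookup, assuming the network subgraph exposes `allocations` filterable by `status` and `subgraphDeployment`, with `indexer { url }` (the endpoint URL and field names are assumptions):

```rust
use serde_json::json;

/// Hypothetical network subgraph endpoint.
const NETWORK_SUBGRAPH: &str = "https://gateway.example.com/network-subgraph";

/// Return the URLs of indexers with an active allocation on the target file's deployment hash.
async fn indexer_urls_for_file(deployment: &str) -> Result<Vec<String>, reqwest::Error> {
    let query = r#"
        query ($deployment: String!) {
          allocations(where: { status: Active, subgraphDeployment: $deployment }) {
            id
            indexer { id url }
          }
        }"#;
    let body = json!({ "query": query, "variables": { "deployment": deployment } });

    let resp: serde_json::Value = reqwest::Client::new()
        .post(NETWORK_SUBGRAPH)
        .json(&body)
        .send()
        .await?
        .json()
        .await?;

    // Pull each allocation's indexer URL out of the response (fields assumed above).
    Ok(resp["data"]["allocations"]
        .as_array()
        .into_iter()
        .flatten()
        .filter_map(|a| a["indexer"]["url"].as_str().map(str::to_string))
        .collect())
}
```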

This approach should be easily migrated to Horizon, as service registry should then support multiple entries per indexer across data services, and allocations will no longer be correlated with indexing rewards but still indicate service for query fees.

Approach 2
Indexers do not separately register their url, and indexers do not open allocations against individual files. Indexers create and publish a file containing their file server url, and allocate against the file.
Indexers are free to update their file serving status at any point without needing to notify the network, but clients will not be able to rely on the network subgraph; instead, they must go through all active allocations looking for a file that contains a file server url, or obtain the url from an indexer off-chain.
-> This decreases the cost for indexers, as they can be more flexible with their service without broadcasting on-chain, but it increases the runtime requirement for the client to discover indexers' availability of the target file. Optionally, indexers can periodically gossip their file serving status, and gossip nodes (listener-radio-esque) can store and update the status and serve an API endpoint for the client to check for availability.

This approach might or might not be easily migrated to Horizon; it has minimal on-chain activity and doesn't require staking for a particular file, but it is not economically secured.

@0xpili 0xpili added type:tracking Tracking issues with related scope meta:triaged This issue has been triaged (has a good description, as well as labels for priority, size and type) labels Feb 20, 2024
@hopeyen hopeyen mentioned this issue May 6, 2024