
feature: internal block store for backfilling newly-deployed indexers #990

Open
deekerno opened this issue Jun 7, 2023 · 3 comments
deekerno (Contributor) commented Jun 7, 2023

From a message I sent in a discussion on Slack about #932:

Here’s what I would consider the “gold standard”:

  0. Have an internal table/data structure that holds BlockData somehow, so we don’t have to hit the client again.
  1. Re-enable/fix forc index revert.
  2. Upon re-deploying an indexer with saved data, inform the user that the tables from the previous version have been renamed for backup purposes, and that, if needed, the old indexer and its tables can be restored through forc index revert.
  3. Begin backfilling from the internal block table.
  4. Once the last persisted block is added, set the indexer’s executor to request blocks starting with last_persisted_block + 1 (a rough sketch of this backfill/switch-over flow follows the list).
  5. In the case of another re-deployment:
     • Rename the already-renamed tables to a temporary table name.
     • Rename the currently used tables to the backup name.
     • Delete the tables that are now two versions old.
     • Go to step 3.
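For illustration, here is a minimal sketch of what the backfill-then-switch-over loop in steps 3 and 4 could look like. Everything here (`BlockData`, `LocalBlockStore`, `NodeClient`, `run_executor`) is a hypothetical stand-in, not an existing fuel-indexer type or API:

```rust
/// Hypothetical stand-in for the block payload an executor consumes.
struct BlockData {
    height: u64,
    // ...transactions, receipts, etc.
}

/// Hypothetical local block store interface (step 0).
trait LocalBlockStore {
    /// Locally persisted blocks at or above `start`, ascending by height.
    fn blocks_from(&self, start: u64) -> Vec<BlockData>;
    fn last_persisted_height(&self) -> Option<u64>;
}

/// Hypothetical Fuel node client interface.
trait NodeClient {
    /// Fetch a batch of blocks from the node starting at `start`; empty when caught up.
    fn fetch_blocks(&self, start: u64) -> Vec<BlockData>;
}

fn run_executor<S: LocalBlockStore, C: NodeClient>(
    store: &S,
    client: &C,
    start_block: u64,
    mut handle_block: impl FnMut(&BlockData),
) {
    // Step 3: backfill from the internal block table first, avoiding the network.
    for block in store.blocks_from(start_block) {
        handle_block(&block);
    }

    // Step 4: switch to the client, requesting from last_persisted_block + 1 onward.
    let mut next = store.last_persisted_height().map_or(start_block, |h| h + 1);
    loop {
        let batch = client.fetch_blocks(next);
        if batch.is_empty() {
            break; // caught up; a real executor would poll or subscribe here
        }
        for block in &batch {
            handle_block(block);
            next = block.height + 1;
        }
    }
}
```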

In short, we should consider having a local store for blocks. I feel the main benefits of this feature are threefold:

  • Upon deploying an indexer (or re-deploying an existing one), we could instantly begin processing blocks from the local store, which should cut the time needed for an indexer to fully index the chain (or at least index from its desired start_block), since we would avoid the latency of network requests.
  • It would reduce the number of requests made to a Fuel node when an indexer is (re-)deployed. Currently, an indexer will begin to request blocks starting at the value of start_block in its manifest. With several indexers running at once, the total traffic for these requests adds up, and as we've seen in #979 (Fuel node resets connection to indexer), a Fuel node deployment may have strategies in place to rate-limit traffic.
  • It would also allow for clean migration from one version of an indexer to another, a feature that was brought up some time ago in #382 (feature: index migrations). A user would just upload a new version of an indexer and the data would be re-indexed according to their schema and handler code; it would also allow for the workflow described in the message above (a rough sketch of the rename/backup flow follows this list).
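To make the rename/backup flow from steps 2 and 5 concrete, here is a rough sketch of the statements a re-deployment could run per table. The `_backup`/`_tmp` suffixes and the Postgres-style `ALTER TABLE IF EXISTS` are assumptions for illustration only; the actual naming scheme is an open question:

```rust
/// Builds the per-table rename/cleanup statements for a re-deployment
/// (hypothetical; suffixes and SQL dialect are illustrative only).
fn redeploy_rename_statements(table: &str) -> Vec<String> {
    vec![
        // Move the previous backup out of the way (no-op on the first re-deploy).
        format!("ALTER TABLE IF EXISTS {table}_backup RENAME TO {table}_tmp;"),
        // Back up the tables the outgoing indexer version was using.
        format!("ALTER TABLE {table} RENAME TO {table}_backup;"),
        // Drop the tables that are now two versions old.
        format!("DROP TABLE IF EXISTS {table}_tmp;"),
    ]
}

fn main() {
    // "tx_receipts" is just an example table name.
    for stmt in redeploy_rename_statements("tx_receipts") {
        println!("{stmt}");
    }
}
```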

Concerns for this approach:

  • How exactly do we store the data? This would essentially duplicate what the client does, which uses RocksDB as far as I know, and that's not exactly known as the lightest dependency.
  • We currently make client requests per indexer executor; we would need to ensure that writes to this local block store are free from contention and data races (one possible single-writer shape is sketched after this list).
  • The blockchain will grow indefinitely, and so will the space needed to store blocks.
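On the contention/data-race concern, one possible shape (a sketch under assumed types, not a concrete proposal) is to funnel all writes through a single writer task fed by a channel, so executors never race each other on inserts while reads go through a shared read lock:

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

type Height = u64;
type BlockBytes = Vec<u8>; // serialized BlockData; a stand-in for the real payload

/// Spawn the single writer that owns all inserts into the shared block store.
fn spawn_block_store_writer(
    store: Arc<RwLock<BTreeMap<Height, BlockBytes>>>,
) -> (mpsc::Sender<(Height, BlockBytes)>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The only place the store is mutated; executors just send blocks here.
        for (height, block) in rx {
            store.write().expect("lock poisoned").insert(height, block);
        }
    });
    (tx, handle)
}

fn main() {
    let store = Arc::new(RwLock::new(BTreeMap::new()));
    let (tx, writer) = spawn_block_store_writer(Arc::clone(&store));

    // An executor that fetched block 42 from the node hands it off for storage...
    tx.send((42, vec![0u8; 8])).expect("writer dropped");

    // ...while any executor can read already-persisted blocks through the read lock.
    drop(tx); // close the channel so the writer exits in this toy example
    writer.join().expect("writer panicked");
    let persisted = store.read().expect("lock poisoned");
    println!("persisted heights: {:?}", persisted.keys().collect::<Vec<_>>());
}
```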
ra0x3 (Contributor) commented Jun 7, 2023

  • Will add more info a bit later, but I think this could be really important, especially with regard to migrating data.
    • E.g., I deploy the same indexer with a different schema, and I want it to not only use the new index going forward, but also (in parallel?) backfill, potentially all the way to the genesis block 🩴

ra0x3 (Contributor) commented Sep 20, 2023

@lostman Will this issue be handled by #1150?

lostman (Contributor) commented Sep 21, 2023

@ra0x3, no, that's only for missing blocks. Initially both were handled by a single PR, #1297, but I split the missing blocks out. Missing blocks will be merged first, and I'm bringing #1297 up to date to reflect this.
