Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migration("re-indexing"), backfilling and diasgnostics tooling for the ChainIndexer #12450

Open
wants to merge 10 commits into
base: feat/msg-eth-tx-index
Choose a base branch
from

Conversation

aarshkshah1992
Copy link
Contributor

@aarshkshah1992 aarshkshah1992 commented Sep 11, 2024

ChainIndexer Migration and Diagnostics Tooling

This PR implements the "migration" (really re-indexing / backfilling), and diagnostics tooling for the ChainIndexer implemented in PR #12450, and is part of the work for #12453. This tooling takes the form of both RPC APIs on the daemon and lotus-shed CLI commands.

Re-indexing Process

The re-indexing tool enables clients to index their entire existing ChainState in the ChainIndexer. This process is necessary due to the removal of the existing MsgIndex, EthTxIndex, and EventIndex from Lotus.

Why Re-index Instead of Migrate?

We've chosen to re-index rather than migrate data from existing indices for two primary reasons:

  1. Known issues: The existing indices have multiple known problems, and migrating could perpetuate incorrect index entries.
  2. Lack of garbage collection: Existing indices contain many entries for which the corresponding tipset messages/events no longer exist in the ChainStore due to splitstore GC.

Instead, we're re-indexing the Chainstore/Chainstate on the node into the ChainIndexer. This ensures that all re-indexed entries have gone through the indexing logic of the new ChainIndexer and that the Index is in sync/reflects the actual contents of the Chainstore/Chainstate post re-indexing.

Diagnostics Tooling

This PR introduces diagnostic tools for detecting corrupt Index entries at specific epochs or epoch ranges.

While this PR implements functionality for optionally backfilling missing Index entries, it does not yet include the capability to "repair" corrupted Indexed entries. The repair functionality will be introduced in a subsequent PR. This approach allows us to first gather and analyze user reports, helping us understand the types and causes of corrupted Indexed entries(and if all they exist in the new ChainIndexer) before implementing repair mechanisms.

Core API

The fundamental building block for this tooling is the following RPC API:

type IndexValidation struct {
	TipsetKey string
	Height    uint64

	TotalMessages  uint64
	TotalEvents    uint64
	EventsReverted bool

	Backfilled bool
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error)

This API has the following features:

  • Optionally backfills the Index with a tipset on the canonical chain for the given epoch if it is absent in the Index
  • Returns some aggregated stats for an indexed entry for diagnostics/inspection
  • Reports errors/corrupted indexed entries at the given epoch. Forms of Index corruption that can be diagnosed includes:
    • Presence of multiple non-reverted tipsets at the given epoch
    • Complete absence of a non-reverted tipset at the given epoch that does contain reverted tipsets
    • Mismatch between the Chainstore state and the Indexed entries (tipset messages/events)
    • Incorrect Indexing of null rounds at the given epoch

lotus-shed CLI tooling

The lotus-shed CLI tooling for both re-indexing/backfilling and diagnostics can then invoke this RPC API over epoch ranges. The corresponding lotus-shed backfill index [from, to] and lotus-shed inspect index [from, to] can then backfill/inspect/diagnose the Index for the given epoch ranges.

TODO

  • lotus-shed commands
  • RPC API should compare the Indexed entries with the chain store to detect a mismatch between indexed content and chain store content
  • Fill out all the diagnostics information in the RPC response by queying the messages and events tables in the Index.
  • automated tests

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chainindex/api.go Outdated Show resolved Hide resolved
return &types.IndexValidation{
TipsetKey: ts.Key().String(),
Height: uint64(ts.Height()),
Backfilled: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fill out the other fields here using SQL queries.

chain/types/tipset.go Outdated Show resolved Hide resolved
chainindex/indexer.go Outdated Show resolved Hide resolved
chainindex/indexer.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/ddls.go Outdated Show resolved Hide resolved
chainindex/ddls.go Outdated Show resolved Hide resolved
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title [WIP] Chain Index Validation API Chain Index Validation API Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992
Copy link
Contributor Author

@rvagg Would be great to have a first round of review here when you get the time.

github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Chain Index Validation API Migration and Diasnostics tooling for the ChainIndexer Sep 12, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 aarshkshah1992 changed the title Migration and Diasnostics tooling for the ChainIndexer Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 self-assigned this Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer feat: migration("re-indexing") and diasgnostics tooling for the ChainIndexer Sep 12, 2024
node/builder_chain.go Outdated Show resolved Hide resolved
itests/eth_filter_test.go Outdated Show resolved Hide resolved
chain/types/index.go Outdated Show resolved Hide resolved
@@ -220,6 +233,9 @@ func (si *SqliteIndexer) Apply(ctx context.Context, from, to *types.TipSet) erro
}
si.closeLk.RUnlock()

si.writerLk.Lock()
defer si.writerLk.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe unlock before notifyUpdateSubs?

@@ -337,6 +353,9 @@ func (si *SqliteIndexer) Revert(ctx context.Context, from, to *types.TipSet) err
}
si.closeLk.RUnlock()

si.writerLk.Lock()
defer si.writerLk.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on unlocking before notifying subs?

&ps.getMinNonRevertedHeightStmt: "SELECT MIN(height) FROM tipset_message WHERE reverted = 0",
&ps.hasNonRevertedTipsetStmt: "SELECT EXISTS(SELECT 1 FROM tipset_message WHERE tipset_key_cid = ? AND reverted = 0)",
&ps.updateEventsToRevertedStmt: "UPDATE event SET reverted = 1 WHERE message_id IN (SELECT message_id FROM tipset_message WHERE tipset_key_cid = ?)",
&ps.updateEventsToNonRevertedStmt: "UPDATE event SET reverted = 0 WHERE message_id IN (SELECT message_id FROM tipset_message WHERE tipset_key_cid = ?)",
&ps.getMsgIdForMsgCidAndTipsetStmt: "SELECT message_id FROM tipset_message WHERE tipset_key_cid = ? AND message_cid = ? AND reverted = 0",
&ps.insertEventStmt: "INSERT INTO event (message_id, event_index, emitter_addr, reverted) VALUES (?, ?, ?, ?)",
&ps.insertEventStmt: "INSERT INTO event (message_id, event_index, emitter_addr, reverted) VALUES (?, ?, ?, ?) ON CONFLICT (message_id, event_index) DO UPDATE SET reverted = 0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some kind of test that exercises this and demonstrates it working well would be nice, also does the query plan show it behaving nicely? I can imagine this getting a bit weird matching for conflicts

Copy link
Contributor Author

@aarshkshah1992 aarshkshah1992 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg

  • Will be writing a test for this.

  • While sqliite does not generate query plans for INSERT STATEMENTS, the query plan for an equivalent query says this will be performant as we have a covering index for (message_id, event_index)

/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN
INSERT INTO event (message_id, event_index, emitter_addr, reverted)
SELECT 1, 2, 'a', 0
WHERE NOT EXISTS (
    SELECT 1 FROM event
    WHERE message_id = 1 AND event_index = 2
);"
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 1
   `--SEARCH event USING COVERING INDEX sqlite_autoindex_event_1 (message_id=? AND event_index=?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we have an UNIQUE constraint on (message_id, event_index)

chain/index/api.go Outdated Show resolved Hide resolved
return revertedCount, nonRevertedCount, nil
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did this all in a single transaction, would that remove the need for a write lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, transaction provides atomicity, not exclusivity. Imagine this scenario:

  1. Backfill is called for epoch e -> it reads tipset ts at that epoch
  2. Revertexecutes for ts and marks the tipset as reverted in TX1
  3. Backfill executes for ts and marks it as applied in TX2

TX1 is committed first and then TX2 is comitted.

Basically, it's hard to reason about state here without this lock.

Copy link
Contributor Author

@aarshkshah1992 aarshkshah1992 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg

One work-around here it to only allow backfilling for "safe" epochs i.e. <=head - 30.

Then, the rewrite can only ever happen in an edge case.

chain/index/api.go Outdated Show resolved Hide resolved
chain/index/api.go Outdated Show resolved Hide resolved
chain/index/ddls.go Outdated Show resolved Hide resolved

// fetch the non-reverted tipset at this epoch
var indexedTsKeyCidBytes []byte
err = si.stmts.getNonRevertedTipsetAtHeightStmt.QueryRowContext(ctx, epoch).Scan(&indexedTsKeyCidBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the LIMIT 1 below if you're only expecting one result

}

return &types.IndexValidation{
TipsetKey: expectedTs.Key().String(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below, I think just the key would be good here, also note capitalisation of TipSetKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return &types.IndexValidation{
TipsetKey: ts.Key().String(),
Height: uint64(ts.Height()),
Backfilled: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: other fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* feat: finish todos of validation api

* feat: add indexed data verification with chain store

* feat: address comments and finish TODO

* fix: build issue

* address comments

* fix: ci issue
@aarshkshah1992
Copy link
Contributor Author

Thanks for all your work here @akaladarshi 👍

NonRevertedEventsCount uint64
Backfilled bool
IndexedMessagesCount uint64
IndexedEventsCount uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename this to IndexedNonRevertedMsgCount and IndexedNonRevertedEventsCount to be precise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akaladarshi That is an implementation detail that will confuse users. They're only asking for the indexed entries at a specific epoch here and so ofcourse expect only non-reverted tipsets back.

aarshkshah1992 and others added 4 commits September 18, 2024 15:16
* feat: add lotus-shed command for backfilling chain indexer

* feat: add lotus-shed command for inspecting the chain indexer

* feat: use single lotus-shed command to inspect and backfill

* fix: remove the unused queries

* small changes

* add change log
@aarshkshah1992 aarshkshah1992 marked this pull request as ready for review September 19, 2024 09:10
@aarshkshah1992
Copy link
Contributor Author

@BigLep This is now ready for review. Working on a user guide for this tooling as discussed offline.

@aarshkshah1992
Copy link
Contributor Author

@rvagg This is now in a good shape and has been tested on a calibnet node. Please take a look when you get the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ⌨️ In Progress
Development

Successfully merging this pull request may close these issues.

4 participants