Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

paritydb support for parachains db. #4838

Merged
merged 30 commits into from
Mar 3, 2022

Conversation

cheme
Copy link
Contributor

@cheme cheme commented Feb 2, 2022

This PR remove rocksdb usage for users that are using paritydb.
Note that users that are currently using paritydb would need to purgedb and resync.

Few things that could be added/change:

  • Parity-db do not use the cache parameter. This ae177bc could add a simple keyvalue cache , not sure if needed, not even sure it would really help performance.

  • avstore meta an chain selection column are fully switched to ordered, 8b66d0a could also move content that requires ordering to their own collection (paritydb only) for better performances.

  • Using a specific trait for the database is not strictly mandatory, can remove it and just use plain KeyValueDb, but the trait could be use for future changes (keyvaluedb trait for instance misses errors for iterator).

  • write_lock may not be necessary, but I do not know polkadot internal enough to assume there is no concurrent writes (would be strange).

  • bridge code, I started updated it, but revert the changes in ccc0c8f as the content is not really up to date, probably recent code is in https://github.com/paritytech/parity-bridges-common

  • the directories are looking like this : chains/polkadot/paritydb/full for all substrate db file chains/polkadot/paritydb/parachains for all polkadot db files.
    So purge-db command in substrate should be extended with a --parachains arguments (see test).

node/service/src/lib.rs Outdated Show resolved Hide resolved
@arkpar
Copy link
Member

arkpar commented Feb 3, 2022

avstore meta an chain selection column are fully switched to ordered, for parity db we could keep part of the content unordered. This commit 8b66d0a propose add two column for paritydb to store only the content that requires ordering. This should be better for perf, but only really needed if content under AVAILABLE_PREFIX, CHUNK_PREFIX, META_PREFIX from avstore and BLOCK_ENTRY_PREFIX and LEAVES_KEY of chain selection is worth it. cc @ordian

In general it is better to separate different types of data into different columns and not invent key schemes. So +1 from me.

the directory is looking like this : chains/polkadot/paritydb/full for all substrate db file chains/polkadot/paritydb/parachains for all polkadot db files.
Currently purge dir is not purging as necessary (see TODO in test), I am considering paritytech/substrate@master...cheme:purge-dir (purge everything under chains/polkadot/paritydb or chains/polkadot/db).

I think it's fine if purge-db command does not delete all, but a specific database. It already had a --light distinction until recently.

polkadot purge-db 
polkadot purge-db --parachains

@cheme cheme added B1-releasenotes C1-low PR touches the given topic and has a low impact on builders. labels Feb 4, 2022
@cheme cheme marked this pull request as ready for review February 4, 2022 13:14
@cheme cheme added the A0-please_review Pull request needs code review. label Feb 4, 2022
@rphmeier
Copy link
Contributor

rphmeier commented Feb 18, 2022

Note that users that are currently using paritydb would need to purgedb and resync.

@cheme Can you add a warning or a migration or something? A note in a PR that nobody in the validator community reads is not enough to inform the community.

Many validators use paritydb already and will be caught by surprise if their node suddenly stops working. We can also provide a --use-old-parachains-db or something like that in the Polkadot CLI so the validators can sync a new node in the background.

@rphmeier rphmeier added A5-grumble and removed A0-please_review Pull request needs code review. labels Feb 18, 2022
@cheme
Copy link
Contributor Author

cheme commented Feb 18, 2022

Can you add a warning or a migration or something?

writing a migration sounds like a good idea (at least from rocksdb to paritydb it is easy), will do nw.

@shawntabrizi
Copy link
Member

There are labels for this like "migation needed" "breaks api" "breaks everything"

@arkpar
Copy link
Member

arkpar commented Feb 18, 2022

I don't think migration from rocksdb to paritydb is needed at this point. The whole point of introducing --db=auto is so that users are able to use existing database.

Note that users that are currently using paritydb would need to purgedb and resync.

@cheme Is that really the case? Changes in paritydb related to tree index support are backwards compatible with existing database format (version 5). Are they not?

@cheme
Copy link
Contributor Author

cheme commented Feb 19, 2022

I don't think migration from rocksdb to paritydb is needed at this point. The whole point of introducing --db=auto is so that users are able to use existing database.

Note that users that are currently using paritydb would need to purgedb and resync.

@cheme Is that really the case? Changes in paritydb related to tree index support are backwards compatible with existing database format (version 5). Are they not?

yes changes are backward compatible (untested but it was written to be).

At first I did implement 'auto' to keep using existing db, but I switch that off during review :(. I could revert (I remember testing the use case).
Migration looks pretty nice to have to me, but yes main question is should we force switch users.

@arkpar
Copy link
Member

arkpar commented Feb 19, 2022

This is how "auto" is supposed to work at the moment:

  1. For the main db: If a dir with paritydb exists - use it, otherwise create or open a rocksdb instance.
  2. For the parachain db: If paritydb instance exists - use it, otherwise use the same format as main db (as determined on the previous step)

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A node with paritydb synced successfully on Versi and an issue has been created for further improvements mentioned in this PR. So LGTM.

As for the migration, UIUC validators running with --db=auto will continue to have rocksdb as a backend for the parachain db, unless they purge the db (but the ones with --db=paritydb-experimental will create one from scratch). Whether they are advised to use auto or purge the parachains db should be clarified in the release notes.

@rphmeier
Copy link
Contributor

rphmeier commented Feb 21, 2022

but the ones with --db=paritydb-experimental will create one from scratch

That is, that they'll instantiate a new DB but won't require resync?

A node with paritydb synced successfully on Versi

I suspect that we should see how consensus behaves when all nodes use paritydb (it would be great to have this as a ZombieNet test, can you file an issue for that?)

@ordian
Copy link
Member

ordian commented Feb 21, 2022

That is, that they'll instantiate a new DB but won't require resync?

Yes, AFAIK, purging the parachains DB doesn't require a resync (of the main DB). But another test won't hurt: https://github.com/paritytech/devops/issues/1423#issuecomment-1046612572.

(it would be great to have this as a ZombieNet test, can you file an issue for that?)

#4952

@cheme
Copy link
Contributor Author

cheme commented Feb 21, 2022

cea49b3 did change 'auto' to not change db.
However if https://github.com/paritytech/devops/issues/1423#issuecomment-1046612572 runs well, it would make more sense to have 'auto' switching from rocksdb to paritydb when the main db is paritydb.

@ordian
Copy link
Member

ordian commented Feb 22, 2022

Yes, AFAIK, purging the parachains DB doesn't require a resync (of the main DB). But another test won't hurt: https://github.com/paritytech/devops/issues/1423#issuecomment-1046612572.

This was successfully tested.

However if https://github.com/paritytech/devops/issues/1423#issuecomment-1046612572 runs well, it would make more sense to have 'auto' switching from rocksdb to paritydb when the main db is paritydb.

Not sure what you mean? I think --db=paritydb-experimental should imply paritydb for both databases. Even if there's an existing rocksdb one (or two).

@ordian
Copy link
Member

ordian commented Feb 22, 2022

Many validators use paritydb already and will be caught by surprise if their node suddenly stops working.

Do we know how many? Because if they all start with empty parachains db and they are more than 1/3, it might cause some problems with finality.

Code-wise, this PR is good to go IMHO unless we hear some objections today.

@cheme
Copy link
Contributor Author

cheme commented Feb 22, 2022

Not sure what you mean? I think --db=paritydb-experimental should imply paritydb for both databases. Even if there's an existing rocksdb one (or two).

yes, the change I did (revert of cea49b3), revert to using parity-db for parachains whenever it is used as main database (even when using 'auto').

That was the conclusion of a conversation with @arkpar yesterday:

basically there is no sense in using two different database if switching is smooth.
Also if the switch fine now, it should be better to move as soon as possible (in case future changes would make the switch harder later).

On a different topic, I did implement a first version of remove prefix for paritydb and code is not as simple as I wanted, so I am not sure anymore if it is a good idea (paritytech/parity-db@master...cheme:remove_range).

@ordian ordian removed the A5-grumble label Feb 23, 2022
@ordian
Copy link
Member

ordian commented Feb 23, 2022

Let's wait for the results of https://github.com/paritytech/devops/issues/1423#issuecomment-1047852355 and then merge it if successful.

@arkpar
Copy link
Member

arkpar commented Mar 3, 2022

bot merge

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants