Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink state storage #3639

Closed
3 of 4 tasks
dimartiro opened this issue Dec 12, 2023 · 1 comment
Closed
3 of 4 tasks

Rethink state storage #3639

dimartiro opened this issue Dec 12, 2023 · 1 comment

Comments

@dimartiro
Copy link
Contributor

dimartiro commented Dec 12, 2023

Issue summary

We need to rethink our current state storage implementation, not only to improve it but also to comply with the standards required by the protocol. This issue presents our current solution in contrast with Parity's solution, highlighting the changes we should make along with the benefits we would gain.

Current solution

Currently, our state storage solution, used among other things by the runtime, has various responsibilities:

  1. Apply state changes for both tries and child tries (put, set, delete, clear).
  2. Handle transactions (begin transaction, commit, rollback).
  3. Get tries from memory.
  4. Get tries from the database.

The protocol specifies that the state is a collection of keys and values Polkadot spec. This state must be re-arranged in a trie to calculate the merkle hash and to generate and verify merkle proofs. But internally, we always store the state in the form of a trie when this is only necessary when calculating the hash or generating proofs.

In our case, our state storage implementation get tries from memory and, if not found, loads them by looking up in the DB.
In order to do that we need to feed the storage structure with a DB connection and the in memory tries structure.

To handle the state transitions with transactions we use our current TrieState structure that contains 2 in memory tries used as transient tries to handle the changes and apply them if it is needed.

Here is a simplified diagram of our architecture:

Parity's solution

Parity built a trie crate that simplifies the basic tasks of a tree. The difference with our solution is that they do not store the data in the form of a trie but rather store it in a key-value database, and they have a TrieDB wrapper with trie functions over this database but do not have a trie built in memory at all times.

Each TrieDB represents a trie along with its root node and all its nodes, allowing the same database to be shared for all the tries that need to be represented.

This represents various benefits.

Benefits

Simpler and more scalable

Our current solution looks more like spaghetti code relating different components in a way that is very difficult to follow and understand. It makes the solution complex to understand and could raise issues in the future if we do the wrong changes.

Parity's solution is more straightforward, easy to understand and scalable.

It allows us to implement lazy loading

With our current solution the only lazy loading behavior we have is when the state storage is loading tries from the database and putting them into memory.
We have many problems with that:

  1. It works for tries but not for nodes, when we are loading a trie from the DB we are decoding the entire trie with all its nodes when we only need to use only one node value.
  2. It stores the decoded trie into memory "forever" because we don't have a caching mechanism
  3. Even if we implement the cache it will work for tries but not for nodes so the saved memory will be not as much as we can gain lazy loading nodes

If we want to implement lazy loading of nodes with our current solution we will have to change the way a node is decoded to prevent decoding its childs and we will going to have some repeated code with different behaviors for tries and nodes.

If we implement a solution similar than Parity's trie crate we will have Tries with a similar behavior than Parity's TrieDB and we are going to load nodes from database and decoding them only if we need their values, with the ability to cache the most frequently used in a simple manner

Reduce memory usage

Since we are going to have everything in database and only the most frequently used nodes / values in memory we are going to reduce our memory usage significantly.

A way to solve this using Parity's crate is having a local nodes / values cache for every trie so any trie could be able to cache its nodes independently solving this issue #3623

We stop thinking about trees and start thinking about the state as keys and values.

If we use this solution we are going to start saving a collection of key value pairs instead of saving tries in memory, the "Trie" will be an abstract structure on top of a database that will be re arranging this key value pairs collection into a trie only when it is needed.

In the past, was difficult to implement things like trie V1 because our internal implementation is too couple with the Trie structure, this change of mindset will help us to decouple the data from the representation and we could be free to change the Trie structure in the future with less risks of breaking something.

We can implement nested transactions more easily

We are going to stop using our current TrieState implementation to handle transactions and we could do it in the same TrieDB simplifying the transaction management as well as our architecture.

That will help us to solve this issue #3508

Necessary Changes

I think the best way to implement this is like we did with Grandpa. We have to translate the parity create to golang and then start using it in our client.

We already have a PR that we can use as starting point #3390

Other information and links

Parity's Trie Crate

TrieDB

https://github.com/paritytech/trie/blob/d3e63773ed044a709f719bd6455b9ab1751aae7c/trie-db/src/triedb.rs#L120

Substrate Cache

https://github.com/paritytech/polkadot-sdk/blob/d43d2fe23b12355eed4afa6dd3e971a248269262/substrate/primitives/trie/src/cache/mod.rs#L18

Related issues

@dimartiro
Copy link
Contributor Author

Moved to this discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant