Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse Merkle Tree and helpers #77

Merged
merged 7 commits into from
Dec 11, 2018
Merged

Conversation

fubuloubu
Copy link
Contributor

@fubuloubu fubuloubu commented Nov 19, 2018

What was wrong?

I wanted #58 to have more features that are useful for things like Plasma, like the ability to return Merkle branches, the ability to set key size/tree height, and to return node updates from setting a key so that you could use them to broadcast updates to other clients (see: merkle sync)

Solves #65

Cute Animal Picture

baby walrus

@fubuloubu fubuloubu changed the title WIP: Sparse merkle tree WIP: Sparse Merkle Tree Nov 19, 2018
Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments, didn't go through any of the core methods yet.

trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
@fubuloubu fubuloubu changed the title WIP: Sparse Merkle Tree Sparse Merkle Tree and helpers Nov 20, 2018
@fubuloubu
Copy link
Contributor Author

Ready for Review

EDIT: Added a class to manage Merkle Proofs in a more useful way. The intention would be to subclass this manager for different use cases (i.e. non-bytes types for keys and values)

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still sort of struggling to understand how this would be used. Maybe I missed it somewhere. Are there code examples that might be illustrative here?

trie/smt.py Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
@fubuloubu
Copy link
Contributor Author

fubuloubu commented Nov 28, 2018

@pipermerriam I assume you mean the SparseMerkleProof class (SMT should have pretty clear utility)

Here is an example of me using it in practice:
https://github.com/GunClear/PlasmaRifle/blob/9fc9c4ad6510b7f1b5a9fdcb2698c973d367002b/demo/get-branch.py#L56-L102

This class is listening to an address on the public network in order to track updates in an on-chain managed merkle tree. This contract doesn't store the full database in state, but communicates node updates to any party that iterates through the logs to synchronize their locally maintained branch to the contract. This saves some gas because the node updates are stored in logs vs. state (ephermality of logs is acceptable). The end result is that the user has a merkle proof they can submit to prove membership in that list and do things on- or off-chain. My use case requires this to generate a ZKP of membership in this on-chain list without leaking identity in order to authorize transactions.

Another use case would be for Plasma Cash designs that have state-enabled checkpointing. When a user wants to exit a token in this design, they would need to upload a proof of membership of that token in the state tree tracked by the on-chain list of state root hashes such a design would manage. I could picture something similar being of utility for protocol layer clients. Note that I was mistaken that this functionality would be useful for general Plasma Cash designs as they only track the differences between blocks (e.g. the transaction root instead of the state root)

@fubuloubu
Copy link
Contributor Author

Failure was because this project is using an old version of hypothesis (3.7.0 was Mar 2017, 3.82.1 is latest)

tests/test_smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
tests/test_constants.py Outdated Show resolved Hide resolved
tests/test_smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
height -> depth; cleaner keysize check

Forgot height->depth in testcase

Added consistency check with branch and root

Moved calc_root to smt file, better as utility function for SMTs
Added branch merging functionality and tests, copied from merkle-sync

Moved to hypothesis strategies

Flake8

Added ability to set a default value, which does not fail existence check

Algorithm was complicated and had incorrect results; changed to clearer algorithm and eliminated bug

Updated name from nodes->node_updates to reflect how they are used; Added some comments

Turned branch merging into a method of a merkle proof manager class for SMTs

Linting issues

Added test case that would have detected borrowing error

Re-aligned the test case to capture the failure

lint error

Needed to make a deep copy to avoid error when using multiple Proof objects

Updates per snakecharmers' style guide

Type hints, test updates, a bug fix

- Added type hints
- Made some things that were lists into tuples
- Added some extra validations on types and sizes
- Needed to update some test cases for that
- Found a bug in SparseMerkleProof.merge()

Added docstrings for calc_root function

Added docstrings for SparseMerkleProof class

Commented out line with issue

removed commented out line stemming from older hypothesis in production

Updated to use more efficient data type

Forgot to remove unused import
@fubuloubu
Copy link
Contributor Author

Merged all commits into 3. Everything looks good to go!

@carver
Copy link
Contributor

carver commented Dec 6, 2018

Cool, taking a look!

Copy link
Contributor

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, almost got through it before I had to take off. No huge problems, but I'll add an X since Piper already gave it a ✔️ , as a reminder that there are still a few things to go over.

I really think you're going to want to do the db like the other trie's do, to make it easy to have a disk-backed db.

Also, you'll probably want something like squash_changes so that you don't flood the db with a bunch of intermediate state that you don't care about in the middle of a transaction:

py-trie/trie/hexary.py

Lines 444 to 450 in bce9e02

@contextlib.contextmanager
def squash_changes(self):
scratch_db = ScratchDB(self.db)
with scratch_db.batch_commit(do_deletes=self.is_pruning):
memory_trie = type(self)(scratch_db, self.root_hash, prune=True)
yield memory_trie
self.root_node = memory_trie.root_node

tests/test_smt.py Show resolved Hide resolved
tests/test_constants.py Show resolved Hide resolved
tests/test_smt.py Outdated Show resolved Hide resolved
tests/test_smt.py Show resolved Hide resolved
tests/test_smt.py Outdated Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
trie/smt.py Outdated Show resolved Hide resolved
trie/smt.py Show resolved Hide resolved
@carver
Copy link
Contributor

carver commented Dec 6, 2018

So squash_changes reminded me that there are often a lot of trie changes, even during a single transaction. Which brings up an interesting point: how well does this streaming optimization work, and in what parameters? (and what is the key metric to optimize for?)

For example, if we are trying to optimize for network traffic, it might be better for the SMT to re-broadcast the key proof at every checkpoint.

In this construction, each of those key changes and proofs would have to be calculated and broadcast. Usually, there are only important checkpoints at which the proof is relevant. Say at the end of a block. To play that through a little bit:

Let's say there are 1000 account updates during the block. In the best case, len(key_siblings) == 1, meaning all changes are on the other half of the trie. Then there are 1k updates on the order of 32 bytes (ignoring key and value), for a total of 32kb traffic per block. If the SMT broadcasts the new proof for a given key at every block, then it sends (worst case) 32bytes * 256 levels or about 1/4 of the network load.

In this example, you could be even smarter and send only the key siblings that changed, for a total of 32bytes * 1 level, for 1000x improvement. In fact, I think this optimization makes the solution always cost less than or equal to the network usage of the stream of key changes.


Unrelated: also note that db size is growing without bound. You might consider using pruning like HexaryTrie(is_pruning=True) works and have the SparseMerkleProof prune the data along the way, so that your memory/disk doesn't grow unbounded.

@fubuloubu
Copy link
Contributor Author

The Plasma use case and my intended use I think are sufficiently different that we should probably talk offline about optimizations and other attributes of this feature.

The TL;DR is that it's less about efficient storage of the underlying data and more about having deterministic and cheap algorithms for verifying inclusion of data in a data set in something like a smart contract. I could be missing something though, so let's chat about it

@carver
Copy link
Contributor

carver commented Dec 7, 2018

it's less about efficient storage of the underlying data and more about having deterministic and cheap algorithms for verifying inclusion of data in a data set in something like a smart contract.

Most of my comment was about efficient network usage, not data storage. When you say "cheap algorithms" which thing are you optimizing for?

@fubuloubu
Copy link
Contributor Author

When you say "cheap algorithms" which thing are you optimizing for?

Verification cost in gas

@fubuloubu
Copy link
Contributor Author

Also, would really like to have a chat about the networking efficiency/data storage efficiency tradeoff. It makes sense in my head, but perhaps you can help me evaluate a better solution once I communicate the use case better.

@carver
Copy link
Contributor

carver commented Dec 7, 2018

Sure, one thing that would help me is if you could explain why broadcasting proofs at intervals would be sub-optimal for your use case. (or if I am totally misunderstanding the context)

@fubuloubu
Copy link
Contributor Author

Moving to gitter

@fubuloubu
Copy link
Contributor Author

@carver I believe I resolved all of your outstanding comments that you noted should be solved in this version of the PR. Please let me know on here or gitter if you have further questions or comments.

@carver
Copy link
Contributor

carver commented Dec 11, 2018

Okay, yes, I think it is mergeable. I'm still not totally satisfied with the name branch and some of the other notes. Since you'd rather stop talking about it, I may decide to change it later without consulting you. 😝

Of course, that's not even vaguely near the top of my priority heap right now.

When the release notes come out, I'll also note this class as experimental, indicating that the API may change at minor versions.


# If db is provided, and is not consistent,
# there may be a silent error. Can't solve that easily.
smt.db = db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder that we probably want the ability to initialize a passed-in database with empty node hashes (eg~ LevelDB).

@carver carver merged commit 9b9aac3 into ethereum:master Dec 11, 2018
@fubuloubu fubuloubu deleted the sparse-merkle-tree branch March 19, 2019 18:52
pacrob added a commit to pacrob/py-trie that referenced this pull request May 12, 2023
* convert bash scripts to py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants