Sparse Merkle Tree and helpers #77

fubuloubu · 2018-11-19T21:54:22Z

What was wrong?

I wanted #58 to have more features that are useful for things like Plasma, like the ability to return Merkle branches, the ability to set key size/tree height, and to return node updates from setting a key so that you could use them to broadcast updates to other clients (see: merkle sync)
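To make the three features above concrete (branches, configurable key size, node updates returned from a set), here is a minimal in-memory sketch. It is not the code in trie/smt.py: the class name `TinySMT`, its method names, the use of SHA-256, and the 32-byte zero default value are all assumptions made for illustration.

```python
import hashlib


def hash_pair(left: bytes, right: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever hash the real tree uses.
    return hashlib.sha256(left + right).digest()


class TinySMT:
    """Hypothetical sparse Merkle tree, for illustration only."""

    def __init__(self, key_size: int = 1, default: bytes = b"\x00" * 32):
        self.key_size = key_size
        self.depth = key_size * 8  # one tree level per key bit
        # empty[h] is the hash of an all-default subtree of height h.
        self.empty = [default]
        for _ in range(self.depth):
            self.empty.append(hash_pair(self.empty[-1], self.empty[-1]))
        self.nodes = {}  # (height, index) -> node, non-default nodes only

    def _node(self, height: int, index: int) -> bytes:
        return self.nodes.get((height, index), self.empty[height])

    def _index(self, key: bytes) -> int:
        if len(key) != self.key_size:
            raise ValueError("key must be exactly key_size bytes")
        return int.from_bytes(key, "big")

    @property
    def root(self) -> bytes:
        return self._node(self.depth, 0)

    def branch(self, key: bytes) -> tuple:
        """Sibling nodes along the key's path, ordered leaf to root."""
        index = self._index(key)
        return tuple(
            self._node(height, (index >> height) ^ 1)
            for height in range(self.depth)
        )

    def set(self, key: bytes, value: bytes) -> tuple:
        """Store value and return the rewritten path nodes, leaf to root."""
        index = self._index(key)
        node = value
        self.nodes[(0, index)] = node
        node_updates = [node]
        for height in range(self.depth):
            sibling = self._node(height, (index >> height) ^ 1)
            if (index >> height) & 1:  # current node is a right child
                node = hash_pair(sibling, node)
            else:
                node = hash_pair(node, sibling)
            self.nodes[(height + 1, index >> (height + 1))] = node
            node_updates.append(node)
        return tuple(node_updates)


tree = TinySMT(key_size=1)
node_updates = tree.set(b"\x05", b"\x01" * 32)
branch = tree.branch(b"\x05")
```

In this sketch the last entry of `node_updates` is the new root, and the tuple as a whole is what a client could broadcast so that others can patch their own branches.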

Solves #65

Cute Animal Picture

pipermerriam

Just some comments, didn't go through any of the core methods yet.

fubuloubu · 2018-11-26T17:04:27Z

Ready for Review

EDIT: Added a class to manage Merkle Proofs in a more useful way. The intention would be to subclass this manager for different use cases (i.e. non-bytes types for keys and values)

pipermerriam

I'm still sort of struggling to understand how this would be used. Maybe I missed it somewhere. Are there code examples that might be illustrative here?

fubuloubu · 2018-11-28T22:45:03Z

@pipermerriam I assume you mean the SparseMerkleProof class (SMT should have pretty clear utility)

Here is an example of me using it in practice:
https://github.com/GunClear/PlasmaRifle/blob/9fc9c4ad6510b7f1b5a9fdcb2698c973d367002b/demo/get-branch.py#L56-L102

This class listens to an address on the public network in order to track updates to a Merkle tree managed on-chain. The contract doesn't store the full database in state; instead it communicates node updates through logs, so any party that iterates through the logs can synchronize their locally maintained branch with the contract. This saves some gas because the node updates are stored in logs rather than state (the ephemerality of logs is acceptable). The end result is that the user has a Merkle proof they can submit to prove membership in that list and do things on- or off-chain. My use case requires this to generate a ZKP of membership in this on-chain list without leaking identity in order to authorize transactions.

Another use case would be Plasma Cash designs that have state-enabled checkpointing. When a user wants to exit a token in this design, they would need to upload a proof of membership of that token in the state tree, checked against the on-chain list of state root hashes that such a design would manage. I could picture something similar being of utility for protocol layer clients. Note that I was mistaken that this functionality would be useful for general Plasma Cash designs, as they only track the differences between blocks (e.g. the transaction root instead of the state root).
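A proof-of-membership check of the kind described here boils down to recomputing the root from the key, the value, and the branch, then comparing it with a trusted root. The sketch below assumes SHA-256, 1-byte keys, and a leaf-to-root branch ordering; `calc_root_sketch` and `is_member` are hypothetical names and are not the `calc_root` function from this PR.

```python
import hashlib

EMPTY = b"\x00" * 32  # assumption: default leaf value


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def calc_root_sketch(key: bytes, value: bytes, branch) -> bytes:
    """Fold the branch (ordered leaf to root) into a root hash."""
    if len(branch) != len(key) * 8:
        raise ValueError("branch must have one sibling per key bit")
    index = int.from_bytes(key, "big")
    node = value
    for height, sibling in enumerate(branch):
        if (index >> height) & 1:  # node is a right child at this height
            node = hash_pair(sibling, node)
        else:
            node = hash_pair(node, sibling)
    return node


def is_member(key: bytes, value: bytes, branch, trusted_root: bytes) -> bool:
    return calc_root_sketch(key, value, branch) == trusted_root


# Hashes of all-default subtrees, height 0 through 8.
empty = [EMPTY]
for _ in range(8):
    empty.append(hash_pair(empty[-1], empty[-1]))

# In a tree holding a single key, every sibling on its path is empty.
key, value = b"\x2a", b"\x07" * 32
branch = empty[:8]
trusted_root = calc_root_sketch(key, value, branch)
```

The same loop is cheap to express in a contract, since it is a fixed number of hash calls determined by the key size.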

fubuloubu · 2018-11-29T04:46:18Z

Failure was because this project is using an old version of hypothesis (3.7.0 was Mar 2017, 3.82.1 is latest)

height -> depth; cleaner keysize check
Forgot height->depth in testcase
Added consistency check with branch and root
Moved calc_root to smt file, better as utility function for SMTs

Added branch merging functionality and tests, copied from merkle-sync
Moved to hypothesis strategies
Flake8
Added ability to set a default value, which does not fail existence check
Algorithm was complicated and had incorrect results; changed to clearer algorithm and eliminated bug
Updated name from nodes->node_updates to reflect how they are used; Added some comments
Turned branch merging into a method of a merkle proof manager class for SMTs
Linting issues
Added test case that would have detected borrowing error
Re-aligned the test case to capture the failure
lint error
Needed to make a deep copy to avoid error when using multiple Proof objects
Updates per snakecharmers' style guide
Type hints, test updates, a bug fix
- Added type hints
- Made some things that were lists into tuples
- Added some extra validations on types and sizes
- Needed to update some test cases for that
- Found a bug in SparseMerkleProof.merge()
Added docstrings for calc_root function
Added docstrings for SparseMerkleProof class
Commented out line with issue
removed commented out line stemming from older hypothesis in production
Updated to use more efficient data type
Forgot to remove unused import

fubuloubu · 2018-12-05T01:46:38Z

Merged all commits into 3. Everything looks good to go!

carver · 2018-12-06T00:25:31Z

Cool, taking a look!

carver

Ok, almost got through it before I had to take off. No huge problems, but I'll add an X since Piper already gave it a ✔️, as a reminder that there are still a few things to go over.

I really think you're going to want to do the db like the other tries do, to make it easy to have a disk-backed db.

Also, you'll probably want something like squash_changes so that you don't flood the db with a bunch of intermediate state that you don't care about in the middle of a transaction:

py-trie/trie/hexary.py

Lines 444 to 450 in bce9e02

    @contextlib.contextmanager
    def squash_changes(self):
        scratch_db = ScratchDB(self.db)
        with scratch_db.batch_commit(do_deletes=self.is_pruning):
            memory_trie = type(self)(scratch_db, self.root_hash, prune=True)
            yield memory_trie
        self.root_node = memory_trie.root_node
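The general idea behind squashing, independent of py-trie's ScratchDB, can be sketched with the standard library alone. This is an assumption-laden illustration: `squashed_writes` is a hypothetical helper, the db is a plain dict, and deletes are not handled (unlike the `do_deletes` option in the snippet above). Writes land in a scratch dict, reads fall through to the real db, and only the final values are committed when the block exits cleanly.

```python
import contextlib
from collections import ChainMap


@contextlib.contextmanager
def squashed_writes(db):
    """Buffer writes in memory; commit only the final state on success."""
    scratch = {}
    # ChainMap sends writes to the first mapping and reads through both.
    yield ChainMap(scratch, db)
    # Only reached if the with-body did not raise.
    db.update(scratch)


db = {b"root": b"v0"}
with squashed_writes(db) as view:
    view[b"root"] = b"v1"        # intermediate state, never hits db
    view[b"root"] = b"v2"
    seen_inside = db[b"root"]    # the backing db is still untouched here
```

With a disk-backed db, the intermediate value `b"v1"` would never be written at all, which is the point carver raises about not flooding the db mid-transaction.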

carver · 2018-12-06T22:39:41Z

So squash_changes reminded me that there are often a lot of trie changes, even during a single transaction. Which brings up an interesting point: how well does this streaming optimization work, and in what parameters? (and what is the key metric to optimize for?)

For example, if we are trying to optimize for network traffic, it might be better for the SMT to re-broadcast the key proof at every checkpoint.

In this construction, each of those key changes and proofs would have to be calculated and broadcast. Usually, the proof is only relevant at important checkpoints, say at the end of a block. To play that through a little bit:

Let's say there are 1000 account updates during the block. In the best case, len(key_siblings) == 1, meaning all changes are on the other half of the trie. Then there are 1k updates on the order of 32 bytes each (ignoring key and value), for a total of 32 kB of traffic per block. If the SMT instead broadcasts the new proof for a given key at every block, then it sends (worst case) 32 bytes * 256 levels, about 8 kB, or roughly 1/4 of the network load.

In this example, you could be even smarter and send only the key siblings that changed, for a total of 32 bytes * 1 level, a 1000x improvement. In fact, I think this optimization makes the solution always cost less than or equal to the network usage of the stream of key changes.
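The arithmetic in this example, written out. The constant names are made up for illustration; the numbers are the ones used in the comment (1000 updates, 32-byte nodes, 256 levels, one changed sibling in the best case).

```python
NODE_BYTES = 32           # size of one hash
UPDATES_PER_BLOCK = 1000  # account updates in the example block
TREE_DEPTH = 256          # levels in the tree
CHANGED_SIBLINGS = 1      # best case: all changes in the other half

# Streaming every update: one changed sibling per update.
stream_bytes = UPDATES_PER_BLOCK * CHANGED_SIBLINGS * NODE_BYTES

# Re-broadcasting the whole proof once per block (worst case).
full_proof_bytes = NODE_BYTES * TREE_DEPTH

# Re-broadcasting only the siblings that changed, once per block.
changed_only_bytes = NODE_BYTES * CHANGED_SIBLINGS

print(stream_bytes, full_proof_bytes, changed_only_bytes)
print(full_proof_bytes / stream_bytes)    # fraction of the streaming load
print(stream_bytes // changed_only_bytes) # improvement factor
```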

Unrelated: also note that db size is growing without bound. You might consider using pruning like HexaryTrie(is_pruning=True) works and have the SparseMerkleProof prune the data along the way, so that your memory/disk doesn't grow unbounded.

fubuloubu · 2018-12-06T22:56:41Z

The Plasma use case and my intended use I think are sufficiently different that we should probably talk offline about optimizations and other attributes of this feature.

The TL;DR is that it's less about efficient storage of the underlying data and more about having deterministic and cheap algorithms for verifying inclusion of data in a data set in something like a smart contract. I could be missing something though, so let's chat about it

carver · 2018-12-07T00:32:57Z

> it's less about efficient storage of the underlying data and more about having deterministic and cheap algorithms for verifying inclusion of data in a data set in something like a smart contract.

Most of my comment was about efficient network usage, not data storage. When you say "cheap algorithms" which thing are you optimizing for?

fubuloubu · 2018-12-07T00:35:02Z

> When you say "cheap algorithms" which thing are you optimizing for?

Verification cost in gas

fubuloubu · 2018-12-07T00:44:32Z

Also, would really like to have a chat about the networking efficiency/data storage efficiency tradeoff. It makes sense in my head, but perhaps you can help me evaluate a better solution once I communicate the use case better.

carver · 2018-12-07T00:46:21Z

Sure, one thing that would help me is if you could explain why broadcasting proofs at intervals would be sub-optimal for your use case. (or if I am totally misunderstanding the context)

fubuloubu · 2018-12-07T00:47:51Z

Moving to gitter

fubuloubu · 2018-12-07T19:51:49Z

@carver I believe I resolved all of your outstanding comments that you noted should be solved in this version of the PR. Please let me know on here or gitter if you have further questions or comments.

carver · 2018-12-11T01:17:23Z

Okay, yes, I think it is mergeable. I'm still not totally satisfied with the name branch and some of the other notes. Since you'd rather stop talking about it, I may decide to change it later without consulting you. 😝

Of course, that's not even vaguely near the top of my priority heap right now.

When the release notes come out, I'll also note this class as experimental, indicating that the API may change at minor versions.

carver · 2018-12-11T01:26:59Z

trie/smt.py

+
+        # If db is provided, and is not consistent,
+        # there may be a silent error. Can't solve that easily.
+        smt.db = db


Reminder that we probably want the ability to initialize a passed-in database with empty node hashes (e.g. LevelDB).
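One possible shape for that initialization, sketched under assumptions: the db is any dict-like mapping from a node hash to the concatenation of its two children, the hash is SHA-256, and `init_empty_nodes` is a hypothetical helper that is not part of py-trie. It writes the chain of all-default subtree hashes without overwriting entries that already exist, and returns the root of the empty tree.

```python
import hashlib


def init_empty_nodes(db, depth: int = 256,
                     default: bytes = b"\x00" * 32) -> bytes:
    """Populate db with the hashes of empty subtrees; return the empty root."""
    node = default
    for _ in range(depth):
        children = node + node
        parent = hashlib.sha256(children).digest()
        if parent not in db:  # leave any existing entry alone
            db[parent] = children
        node = parent
    return node


db = {}  # stand-in for a disk-backed store with a dict-like interface
empty_root = init_empty_nodes(db, depth=4)
```

Because the helper only adds missing entries, calling it on a database that was already initialized leaves it unchanged.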

Moved the 'sanity checks' to their own testcase

de6cef6

fubuloubu changed the title ~~WIP: Sparse merkle tree~~ WIP: Sparse Merkle Tree Nov 19, 2018

pipermerriam reviewed Nov 19, 2018

fubuloubu changed the title ~~WIP: Sparse Merkle Tree~~ Sparse Merkle Tree and helpers Nov 20, 2018

pipermerriam reviewed Nov 28, 2018

fubuloubu commented Nov 29, 2018

fubuloubu commented Nov 29, 2018

pipermerriam approved these changes Dec 4, 2018

fubuloubu added 2 commits December 4, 2018 20:41

Added smt class and some basic testing

63d3977

height -> depth; cleaner keysize check
Forgot height->depth in testcase
Added consistency check with branch and root
Moved calc_root to smt file, better as utility function for SMTs

fubuloubu force-pushed the sparse-merkle-tree branch from 2008173 to 4f2e5f5 Compare December 5, 2018 01:44

carver requested changes Dec 6, 2018

Added updates per PR review

4311b6e

carver reviewed Dec 6, 2018

fubuloubu added 3 commits December 6, 2018 19:06

Added a comment about keysize

d09ec9f

Better way of ensuring the key_size maps to the size of the keys chosen

5b98947

Lint issues

5d64978

carver approved these changes Dec 11, 2018

carver reviewed Dec 11, 2018

carver merged commit 9b9aac3 into ethereum:master Dec 11, 2018

This was referenced Dec 28, 2018

Implement Merkle Proof Generation and Validation API #65

Closed

Update binary trie with new data structure #37

Closed

fubuloubu deleted the sparse-merkle-tree branch March 19, 2019 18:52

pacrob added a commit to pacrob/py-trie that referenced this pull request May 12, 2023

convert bash scripts to py (ethereum#77)

180777d

* convert bash scripts to py
