-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse Merkle Tree and helpers #77
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments, didn't go through any of the core methods yet.
Ready for Review EDIT: Added a class to manage Merkle Proofs in a more useful way. The intention would be to subclass this manager for different use cases (i.e. non-bytes types for keys and values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still sort of struggling to understand how this would be used. Maybe I missed it somewhere. Are there code examples that might be illustrative here?
@pipermerriam I assume you mean the Here is an example of me using it in practice: This class is listening to an address on the public network in order to track updates in an on-chain managed merkle tree. This contract doesn't store the full database in state, but communicates node updates to any party that iterates through the logs to synchronize their locally maintained branch to the contract. This saves some gas because the node updates are stored in logs vs. state (ephermality of logs is acceptable). The end result is that the user has a merkle proof they can submit to prove membership in that list and do things on- or off-chain. My use case requires this to generate a ZKP of membership in this on-chain list without leaking identity in order to authorize transactions. Another use case would be for Plasma Cash designs that have state-enabled checkpointing. When a user wants to exit a token in this design, they would need to upload a proof of membership of that token in the state tree tracked by the on-chain list of state root hashes such a design would manage. I could picture something similar being of utility for protocol layer clients. Note that I was mistaken that this functionality would be useful for general Plasma Cash designs as they only track the differences between blocks (e.g. the transaction root instead of the state root) |
Failure was because this project is using an old version of hypothesis (3.7.0 was Mar 2017, 3.82.1 is latest) |
height -> depth; cleaner keysize check Forgot height->depth in testcase Added consistency check with branch and root Moved calc_root to smt file, better as utility function for SMTs
Added branch merging functionality and tests, copied from merkle-sync Moved to hypothesis strategies Flake8 Added ability to set a default value, which does not fail existence check Algorithm was complicated and had incorrect results; changed to clearer algorithm and eliminated bug Updated name from nodes->node_updates to reflect how they are used; Added some comments Turned branch merging into a method of a merkle proof manager class for SMTs Linting issues Added test case that would have detected borrowing error Re-aligned the test case to capture the failure lint error Needed to make a deep copy to avoid error when using multiple Proof objects Updates per snakecharmers' style guide Type hints, test updates, a bug fix - Added type hints - Made some things that were lists into tuples - Added some extra validations on types and sizes - Needed to update some test cases for that - Found a bug in SparseMerkleProof.merge() Added docstrings for calc_root function Added docstrings for SparseMerkleProof class Commented out line with issue removed commented out line stemming from older hypothesis in production Updated to use more efficient data type Forgot to remove unused import
2008173
to
4f2e5f5
Compare
Merged all commits into 3. Everything looks good to go! |
Cool, taking a look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, almost got through it before I had to take off. No huge problems, but I'll add an X since Piper already gave it a ✔️ , as a reminder that there are still a few things to go over.
I really think you're going to want to do the db like the other trie's do, to make it easy to have a disk-backed db.
Also, you'll probably want something like squash_changes
so that you don't flood the db with a bunch of intermediate state that you don't care about in the middle of a transaction:
Lines 444 to 450 in bce9e02
@contextlib.contextmanager | |
def squash_changes(self): | |
scratch_db = ScratchDB(self.db) | |
with scratch_db.batch_commit(do_deletes=self.is_pruning): | |
memory_trie = type(self)(scratch_db, self.root_hash, prune=True) | |
yield memory_trie | |
self.root_node = memory_trie.root_node |
So For example, if we are trying to optimize for network traffic, it might be better for the SMT to re-broadcast the key proof at every checkpoint. In this construction, each of those key changes and proofs would have to be calculated and broadcast. Usually, there are only important checkpoints at which the proof is relevant. Say at the end of a block. To play that through a little bit: Let's say there are 1000 account updates during the block. In the best case, In this example, you could be even smarter and send only the key siblings that changed, for a total of 32bytes * 1 level, for 1000x improvement. In fact, I think this optimization makes the solution always cost less than or equal to the network usage of the stream of key changes. Unrelated: also note that db size is growing without bound. You might consider using pruning like |
The Plasma use case and my intended use I think are sufficiently different that we should probably talk offline about optimizations and other attributes of this feature. The TL;DR is that it's less about efficient storage of the underlying data and more about having deterministic and cheap algorithms for verifying inclusion of data in a data set in something like a smart contract. I could be missing something though, so let's chat about it |
Most of my comment was about efficient network usage, not data storage. When you say "cheap algorithms" which thing are you optimizing for? |
Verification cost in gas |
Also, would really like to have a chat about the networking efficiency/data storage efficiency tradeoff. It makes sense in my head, but perhaps you can help me evaluate a better solution once I communicate the use case better. |
Sure, one thing that would help me is if you could explain why broadcasting proofs at intervals would be sub-optimal for your use case. (or if I am totally misunderstanding the context) |
Moving to gitter |
@carver I believe I resolved all of your outstanding comments that you noted should be solved in this version of the PR. Please let me know on here or gitter if you have further questions or comments. |
Okay, yes, I think it is mergeable. I'm still not totally satisfied with the name Of course, that's not even vaguely near the top of my priority heap right now. When the release notes come out, I'll also note this class as experimental, indicating that the API may change at minor versions. |
|
||
# If db is provided, and is not consistent, | ||
# there may be a silent error. Can't solve that easily. | ||
smt.db = db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder that we probably want the ability to initialize a passed-in database with empty node hashes (eg~ LevelDB).
* convert bash scripts to py
What was wrong?
I wanted #58 to have more features that are useful for things like Plasma, like the ability to return Merkle branches, the ability to set key size/tree height, and to return node updates from setting a key so that you could use them to broadcast updates to other clients (see: merkle sync)
Solves #65
Cute Animal Picture