Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: SMT Proofs (Inclusion and Exclusion) #648

Merged
merged 67 commits into from
Mar 12, 2024
Merged

Conversation

bvrooman
Copy link
Contributor

@bvrooman bvrooman commented Dec 20, 2023

Related issues:

This PR:

  • Adds support for generating SMT proofs (inclusion and exclusion)
  • Adds support for verifying SMT proofs

Given an SMT, we can generate a proof for any key within the 32 byte key space. If the key is included in the tree, i.e., has been used in an update operation and not deleted, then we generate an inclusion proof. If the key has not been included in the tree, we generate an exclusion proof. Inclusion proofs verify that the given key-value is, in fact, in the tree. If the value undergoing verification does correspond to the given key, verification fails. Exclusion proofs verify that the key is a placeholder. If the given key is not a placeholder in the tree, verification fails.

Example

Consider the following example:

We have a key-value data set, upon which we are building a sparse Merkle tree.
The data set is structured as such:

Key Value
000 42
010 "Foo"
110 "Bar"

The data set contains 3 key values. Because the key space supports up to 8 keys (000 - 111), we know that there are 5 unused slots in the key space.

For each unused slot, we know that the value is unset. Therefore, the complete key-value data set looks like this:

Key Value
000 42
001
010 "Foo"
011
100
101
110 "Bar"
111

The corresponding sparse Merkle tree is constructed on the key-value data set:

exmaple_smt

Example Use Case

We can make statements about the data set. For example:

  • The key 000 has value 42 (inclusion). This statement is true.
  • The key 111 is excluded from the table (exclusion). This statement is true.
  • The key 100 is empty (exclusion). This statement is true.
  • The key 110 has value "Hello" (inclusion). This statement is false (the correct value is "Bar").
  • The key 110 is empty (exclusion). This statement is false (the key has value "Bar").
  • The key 101 has value "Foo" (inclusion). This statement is false (the key is empty).

We can categorize these statements into one of two categories:

  1. I assert the key K has the value V (inclusion).
  2. I assert the key K is unset (exclusion).

Assertion 1 is an assertion of knowledge: I assert that key K has value V. 1. For assertion 2, an equivalent assertion is: I assert that key K is not in the table.

For these assertions, we can generate cryptographic proofs. Respectively, these are proofs of inclusion and proofs of exclusion. Verification is the process of confirming a given statement using the proof. Verifying a statement and proof returns a boolean (true or false) output that can confirm the corresponding statement. If verification of the statement returns true, the statement has been proven. If verification of the statement returns false, the statement has not been proven.

Verification failure does not imply the opposite statement. For example, we cannot assert that a key is in the table without providing a specific value, i.e. "key K is included in the table (with some unknown value V)" using a failed exclusion assertion. If verification of the statement "key K has a zero value" returns false, it is possible that K has a non-zero value and is included in the tree; however, a failed proof of exclusion does not necessarily imply inclusion. This is because the proof provided to the verification can be somehow invalid or maliciously altered.

More concretely:

  • $V_{inclusion}(A_{key}, A_{value} = X, P) = true \implies A_{key} \in T, T[A_{key}] = X$
  • $V_{exclusion}(A_{key}, P) = true \implies A_{key} \notin T$
  • $V_{inclusion}(A_{key}, A_{value} = X, P) = false \centernot\implies A_{key} \notin T, T[A_{key}] \neq X$ (Proof may be invalid!)
  • $V_{exclusion}(A_{key}, P) = false \centernot\implies A_{key} \in T$

where:

  • $V$ is the verifier.
  • $A$ is the key-value pair in the statement undergoing verification.
  • $P$ is a proof that proves the statement undergoing verification
  • $T$ is the table of key-value data

Proofs that fail to validate do not imply the opposite statement.

@xgreenx xgreenx linked an issue Dec 20, 2023 that may be closed by this pull request
@bvrooman bvrooman changed the title Bvrooman/feat/smt proofs feat: SMT Proofs (Inclusion and Exclusion) Dec 24, 2023
@bvrooman bvrooman self-assigned this Dec 24, 2023
@bvrooman bvrooman marked this pull request as ready for review December 24, 2023 00:50
@bvrooman bvrooman requested a review from a team December 24, 2023 00:50
#[test]
fn generate_proof_and_verify_with_valid_key_value_returns_true((key_values, tree) in random_tree()) {
let mut rng = StdRng::seed_from_u64(0xBAADF00D);
if let Some((key, value)) = key_values.choose(&mut rng).cloned() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will always choose the same index. You could provide the random index in the test declaration.

The simplest way to do that would probably be just add a arb_num: usize and then in the code: let index = arb_num % key_values.len(), or something like that.

Copy link
Collaborator

@xgreenx xgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is bad that all these functions panics.

image

If these functions are for internal purposes only, they should be private. But since we are exposing the Node type, maybe we want to rework how they are used.

Or, instead of exposing Node types, you can create another type for the ExclusionProof.

leaf,
} = self;
let key = key.into();
let mut current = *leaf.hash();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost fixed=D The problem is that getter leaf_key has an internal assert. So if someone provides a node, it will fail.

Added a test case with a panic=)

I think if the node is not a placeholder or a leaf, we need to return false. Overwise the proof provider may try to trick us=)

pub struct ExclusionProof {
pub root: Bytes32,
pub proof_set: ProofSet,
pub(crate) leaf: Node,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be public and Node type as well. Otherwise, it is not possible to transfer this object between applications. We need to also implement serialization and deserialization somehow=)

Also, it would be nice if we implement serde on top of that

@bvrooman
Copy link
Contributor Author

bvrooman commented Mar 7, 2024

Almost fixed=D The problem is that getter leaf_key has an internal assert. So if someone provides a node, it will fail.

Added a test case with a panic=)

I think if the node is not a placeholder or a leaf, we need to return false. Overwise the proof provider may try to trick us=)

Yep, that's a good point.

The asserts should be debug_assert. I can fix that.
I will also add a check that the node in the proof is a leaf node/placeholder, and if not, return false.

I moved the test case you wrote into the proof.rs tests and made it deterministic, since we cannot guarantee that loading a node from the proof set will succeed in random a test setup.

I think it should be public and Node type as well. Otherwise, it is not possible to transfer this object between applications. We need to also implement serialization and deserialization somehow=)

Also, it would be nice if we implement serde on top of that

Maybe it makes sense to store a subset of node data in the proof, rather than the whole node? E.g., just leaf_key: Bytes32 and leaf_data: Bytes32, and we reconstruct the leaf from this using something like Node::new(0 /*height*/, Prefix::Leaf, leaf_key, leaf_data).

This way, we:

  • can easily implement serialization/deserialization on the proof structs (without also implementing it for Node)
  • ensure that there is no way for a user to provide an internal node (Prefix::Node) that breaks code like above
  • do not need to expose the Node structure further

edit:

Or, instead of exposing Node types, you can create another type for the ExclusionProof.

I think this you are suggesting something similar to what I am describing: a "subset type" of Node that can be used for just the ExclusionProof

@bvrooman
Copy link
Contributor Author

Re: serde for proofs: #694

@bvrooman
Copy link
Contributor Author

This PR is ready for the next round of reviews.

I'm leaving serde for a separate task, because that will require changes that are not in scope of proofs.

Comment on lines 197 to 215
pub fn leaf_key(&self) -> &Bytes32 {
assert!(self.is_leaf());
debug_assert!(self.is_leaf());
self.bytes_lo()
}

pub fn leaf_data(&self) -> &Bytes32 {
assert!(self.is_leaf());
debug_assert!(self.is_leaf());
self.bytes_hi()
}

pub fn left_child_key(&self) -> &Bytes32 {
assert!(self.is_node());
debug_assert!(self.is_node());
self.bytes_lo()
}

pub fn right_child_key(&self) -> &Bytes32 {
assert!(self.is_node());
debug_assert!(self.is_node());
self.bytes_hi()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think public getters should return Option instead.

For internal usage, you can create a private version of this function. Maybe even it is better to mark them unsafe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node is public only to the fuel-merkle crate, i.e. pub(crate) enum Node. The only place Node should ever be used is in the context of the MerkleTree, where we already know what kind of node we are using. There is no reason for it to be used in a context where we do not know the kind of node, such as user space or another crate, which is why it is marked pub(crate). Therefore, I do not agree that these should return Option - we always know what data we are accessing, i.e, we could always unwrap the Option; we would never need to branch based on the variant.

The debug_asserts are there for fuel-merkle developers who might modify the update/delete algorithms, not for consumers of the fuel-merkle crate. But if you prefer, I can mark these functions as pub(crate) as well to make that explicit.

I also disagree that these functions are unsafe, because it is always "safe" to call them, even in the wrong context. You cannot create an unsafe/incoherent state using these functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are used only within the crate, let's mark them pub(crate) to avoid confusion in the future=)

They are unsafe from a business perspective. I only brought attention to it because this commit shows that it is not always true. And from the API perspective, it would be right to have a separate bunch of functions only for internal MerkleTree work. And pub or pub(crate) functions that may fail outside of the MerkleTree implementation.

The unsafe for internal functions will show that it is dangerous to use. But the comment may describe why it is safe. Or we can always return an option and write expect and the reason why it is expected to be valid.

Copy link
Contributor Author

@bvrooman bvrooman Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are used only within the crate, let's mark them pub(crate) to avoid confusion in the future=)

We can also use pub(in crate::sparse) to restrict this scope further. We could even rearrange the file structure to put Node under the merkle_tree module and do pub(in crate::sparse::merkle_tree) or really just pub(super) to make this very explicit. Any part of the interface marked pub is therefore also pub(in crate::sparse::merkle_tree) because of Rust's visibility rules.

They are unsafe from a business perspective. I only brought attention to it because this commit shows that it is not always true. And from the API perspective, it would be right to have a separate bunch of functions only for internal MerkleTree work. And pub or pub(crate) functions that may fail outside of the MerkleTree implementation.

I agree that we should differentiate between external and internal APIs, and that they can have different safety guarantees. At the moment, there is really only an internal API for Node. Putting Node in Proof violated that agreement by putting Node in user-space and mixing business logic with internal logic. That was bad! Node should only be concerned with internal logic, never business logic. Enforcing a more narrow visibility on Node will help reduce errors like that in the future.

To summarize: The interface was unsafe from a business perspective, but it was wrong to use it in business logic. I think isolating Node to internal logic is the first fix. If you still think unsafe should be used in this context, then I will add unsafe here.

Copy link
Contributor Author

@bvrooman bvrooman Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at #696. I think this is a bit nicer: Here, Node is completely restricted to the merkle_tree module using pub(super), making it very explicit that the only consumer is merkle_tree, and that there are no consumers external to merkle_tree. This means that the API is strictly internal, and we can always analyze it from an internal perspective.

// divergence equates to this hash.
//
let leaf = actual_leaf.clone().into();
let exclusion_proof = ExclusionProof { proof_set, leaf };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea of returning the key and value in the case of the leaf. You can return an enum, which is either a placeholder or (key, value). And in the proof function, we can always create a leaf node.

}

#[derive(Clone, Eq, PartialEq)]
pub struct ExclusionLeaf {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be an enum

}

fn is_placeholder(&self) -> bool {
self.leaf_value() == zero_sum()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it before, user may use any value, even zero sum. We need to use a separate variant for the case of the placeholder

Copy link
Contributor Author

@bvrooman bvrooman Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in this context, we are looking at the "exclusion leaf" (the nearest included leaf to the excluded key). The value of the exclusion leaf is either the hash of the user data for this leaf (not the user data itself), or placeholder hash. This is how this version allows us to use the zero sum for user data.

But, either way, we can use an enum here to simplify this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that, at this point, we have a hash of the user's value. But if the hash is zero, we will treat it as a placeholder, while it is not true. Enum fixes that case.

Comment on lines 136 to 139
if !leaf.is_placeholder() && *leaf.leaf_key() == key.as_ref() {
return false;
}
let mut current = leaf.hash();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to process enum here. If the leaf is a (key, value), then we want to use Node::create_leaf(key, value). Otherwise we want to use Node::create_placeholder().

After you can call node.hash().

bvrooman and others added 3 commits March 11, 2024 12:06
* Refactor

* Refactor

* Minor

* Fix import

* Revert file move

* Fixes + comments

* Use core deref instead of std

* Group public and private functions
xgreenx
xgreenx previously approved these changes Mar 11, 2024
@@ -58,9 +58,15 @@ pub struct InclusionProof {
impl InclusionProof {
pub fn verify(&self, root: &Bytes32, key: &MerkleTreeKey, value: &[u8]) -> bool {
let Self { proof_set } = self;

if proof_set.len() > u32::MAX as usize {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! We can actually make this

if proof_set.len() > 256 {

since a valid proof set cannot have more than log(n) side nodes.

Merged via the queue into master with commit 60e7483 Mar 12, 2024
38 checks passed
@bvrooman bvrooman deleted the bvrooman/feat/smt-proofs branch March 12, 2024 06:52
@xgreenx xgreenx mentioned this pull request Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support SMT proofs generation
3 participants