Implement labels for examples #1066

Closed
tiziano88 opened this issue Jun 2, 2020 · 31 comments

@tiziano88
Collaborator

Similar to #972, but for the rest of the examples (which presumably will need simpler labels).

@ipetr0v
Contributor

ipetr0v commented Jun 29, 2020

Are hash-based labels already supported in Oak?

@conradgrobler
Collaborator

Regarding the use of signature labels in the TIR example: I don't think that hash-based or signature-based labels are appropriate there. To be secure, the TIR example should use per-user labels.

If hash- or signature-based labels are used, the output from the node is public once the data leaves the node. If an untrusted node receives the output (rather than the gRPC server pseudo node) then the response can be sent anywhere, not only to the user who requested it.

@ipetr0v
Contributor

ipetr0v commented Aug 25, 2020

But if we use per-user labels, will Oak be able to send requests to an external database?
The idea behind a signature label is that the user trusts (based on the signature) that this module will anonymize their data before sending requests to the database.

@conradgrobler
Collaborator

Sending a query to an external database is not appropriate for private information retrieval either. The typical use case for private/trusted information retrieval is to not leak the sensitive information that is required to do the query. If the sensitive information that should be protected is removed by the node during declassification, the query can no longer be based on this sensitive data.

My understanding was that the external database would be sent into the node that has the user data (perhaps in batches if it is too big to be sent all at once), rather than the node making a query.

@tiziano88
Collaborator Author

I think I agree with @conradgrobler .

@ipetr0v how do you think using a signature-based label helps with this example in particular? Could you perhaps describe the workflow you have in mind, including who signs what, and why they should be trusted?

@ipetr0v
Contributor

ipetr0v commented Aug 25, 2020

Currently we have two modules in TIR:

  • the TIR module itself
    • It processes specific user requests
  • and a database_proxy
    • It is a generic implementation of private information retrieval that requests a specific database entry E without revealing it

The client sends requests for a specific database entry E to TIR, but the client doesn't trust the TIR module, because it's just a specific implementation created by a third party. The client does trust the database_proxy (e.g. one that was signed by Google) to request E from a database. Thus, the client assigns Google's public key to its data, because only database_proxy is trusted to know E and request it from an external database without revealing it.

database_proxy was created by Google so that it can be used by multiple third-party applications that want to benefit from private information retrieval.

@conradgrobler
Collaborator

What labels are associated with TIR and database_proxy? Where does the output from database_proxy go?

@ipetr0v
Contributor

ipetr0v commented Aug 25, 2020

TIR has no labels, and database_proxy has a public_key label.

database_proxy works as follows:

  • Some other module requests E from it
  • database_proxy sends multiple requests to an external database and iteratively searches for an entry corresponding to E
  • Then it returns the found entry to the module that asked for it
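
As a rough illustration of the three steps above (not the actual database_proxy code), the lookup loop might look something like this. Everything here is a hypothetical stand-in: `FAKE_DATABASE`, `CHUNK_SIZE`, `fetch_chunks` and `lookup` are invented for the sketch and are not Oak APIs. The sketch also assumes that the proxy scans every chunk rather than stopping early, so that the sequence of external requests does not depend on E.

```python
from typing import Dict, Iterator, Optional

# Hypothetical stand-in for the external database (not part of Oak).
FAKE_DATABASE: Dict[str, str] = {f"id-{i}": f"entry-{i}" for i in range(10)}
CHUNK_SIZE = 4  # Assumed chunk size, chosen only for illustration.


def fetch_chunks() -> Iterator[Dict[str, str]]:
    """Yield the database in fixed-size chunks; the requests never mention E."""
    items = sorted(FAKE_DATABASE.items())
    for start in range(0, len(items), CHUNK_SIZE):
        yield dict(items[start:start + CHUNK_SIZE])


def lookup(entry_id: str) -> Optional[str]:
    """Search every chunk for entry_id, without stopping early."""
    found = None
    for chunk in fetch_chunks():
        # The comparison happens inside the node; only chunk requests leave it.
        if entry_id in chunk:
            found = chunk[entry_id]
    return found
```

A caller such as TIR would then invoke `lookup("id-7")` and parse whatever comes back.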

@conradgrobler
Collaborator

> • Some other module requests E from it

In our case, is it TIR that requests it? If TIR has no labels, what is stopping it from leaking E or any other data received from the user?

@ipetr0v
Contributor

ipetr0v commented Aug 25, 2020

The idea behind database_proxy was to create a separate generic module that doesn't depend on specific user requests and just provides a string id lookup in a database (TIR just parses the response and sends it back to the user).

> In our case, is it TIR that requests it? If TIR has no labels, what is stopping it from leaking E or any other data received from the user?

I think once #1357 is implemented we will be able to add user token labels to a request, but that will also create the following problem:

  • Since the database cannot fit into Oak, the module needs to iteratively request chunks of data from the database
  • On each iteration it needs to search for E in the chunk, and thus the module (which sends requests outside of Oak) needs to know E
  • But if we add user token labels to E, the module will only be able to send data to the user and not to the database

I think in our case the user needs to trust database_proxy's public key, and probably TIR's hash.
So the data label should be changed to hash(TIR) ∧ public_key(database_proxy).
It would mean that the user trusts a specific version of TIR, and can also use any version of the database proxy signed by Google.

TIR will also need to declassify the data (but I don't know whether declassification is implemented already).

@ipetr0v
Contributor

ipetr0v commented Aug 25, 2020

> In our case, is it TIR that requests it? If TIR has no labels, what is stopping it from leaking E or any other data received from the user?

Also, data is labeled with the public key, so IIUC only modules labeled with it can declassify this data.
And thus, only database_proxy will be able to declassify it.
(this is the example with public_key(database_proxy) label)
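
To make that reading concrete, here is a toy model of declassification. It is only a sketch under stated assumptions, not Oak's API: a label is modelled as a set of confidentiality principals, a node's privilege is the set of principals it may remove, and `declassify` is a hypothetical helper.

```python
from typing import FrozenSet

Label = FrozenSet[str]  # Conjunction of confidentiality principals; empty = public.


def declassify(label: Label, privilege: Label) -> Label:
    """Return the label left after a node removes the principals it may remove."""
    return label - privilege


data_label: Label = frozenset({"public_key(database_proxy)"})
proxy_privilege: Label = frozenset({"public_key(database_proxy)"})
tir_privilege: Label = frozenset()  # TIR has no label in this example.

after_proxy = declassify(data_label, proxy_privilege)  # Becomes public.
after_tir = declassify(data_label, tir_privilege)      # Label is unchanged.
```

In this model database_proxy can reduce the label to public, while TIR cannot remove anything.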

@conradgrobler
Collaborator

There are two things that are unclear to me:

  • If TIR has no label it is public, so it cannot see data that is labeled with public_key(database_proxy). How does the data flow in the example?
  • Once database_proxy has declassified the results, the results are public. What stops other nodes from sending these public results anywhere else?

@ipetr0v
Contributor

ipetr0v commented Aug 26, 2020

> If TIR has no label it is public, so it cannot see data that is labeled with public_key(database_proxy). How does the data flow in the example?

I think it can, it just cannot send it outside of Oak.

> Once database_proxy has declassified the results, the results are public. What stops other nodes from sending these public results anywhere else?

In this case, it looks like we need to trust both modules (TIR and proxy) and use the hash(TIR) ∧ public_key(database_proxy) label, because we cannot label data with user tokens: the proxy would then not be able to declassify requests.

@tiziano88
Collaborator Author

Let us start with a simpler case, in which the entire database is already in-memory in the node (no lookups to an external server).

@ipetr0v in this case, what is the label of the incoming data, and of the various nodes?

@conradgrobler
Collaborator

conradgrobler commented Aug 26, 2020

> I think it can, it just cannot send it outside of Oak.

IFC will stop data with a non-public label from being read by a node with a public label. A node with a public label can always send data outside of Oak.
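
The read rule can be sketched as a subset check. This is a simplified model for illustration only (confidentiality only, labels as sets of principals, with a hypothetical `can_flow` helper), not the Oak runtime's implementation.

```python
from typing import FrozenSet

Label = FrozenSet[str]  # Set of confidentiality principals.
PUBLIC: Label = frozenset()  # The empty label means "public".


def can_flow(source: Label, destination: Label) -> bool:
    """Data may flow only to a destination at least as restrictive as the source."""
    return source <= destination


secret: Label = frozenset({"public_key(database_proxy)"})

tir_can_read_secret = can_flow(secret, PUBLIC)    # False: a public node cannot read it.
proxy_can_read_secret = can_flow(secret, secret)  # True: the labels match.
proxy_can_read_public = can_flow(PUBLIC, secret)  # True: public data flows anywhere.
```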

> In this case, it looks like we need to trust both modules (TIR and proxy).

This just moves the problem elsewhere. A malicious application owner could add another public untrusted module that then receives the declassified output from TIR and sends it elsewhere. The user connecting to the application has no way of knowing whether the output from TIR goes back to the gRPC server, or to some other node.

@ipetr0v
Contributor

ipetr0v commented Aug 26, 2020

> in this case, what is the label of the incoming data, and of the various nodes?

Currently it's just public. I started to add a public key label for database_proxy, but that is not the final IFC design for TIR; it's just an example of using public key labels.
But in the case where the whole database can fit in Oak, we don't need public key labels; we just need a user token label (because the database is already there, and there is no need to make external requests).

Problems arise when we cannot fit the whole database (and so need to make requests) and also want to prevent the result from being sent somewhere else.

@ipetr0v
Contributor

ipetr0v commented Aug 26, 2020

Also, the problem that @conradgrobler is describing may arise in any application where developers try to use third-party modules for routines involving declassification.

@ipetr0v
Contributor

ipetr0v commented Aug 26, 2020

I think one possible solution to this problem would be to use disjunctions (#1207) in the label.

So the user-assigned label would be token(user) ∨ public_key(database_proxy). This would allow database_proxy to declassify requests, and would also prevent any other node from sending the data anywhere other than back to the original user.

@tiziano88
Collaborator Author

I am not sure disjunctions would help. A disjunction used in a confidentiality label is strictly weaker than either of the original principals, right?
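
One way to check that intuition is with a small model. This is a sketch under assumptions and not how Oak represents labels: a label is a conjunction of clauses, each clause is a disjunction of principals, and `flows_to` is a hypothetical helper that returns true when the destination is at least as restrictive as the source.

```python
from typing import FrozenSet

Clause = FrozenSet[str]    # Disjunction: any listed principal may declassify.
Label = FrozenSet[Clause]  # Conjunction of clauses; the empty label is public.


def flows_to(source: Label, destination: Label) -> bool:
    """True if every source clause is covered by an equal or stricter destination clause."""
    return all(
        any(dst_clause <= src_clause for dst_clause in destination)
        for src_clause in source
    )


user: Label = frozenset({frozenset({"token(user)"})})
proxy: Label = frozenset({frozenset({"public_key(database_proxy)"})})
either: Label = frozenset(
    {frozenset({"token(user)", "public_key(database_proxy)"})}
)

weaker_than_user = flows_to(either, user) and not flows_to(user, either)
weaker_than_proxy = flows_to(either, proxy) and not flows_to(proxy, either)
```

Under this model both checks come out true: data labeled with the disjunction can flow to either single-principal label, but not the other way around.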

@ipetr0v
Contributor

ipetr0v commented Aug 26, 2020

But it would mean that only database_proxy, or the gRPC server/client that sends data to the client, can declassify the data, and no one else.

This was referenced Aug 28, 2020
ipetr0v added a commit that referenced this issue Sep 10, 2020
This change adds a signature label to Private Set Intersection example.

Fixes #1344
Ref #1066
@conradgrobler
Collaborator

What labels should we use with the other examples?

All of the label types are a bit inconvenient to maintain:

  • Wasm hash labels mean that the hash must be updated in multiple places in code every time the code is recompiled (e.g. when the SDK changes) and that the updated binary must be pushed to storage
  • Signature labels mean that the binary must be pushed to storage and the signature must be updated when the code is recompiled
  • Per-user labels mean that we need to have a router node that creates a new set of nodes per user

I think that per-user labels would require the least amount of ongoing maintenance, even though it requires more upfront work.
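
The hash maintenance burden is easy to see in a small sketch. It assumes the label pins a SHA-256 digest of the module bytes; `wasm_hash_label` and the byte strings are hypothetical placeholders, not real Wasm modules or Oak code.

```python
import hashlib


def wasm_hash_label(module_bytes: bytes) -> str:
    """Hex SHA-256 digest of the module, standing in for a Wasm hash label."""
    return hashlib.sha256(module_bytes).hexdigest()


# Placeholder bytes for two builds of the same module (not valid Wasm).
old_build = b"\x00asm\x01\x00\x00\x00" + b"old code section"
new_build = b"\x00asm\x01\x00\x00\x00" + b"new code section"

# The label that was pinned in code when the old build was produced.
pinned_label = wasm_hash_label(old_build)

# After a rebuild the pinned label no longer matches and must be updated.
still_matches = wasm_hash_label(new_build) == pinned_label
```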

@tiziano88
Collaborator Author

> What labels should we use with the other examples?
>
> All of the label types are a bit inconvenient to maintain:
>
>   • Wasm hash labels mean that the hash must be updated in multiple places in code every time the code is recompiled (e.g. when the SDK changes) and that the updated binary must be pushed to storage
>   • Signature labels mean that the binary must be pushed to storage and the signature must be updated when the code is recompiled
>   • Per-user labels mean that we need to have a router node that creates a new set of nodes per user
>
> I think that per-user labels would require the least amount of ongoing maintenance, even though it requires more upfront work.

These are not interchangeable options: they have different meanings and security properties. We should not choose between them based on convenience, but rather based on what makes sense for the individual examples.

In particular, I think:

  • I expect Wasm hash labels will rarely be used in practice, unless there is unanimous agreement that, e.g., a specific piece of functionality is implemented by a module with a specific hash. Even if / when this is the case, it is still more likely that such agreements will be published in the form of a signature over the hash itself, so a signature label probably makes sense for that use case.
  • Signature labels are used in cases in which we expect Wasm code to declassify data, which would happen only for self-contained pieces of logic that are manually reviewed, with these reviews published by the appropriate verifiers (alongside their public keys). From the current set of examples, I think this only applies to:
  • User labels may be used by anything that operates on per-user data, since these may only be declassified by the gRPC / HTTP server nodes, which are already trusted. Most applications would just rely on these, even if it means that we need to implement the router node pattern in more applications (which BTW is mostly done already).

@tiziano88
Collaborator Author

@ipetr0v I think you are already looking into the remaining labels for some of the examples, so I am assigning this to you. There may be additional ones for the chat examples to be assigned once #1452 is fixed, which I can help look into more closely when the time comes.

@ipetr0v
Contributor

ipetr0v commented Nov 18, 2020

Since we will probably implement the Router pattern in every module that uses labels, we will also need to split those modules so that the Router does not get declassification privileges.

@tiziano88
Collaborator Author

main and router can be in the same module. The module that does declassification will be in a separate one. I think this should be enough, and makes sense in terms of reusability.

@ipetr0v
Contributor

ipetr0v commented Nov 18, 2020

I also think that it makes sense to use Certificate labels for simple examples - since it's easier to maintain and it also captures the notion of updatable modules better.

@ipetr0v
Contributor

ipetr0v commented Nov 19, 2020

Also, don't trusted_database and a lot of the other examples require client public key labels?
IIUC right now they are only implemented for the HTTP server, right?

@tiziano88
Collaborator Author

Correct, @rbehjati is adding support for gRPC.

@rbehjati
Contributor

Yes. I'll add the authentication to the trusted DB example as @ipetr0v suggested, and I'll be done with the PR (#1707).

ipetr0v added a commit that referenced this issue Nov 20, 2020
This change updates Trusted Database example to use Router pattern.

Ref #1066
@ipetr0v
Contributor

ipetr0v commented Nov 20, 2020

I think that, for the hello_world and translator examples, it makes sense to create individual nodes per request and assign them client-related labels (similar to how trusted_database works), because these examples do not perform declassification and don't require either hash or signature labels.
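
The per-request node idea could be sketched like this. It is only an illustration: `Router`, `HandlerNode` and the `client(...)` label format are hypothetical names invented here, not the Oak SDK or the trusted_database code.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class HandlerNode:
    """Hypothetical handler node whose label names the requesting client."""
    label: FrozenSet[str]
    seen: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        # The node only ever sees requests carrying its own client label.
        self.seen.append(request)
        return f"HELLO {request}"


class Router:
    """Hypothetical router that creates a fresh labeled node for each request."""

    def __init__(self) -> None:
        self.created: List[HandlerNode] = []

    def route(self, client_id: str, request: str) -> str:
        node = HandlerNode(label=frozenset({f"client({client_id})"}))
        self.created.append(node)
        return node.handle(request)
```

The router itself never declassifies anything here; it only creates nodes and forwards requests to them.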

@ipetr0v
Contributor

ipetr0v commented Jan 25, 2021

I think the following commit (29822e2) was the last one for this issue.

@ipetr0v ipetr0v closed this as completed Jan 25, 2021