[Question] How does ReLU work in the new NN example #809

Closed
vincehong opened this issue Jul 24, 2024 · 8 comments
Comments

@vincehong

vincehong commented Jul 24, 2024

Congratulations on your new results in https://www.zama.ai/post/making-fhe-faster-for-ml-beating-our-previous-paper-benchmarks-with-concrete-ml ! Could you share more details about the underlying improvements?

For example, the printed number of PBS for NN-20 in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb is 2440 = 784 + 18x92, which means only one PBS is needed to implement each ReLU. This is counter-intuitive: since the model is quantized to 6 bits, the accumulator W*X would be around 14 bits, so how can a single PBS evaluate ReLU on 14-bit inputs?
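For reference, here is that count in plain Python (the interpretation of the two terms is my assumption, not something stated in the notebook):

# Quick check of the PBS count printed by the notebook (plain Python arithmetic).
# Interpreting the terms is an assumption: 784 presumably corresponds to the
# 28x28 inputs, and 18 x 92 to the hidden-layer activations, i.e. one PBS per ReLU.
input_pbs = 784
hidden_pbs = 18 * 92
print(input_pbs + hidden_pbs)  # 2440, the printed number of PBS for NN-20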

Thanks.

@andrei-stoian-zama
Collaborator

thanks!

The underlying representation used by Concrete is variable-sized integers: the message space can hold up to 20-30 bits. However, the PBS only works efficiently for integers up to 6-8 bits (it can go up to 16 bits, but it is slower). Through what we call "approximate rounding", it is possible to apply the PBS only to the desired number of MSBs of a high-bit-width accumulator (the 6 MSBs of the 14-bit accumulator).

A PBS refreshes noise but also applies a table lookup to the value it processes. Thus, when applying the PBS we get the ReLU evaluation for free.

Using only the MSBs of the accumulator works well because of quantization: quantizing a value means dividing it by a scale factor. This division can be thought of as dividing by a power of two and then by another, smaller factor. Dividing by a power of two simply removes LSBs.
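A minimal, plain-Python sketch of that idea (not the Concrete internals: the bit-widths, values, and variable names are assumptions mirroring the 14-bit accumulator and 6-bit message example above):

import numpy as np

# Plain-Python sketch of "approximate rounding + PBS" on a ReLU layer.
ACC_BITS = 14                 # assumed bit-width of the accumulator W*X
MSB_BITS = 6                  # assumed bit-width processed by the PBS
SHIFT = ACC_BITS - MSB_BITS   # number of LSBs dropped before the lookup

acc = np.array([-5000, -1, 0, 300, 8191], dtype=np.int64)  # toy accumulator values

# Dropping LSBs is a floor division by a power of two, i.e. part of the
# re-quantization scale applied after the matrix product.
msbs = acc // (1 << SHIFT)

# The PBS applies an arbitrary table lookup to the small value it refreshes;
# here the table simply encodes ReLU, so the activation comes "for free".
relu_out = np.maximum(msbs, 0)

# Reference: ReLU on the full-precision accumulator, then the same rescaling.
reference = np.maximum(acc, 0) // (1 << SHIFT)

print(relu_out)   # [ 0  0  0  1 31]
print(reference)  # [ 0  0  0  1 31]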

@vincehong
Author

Thanks for the fast reply!

Dividing by a power of two simply removes LSBs.

But won't removing the LSBs also cost some PBS?

@andrei-stoian-zama
Collaborator

There are two approaches to removing LSBs:

  • exact rounding: uses as many 1-bit PBS as LSBs you want to remove
  • approximate: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one in the result of the PBS
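A toy, plain-Python illustration of the two options (a cartoon of the cleartext effect only: in FHE the off-by-one comes from noise in the discarded bits, and the bit counts and values here are assumptions):

import numpy as np

# Toy cleartext illustration of the two LSB-removal options described above.
# It shows how round-to-nearest and plain truncation can disagree by one on the
# kept MSBs; it does not model the actual off-by-one probability, which depends
# on noise in the encrypted setting.
LSBS = 8                                       # assumed number of low bits to remove
acc = np.arange(0, 1 << 14, 37, dtype=np.int64)  # toy 14-bit accumulator values

exact = (acc + (1 << (LSBS - 1))) >> LSBS      # exact rounding: round to nearest
approx = acc >> LSBS                           # ignore the LSBs entirely

diff = exact - approx
print(np.unique(diff))                         # [0 1]: results differ by at most one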

@vincehong
Author

vincehong commented Jul 24, 2024

There are two approaches to removing LSBs:

  • exact rounding: uses as many 1-bit PBS as LSBs you want to remove
  • approximate: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one in the result of the PBS

Ah, that's the point, thanks! Have you tested the impact of such approximation errors? The final FHE accuracy of 95.8% is only evaluated with fhe="simulate".

@andrei-stoian-zama
Collaborator

FHE simulation takes into account any impact of the noise, so you can be confident that it represents FHE accuracy well. We also ran 100 samples in actual FHE to be sure; the accuracy was preserved.
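Such a spot check could look like the sketch below (hypothetical code: q_module and data are the notebook's objects, while the subset size and comparison logic are illustrative):

import numpy as np

# Hypothetical spot check along the lines described above: run a small subset
# through real FHE and compare against simulation.
n_check = 100
subset = data[:n_check]

sim_preds = q_module.forward(subset, fhe="simulate")
fhe_preds = q_module.forward(subset, fhe="execute")

agreement = np.mean(sim_preds.argmax(axis=1) == fhe_preds.argmax(axis=1))
print(f"simulation vs. real FHE agreement on {n_check} samples: {agreement:.1%}")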

@vincehong
Author

vincehong commented Aug 23, 2024

I changed the following line
simulate_predictions = q_module.forward(data, fhe="simulate")
into
simulate_predictions = q_module.forward(data, fhe="execute")
in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb

The result is as follows:

Running NN-20 on a 128-core machine:
Accuracy in fp32 : 98.067% for the test set
Accuracy with FHE-simulation mode : 94.241% for the test set
FHE Latency on encrypted data : 2.197s per encrypted sample.
Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32 : 97.446% for the test set
Accuracy with FHE-simulation mode : 91.336% for the test set
FHE Latency on encrypted data : 5.574s per encrypted sample.
Number of PBS: 5200

So I am wondering:

  1. Why do the results differ between the simulate and execute modes?
  2. Does the approximate PBS in the ReLU have a non-negligible effect on accuracy?

Update:

I also include the results of the unmodified simulate mode for reference:
simulate_predictions = q_module.forward(data, fhe="simulate")
The results are:

Running NN-20 on a 128-core machine:
Accuracy with FHE-simulation mode : 96.244% for the test set
FHE Latency on encrypted data : 6.562s per encrypted sample.
Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32 : 97.446% for the test set
Accuracy with FHE-simulation mode : 95.032% for the test set
FHE Latency on encrypted data : 15.127s per encrypted sample.
Number of PBS: 5200

@vincehong
Author

I saw this issue is solved in 1.7.0. The simulate mode now correctly reflects the accuracy loss. Thanks!

@bcm-at-zama
Collaborator

Great to see that it has been fixed with the new Concrete ML release. Indeed, in Concrete ML 1.6 we had identified an issue with approximate-mode simulation; it is fixed in 1.7, and you have been able to confirm it.

If you see another accuracy difference between simulation and real FHE, please report it and we will investigate: it is not supposed to happen and will be treated as a bug.
