[Question] How does ReLU work in the new NN example #809

Closed
vincehong opened this issue Jul 24, 2024 · 8 comments
Comments

@vincehong

vincehong commented Jul 24, 2024

Congratulations on your new results in https://www.zama.ai/post/making-fhe-faster-for-ml-beating-our-previous-paper-benchmarks-with-concrete-ml ! Could you share more details about the underlying improvements?

For example, the printed number of PBS for NN-20 in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb is 2440 = 784 + 18x92, which means only one PBS is needed to implement each ReLU. This is counter-intuitive: since the model is quantized to 6 bits, the accumulator W*X would be around 14 bits, so how can a single PBS evaluate ReLU on 14-bit inputs?
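For reference, here is that count in plain Python (the interpretation of the two terms is my assumption, not something stated in the notebook):

# Quick check of the PBS count printed by the notebook (plain Python arithmetic).
# Interpreting the terms is an assumption: 784 presumably corresponds to the
# 28x28 inputs, and 18 x 92 to the hidden-layer activations, i.e. one PBS per ReLU.
input_pbs = 784
hidden_pbs = 18 * 92
print(input_pbs + hidden_pbs)  # 2440, the printed number of PBS for NN-20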

Thanks.

@andrei-stoian-zama
Collaborator

thanks!

The underlying representation used by Concrete is variable-sized integers: the message space can hold up to 20-30 bits. However, the PBS only works efficiently for integers up to 6-8 bits (it can go up to 16 bits, but it is slower). Through what we call "approximate rounding", it is possible to apply the PBS only to the desired number of MSBs of a high-bit-width accumulator (the 6 MSBs of the 14-bit accumulator).

A PBS refreshes noise but also applies a table lookup to the value it processes. Thus, when applying the PBS we get the ReLU evaluation for free.

Using only the MSBs of the accumulator works well because of quantization: quantizing a value means dividing it by a scale factor. This division can be thought of as dividing by a power of two and then by another, smaller factor. Dividing by a power of two simply removes LSBs.
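A minimal, plain-Python sketch of that idea (not the Concrete internals: the bit-widths, values, and variable names are assumptions mirroring the 14-bit accumulator and 6-bit message example above):

import numpy as np

# Plain-Python sketch of "approximate rounding + PBS" on a ReLU layer.
ACC_BITS = 14                 # assumed bit-width of the accumulator W*X
MSB_BITS = 6                  # assumed bit-width processed by the PBS
SHIFT = ACC_BITS - MSB_BITS   # number of LSBs dropped before the lookup

acc = np.array([-5000, -1, 0, 300, 8191], dtype=np.int64)  # toy accumulator values

# Dropping LSBs is a floor division by a power of two, i.e. part of the
# re-quantization scale applied after the matrix product.
msbs = acc // (1 << SHIFT)

# The PBS applies an arbitrary table lookup to the small value it refreshes;
# here the table simply encodes ReLU, so the activation comes "for free".
relu_out = np.maximum(msbs, 0)

# Reference: ReLU on the full-precision accumulator, then the same rescaling.
reference = np.maximum(acc, 0) // (1 << SHIFT)

print(relu_out)   # [ 0  0  0  1 31]
print(reference)  # [ 0  0  0  1 31]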

@vincehong
Author

Thanks for the fast reply!

Dividing by a power of two simply removes LSBs.

But won't removing the LSBs also cost some PBS?

@andrei-stoian-zama
Collaborator

There are two approaches to removing LSBs:

  • exact rounding: uses as many 1-bit PBS as LSBs you want to remove
  • approximate: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one in the result of the PBS
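A toy, plain-Python illustration of the two options (a cartoon of the cleartext effect only: in FHE the off-by-one comes from noise in the discarded bits, and the bit counts and values here are assumptions):

import numpy as np

# Toy cleartext illustration of the two LSB-removal options described above.
# It shows how round-to-nearest and plain truncation can disagree by one on the
# kept MSBs; it does not model the actual off-by-one probability, which depends
# on noise in the encrypted setting.
LSBS = 8                                       # assumed number of low bits to remove
acc = np.arange(0, 1 << 14, 37, dtype=np.int64)  # toy 14-bit accumulator values

exact = (acc + (1 << (LSBS - 1))) >> LSBS      # exact rounding: round to nearest
approx = acc >> LSBS                           # ignore the LSBs entirely

diff = exact - approx
print(np.unique(diff))                         # [0 1]: results differ by at most one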

@vincehong
Author

vincehong commented Jul 24, 2024

There are two approaches to removing LSBs:

  • exact rounding: uses as many 1-bit PBS as LSBs you want to remove
  • approximate: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one in the result of the PBS

Ah, that's the point, thanks! Have you tested the impact of such approximation errors? The final FHE accuracy of 95.8% is only evaluated with fhe="simulate".

@andrei-stoian-zama
Collaborator

FHE simulation takes into account any impact of the noise, so you can be confident that it represents FHE accuracy well. We also ran 100 samples in actual FHE to be sure; the accuracy was preserved.
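Such a spot check could look like the sketch below (hypothetical code: q_module and data are the notebook's objects, while the subset size and comparison logic are illustrative):

import numpy as np

# Hypothetical spot check along the lines described above: run a small subset
# through real FHE and compare against simulation.
n_check = 100
subset = data[:n_check]

sim_preds = q_module.forward(subset, fhe="simulate")
fhe_preds = q_module.forward(subset, fhe="execute")

agreement = np.mean(sim_preds.argmax(axis=1) == fhe_preds.argmax(axis=1))
print(f"simulation vs. real FHE agreement on {n_check} samples: {agreement:.1%}")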

@vincehong
Author

vincehong commented Aug 23, 2024

I changed the following line
simulate_predictions = q_module.forward(data, fhe="simulate")
into
simulate_predictions = q_module.forward(data, fhe="execute")
in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb

The result is as follows:

Running NN-20 on a 128-core machine:
Accuracy in fp32 : 98.067% for the test set
Accuracy with FHE-simulation mode : 94.241% for the test set
FHE Latency on encrypted data : 2.197s per encrypted sample.
Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32 : 97.446% for the test set
Accuracy with FHE-simulation mode : 91.336% for the test set
FHE Latency on encrypted data : 5.574s per encrypted sample.
Number of PBS: 5200

So I am wondering:

  1. Why do the results differ between the simulate and execute modes?
  2. Does the approximate PBS in the ReLU have a non-negligible effect on accuracy?

Update:

I also include the results of the unmodified simulate mode for reference:
simulate_predictions = q_module.forward(data, fhe="simulate")
The results are:

Running NN-20 on a 128-core machine:
Accuracy with FHE-simulation mode : 96.244% for the test set
FHE Latency on encrypted data : 6.562s per encrypted sample.
Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32 : 97.446% for the test set
Accuracy with FHE-simulation mode : 95.032% for the test set
FHE Latency on encrypted data : 15.127s per encrypted sample.
Number of PBS: 5200

@vincehong
Author

I saw this issue is solved in 1.7.0. The simulate mode now correctly reflects the accuracy loss. Thanks!

@bcm-at-zama
Collaborator

Great to see that it has been fixed with the new Concrete ML release. Indeed, in Concrete ML 1.6 we had identified an issue with approximate-mode simulation; it is fixed in 1.7, and you have been able to confirm it.

If you see another accuracy difference between simulation and real FHE, please report it and we will investigate: it is not supposed to happen and will be treated as a bug.
