Feat/add automatic rounding #558
Conversation
Force-pushed from 0e9f3a9 to 5825170
Force-pushed from deba506 to 2700f25
Makefile (Outdated)
@@ -54,7 +54,6 @@ setup_env:
echo "Finished installing poetry lock."

echo "Installing $(CONCRETE_PYTHON_VERSION)" && \
poetry run python -m pip install -U --pre "$(CONCRETE_PYTHON_VERSION)"
I have no idea why this is here.
conftest.py (Outdated)
f"# {randomly_seed} # {str(request.node.own_markers)}" | ||
) | ||
|
||
print(f"{derivation_string=}") |
to be removed
Force-pushed from abafd1d to dd3f088
Force-pushed from e6e86c6 to 7c05e16
Coverage failed ❌
Makefile (Outdated)
@@ -9,6 +9,7 @@ SRC_DIR:=src
TEST?=tests
N_CPU?=4
CONCRETE_PACKAGE_PATH=$(SRC_DIR)/concrete
SPHINX_APIDOC_EXCLUDE=$(SRC_DIR)/concrete/ml/common/preprocessors.py
What is this exactly?
For some reason, Sphinx was crashing on this file.
This can now be removed, since we don't have Sphinx anymore.
for node in sorted_nodes:
    if node.operation == Operation.Input:
        # Deepcopy on the fhe.Graph doesn't modify the input node/indices mappings
Not sure what this comment is about.
}
attributes = {}
if rounding_function.__name__ == "round_bit_pattern":
    # These kwargs are not supported atm
Again, what is this comment? Should it have a FIXME?
rounding_node.properties["overflow_protection"] = overflow_protection
rounding_node.properties["exactness"] = exactness

rounding_node.bounds = a_node.bounds  # Might be over/under-estimated
Again, should this comment be removed?
@@ -0,0 +1,999 @@
"""Graph pre-processors for automatic rounding."""
Overall looks good, but:
- there are very few comments in the actual code, which makes the whole file hard to follow (I cannot imagine debugging this in the future)
- some remaining comments are confusing or seem irrelevant
class TorchAutoRoundingTLUTester(nn.Module):
    """A small quantized network with Brevitas, trained on make_classification."""
Bad copy-paste?
@@ -236,116 +236,7 @@ def q_impl(

p = input2_q_values.shape[-2]

# Remove the manual matrix multiplication when we can handle input precision with rounding
# FIXME: https://github.com/zama-ai/concrete-internal/issues/512
def enc_mul(x, y):
Is this expected? Do we no longer need it?
reference = f(input_set)

if function_name in ["staircase_pot", "staircase"] and execution_number == 0:
    # TODO: round to 1b or eliminate these TLUs entirely
Do we need a FIXME here?
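As a hedged illustration of why such TLUs are candidates for 1-bit rounding (the function below is a guess based on the test names, not the actual fixture):

```python
import numpy as np

# Hypothetical sketch: a power-of-two "staircase" TLU has only a few output
# plateaus, so aggressively rounding the input (down to ~1 bit here) still
# reproduces it exactly, making a full-precision TLU unnecessary.
def staircase_pot(x: np.ndarray) -> np.ndarray:
    # Two plateaus split at 0: the sign bit alone decides the output
    return np.where(x < 0, -8, 8)
```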
tests/common/test_preprocessors.py (Outdated)
    [circuit_no_optim_no_rounding.simulate(numpy.array([elt])) for elt in input_set]
)[..., 0]

graph_res = numpy.array([circuit.graph(numpy.array([elt])) for elt in input_set])[..., 0]
Do we still need to test the graph (old VL)?
@@ -0,0 +1,198 @@
"""Unit tests for TLU optimization preprocessors."""
A bit like above: looks great overall, but the very few (and somewhat random) comments make the file hard to understand; several times I have no clue what is actually going on.
count_reinterpret = 0
count_tlu = 0
for line in model.quantized_module_.fhe_circuit.mlir.split("\n"):
Should this be a function? Either in source (pytest utilities) or in this file, but it looks important.
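Something like this sketch, for instance (the helper name and the exact MLIR op spellings are assumptions to check against the real dump):

```python
def count_mlir_ops(mlir: str) -> dict:
    """Count rounding-related operations in a textual MLIR dump (hypothetical helper)."""
    counts = {"reinterpret": 0, "tlu": 0}
    for line in mlir.split("\n"):
        # Assumed substrings for precision reinterpretation and table lookups
        if "reinterpret_precision" in line:
            counts["reinterpret"] += 1
        if "apply_lookup_table" in line:
            counts["tlu"] += 1
    return counts

# Usage sketch:
# counts = count_mlir_ops(model.quantized_module_.fhe_circuit.mlir)
# assert counts["tlu"] == expected_tlu_count
```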
"max_epochs": 10, | ||
} | ||
|
||
model = NeuralNetClassifier(**params) |
Small detail, but you could update instantiate_model_generic to accept custom params and then use preamble; that makes the test easier to control.
CURRENT_DIR = Path(__file__).resolve().parent
KEYGEN_CACHE_DIR = CURRENT_DIR.joinpath(".keycache")

# Add MPS (for macOS with Apple Silicon or AMD GPUs) support when error is fixed. For now, we
# observe a decrease in torch's top1 accuracy when using MPS devices
# FIXME: https://github.com/zama-ai/concrete-ml-internal/issues/3953
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
NUM_SAMPLES = int(os.environ.get("NUM_SAMPLES", 1))

NUM_SAMPLES = int(os.environ.get("NUM_SAMPLES", 1000 if SIMULATE_ONLY else 1))
1000 samples, even with simulate, looks a bit too much, no? Why not keep 100?
from concrete.ml.quantization import QuantizedModule
from concrete.ml.torch.compile import compile_brevitas_qat_model

SIMULATE_ONLY = True
Should we make this an env var?
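For example, a minimal sketch (the env-var name is an assumption; the default matches the current hard-coded value):

```python
import os

# Hypothetical: let CI toggle real FHE execution without editing the file;
# defaults to simulation-only, as today.
SIMULATE_ONLY = os.environ.get("SIMULATE_ONLY", "1") == "1"
```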
@@ -76,11 +82,20 @@ def wrapper(*args, **kwargs):
# cache generated keys through `insecure_key_cache_location`. As the name suggests, these
# parameters are unsafe and should only be used for debugging in development
# Multi-parameter strategy is used in order to speed-up the FHE executions
base_configuration = Configuration()
To be removed?
import sys

if sys.platform == "darwin":
    print("skipping fhe evaluation on darwin platform")
Why is that?
Thanks for this! Overall it looks great, but we might want to clean some of this up at some point: important files/code are missing some comments to better explain what is going on (some parts are pretty obscure), several now-irrelevant comments should be removed, and some things could be simplified in the tests.
Force-pushed from 7851e27 to a419afb
assert isinstance(evaluator, GenericEvaluator)

if "subgraph" not in evaluator.properties["kwargs"]:
    # Not supported for now
When does this case arise? Is it for fhe.univariate use?
Yes, univariate or explicit table use in Concrete. Anyway, it's a good check, since in what follows we try to insert nodes into the subgraph.
Makefile (Outdated)
@@ -9,6 +9,7 @@ SRC_DIR:=src
TEST?=tests
N_CPU?=4
CONCRETE_PACKAGE_PATH=$(SRC_DIR)/concrete
SPHINX_APIDOC_EXCLUDE=$(SRC_DIR)/concrete/ml/common/preprocessors.py
This can now be removed, since we don't have Sphinx anymore.
Makefile (Outdated)
@@ -415,7 +416,7 @@ docs_no_links: clean_docs check_docs_dollars
mkdir -p docs/_static/
@# Generate the auto summary of documentations
@# Cannot do without specifying top module currently with sphinx-apidoc
poetry run sphinx-apidoc --implicit-namespaces -o docs/_apidoc $(CONCRETE_PACKAGE_PATH)
poetry run sphinx-apidoc --implicit-namespaces -o docs/_apidoc $(CONCRETE_PACKAGE_PATH) $(SPHINX_APIDOC_EXCLUDE)
Same here.
configuration = Configuration(
    dump_artifacts_on_unexpected_failures=False,
    enable_unsafe_features=True,
    use_insecure_key_cache=True,
    insecure_key_cache_location=KEYGEN_CACHE_DIR,
    additional_pre_processors=[
Why do we need this? We should implement compile_torch_model with rounding_threshold_bits: { n_bits=AUTO, method=approximate }.
cfg = Configuration(
    verbose=True,
    show_optimizer=args.show_optimizer,
    additional_pre_processors=[
See the above comment about the right API.
fix: update to the new VL
chore: do not test p_error on linear models
chore: remove custom encrypted matmul
chore: add simulation compilation
chore: skip low bit-width rounding in tests
chore: update concrete-python nightly
chore: fix coverage
chore: default to n_jobs = 1 for the CI
chore: fix forbidden words
feat: approximate rounding by default
chore: remove cp install
chore: update licenses for macos-latest-xl
chore: experiment with 3-bit rounding
exp: debugging (required modifying CP to set preprocessors before the other ones, but it's compiling now; wip, it breaks)
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
exp: debugging
feat: add rounding torch models tests
chore: add crypto-param tests
feat: more tests
fix: reinstate ceiling bitwidth computation
chore: refactor
chore: fixing tests, some weird behaviors
fix: add test and continue debugging
debugging
debugging
debugging
exp: debugging
exp: debugging
exp: debugging
feat: add more tlu optimization tests
exp: debugging
feat: rework preprocessor
fix: rework tlu optimization
feat: fix automatic rounding bugs
fix: remove all debugging in tests
fix: make conformance
fix: fix pylint and mypy
fix: remove debug files
fix: docstrings
fix: pcc
fix: cp installation
fix: revert deps to defaults for new cp
fix: revert deps to defaults for new cp
fix: pylint and docs
fix: tests
fix: tests 2
fix: bad syntax
fix: handle all shapes as input to subgraph
Tests are passing, with one flaky. Not sure the work fully does what it's supposed to do. Needs at least a re-check and probably some comments.
Still experimenting with it. In the current setting, using our approach instead of the 6-bit-rounding-everywhere approach degrades the top-1 accuracy of the CIFAR model too much (from 0.88 with the torch model, to 0.86 with 6-bit rounding, to 0.79 with auto-rounding). I added a notebook to visualize what happens with the auto-rounding optimization in a simple setting. Things look fishy to me. Let's keep iterating on it until we find an appropriate setting that works on the CIFAR-10 use case and at least one other use case.
Still a work in progress. I added a small script to help debug the pre-processor. A few things remain:
- I often have a right or left offset of 1 on the step functions
- Single-jump TLUs are still buggy because the value taken is not the expected one
- In some situations, Concrete does not agree with NumPy
- On CIFAR, some TLUs have one more bit than expected
All tests passed once. The main improvement was a change to the closed form used for the bias. There is probably still something to improve there. I also added clipping after down-scaling in the LUT, and a check on delta to see if something could be gained. My guess is that there is still something to gain in the specific situation where ceil(log2(x_max - x_min)) < bit_width([x_max, x_min]): in that case we could just center everything and scale to match the observed bit-width.
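A worked sketch of that check (the helper names are mine, not the PR's):

```python
import math

def range_bits(x_min: int, x_max: int) -> int:
    """Bits needed for the observed span once values are centered (assumption)."""
    return max(1, math.ceil(math.log2(x_max - x_min + 1)))

def direct_bits(x_min: int, x_max: int) -> int:
    """Signed bit-width needed to represent the raw bounds directly."""
    return max(abs(x_min), abs(x_max)).bit_length() + 1  # +1 for the sign bit

# Example: values living in [100, 115] need 8 bits raw, but only 4 bits once
# centered, so centering + rescaling would save bits before rounding.
assert direct_bits(100, 115) == 8
assert range_bits(100, 115) == 4
```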
Unsigned (uint) bounds are still not supported; there is something wrong with how we handle them for now. Instead of silently producing no results, I found it best to raise an error when they are encountered. One bug was detected on CIFAR but another remains. To be followed up.
Force-pushed from f74557a to 943d66a
Should we maybe close this one for now, @andrei-stoian-zama?
Despite many efforts, the results were not good enough, and rounding to 6 bits is the better approach.
Adds an automatic rounding mechanism.