Witgen for public references #1756

georgwiese · 2024-09-05T12:58:06Z

Step towards #1633

This PR adds witness generation for any public that is referenced from an identity.

Note that publics and public references now exist independently:

A public is still defined as a pointer to a cell in the trace. The prover extracts the values from the trace and returns them to the verifier; witgen has nothing to do with them (except providing the values in the trace).
A public reference (i.e., a public that is referenced by a constraint) was previously unimplemented. Now, witgen solves for this value. This value might not be the same as the value of the public being referenced! We don't check for consistency.

After #1633 is completed, publics will no longer be defined in terms of trace cells, so the values returned by witgen will be the ones that are returned to the verifier.

For now, the values are not returned (and different machines might find conflicting values for the same public). But the solving works, and I added a log message, e.g.:

$ cargo run pil test_data/pil/fibonacci_with_public.pil -o output -f
Writing output/fibonacci_with_public_analyzed.pil.
done.
Optimizing pil...
Removed 0 witness and 0 fixed columns. Total count now: 2 witness and 1 fixed columns.
Writing output/fibonacci_with_public_opt.pil.
Evaluating fixed columns...
Fixed column generation took 0.001645084s
Writing output/constants.bin.
Deducing witness columns...
Running main machine for 4 rows
[00:00:00 (ETA: 00:00:00)] ░░░░░░░░░░░░░░░░░░░░ 0% - Starting...                                                   
        => out (public) = 5
[00:00:00 (ETA: 00:00:00)] ████████████████████ 100% - Starting...                                                   
Witness generation took 0.00259025s
Writing output/commits.bin.

(First 4 commits of #1650) This PR prepares witness generation for scalar publics (#1756). Scalar publics are similar to cells in the trace, but are global (i.e., independent of the row number). With this PR, affine expressions use a new `AlgebraicVariable` enum that can be either a column reference (`&'a AlgebraicReference`, which was used previously) or a reference to a public.
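To make the `AlgebraicVariable` idea concrete, here is a minimal sketch, not powdr's actual code. Assumptions: `AlgebraicReference` is reduced to a hypothetical name/next-row pair, publics are identified by name (as noted later in this review), and the toy modulus `P = 101` plus the `inv`/`solve` helpers are hypothetical stand-ins for the real field and affine-expression solver. It shows a trace cell and a public sharing one map of known values, with the public solved for from an affine constraint that has a single unknown.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for powdr's column reference type.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct AlgebraicReference {
    name: String,
    next: bool, // `true` for `x'`, i.e. the next row
}

/// A variable of an affine expression: a trace cell or a (row-independent) public.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AlgebraicVariable<'a> {
    Column(&'a AlgebraicReference),
    Public(&'a str),
}

/// Toy prime modulus standing in for a real field (assumption).
const P: u64 = 101;

/// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Solves `coeff * x + offset = 0` for the single unknown `x`.
/// Returns `None` if `coeff` is zero, i.e. nothing can be derived.
fn solve(coeff: u64, offset: u64) -> Option<u64> {
    if coeff % P == 0 {
        return None;
    }
    Some((P - offset % P) % P * inv(coeff) % P)
}

fn main() {
    let out = AlgebraicReference { name: "out".to_string(), next: false };
    let mut known: BTreeMap<AlgebraicVariable, u64> = BTreeMap::new();
    known.insert(AlgebraicVariable::Column(&out), 5);

    // Constraint `out - :out_public = 0`: with `out` known, this is the affine
    // expression `(P - 1) * out_public + out = 0` in the unknown `out_public`.
    let offset = known[&AlgebraicVariable::Column(&out)];
    let value = solve(P - 1, offset).expect("coefficient is non-zero");
    known.insert(AlgebraicVariable::Public("out_public"), value);

    assert_eq!(known[&AlgebraicVariable::Public("out_public")], 5);
    println!("{:?}", known);
}
```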

Schaeff

Some comments, but I need to take a closer look.

Schaeff · 2024-09-26T11:56:56Z

executor/src/witgen/generator.rs


        let eval_value = if eval_value.is_complete() {
            log::trace!("End processing VM '{}' (successfully)", self.name());
            // Remove the last row of the previous block, as it is the first row of the current
            // block.
            self.data.pop();
            self.data.extend(block);
+            self.publics.extend(publics);


Can a public value be silently replaced here?

No, because setting a value for a public that is already in the map is a hard failure.

Just as long as the set of publics is all publics.

What do you mean?
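A minimal sketch of the distinction discussed in this thread, using a hypothetical `set_public` helper and `u64` in place of field elements: plain `BTreeMap::extend` overwrites an existing entry silently, while a checked setter turns a second, different value into an error. Per the later comment, powdr's actual check is an assertion in the function that applies updates; this sketch does not reproduce that code.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: records a value for a public, refusing to overwrite
/// a different value that is already known.
fn set_public<'a>(
    publics: &mut BTreeMap<&'a str, u64>,
    name: &'a str,
    value: u64,
) -> Result<(), String> {
    match publics.get(name) {
        Some(&existing) if existing != value => Err(format!(
            "Conflicting values for public {name}: {existing} vs. {value}"
        )),
        _ => {
            publics.insert(name, value);
            Ok(())
        }
    }
}

fn main() {
    // `BTreeMap::extend` silently replaces existing entries...
    let mut publics: BTreeMap<&str, u64> = BTreeMap::from([("out", 5)]);
    publics.extend([("out", 7u64)]);
    assert_eq!(publics["out"], 7);

    // ...whereas a checked setter reports a conflict as an error.
    let mut checked: BTreeMap<&str, u64> = BTreeMap::new();
    set_public(&mut checked, "out", 5).unwrap();
    assert!(set_public(&mut checked, "out", 5).is_ok()); // same value: fine
    assert!(set_public(&mut checked, "out", 7).is_err()); // conflict: error
}
```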

Schaeff · 2024-09-26T11:58:16Z

executor/src/witgen/generator.rs

@@ -181,7 +195,9 @@ impl<'a, T: FieldElement> Generator<'a, T> {
        );
        processor.solve(&mut sequence_iterator).unwrap();

-        processor.finish().remove(1)
+        // Ignore any updates to the publics at this point, as we'll re-visit the last row again.


Does this assume that publics are only present in the last row?

No. This piece of code is just to initialize the first row from identities like pc' = (1 - first_step') * <...>.

Since this is the very first thing that happens, we won't have any publics at that point. We could in theory derive a public value here (e.g., if there were a constraint like `first_step' * (foo - :final_foo) = 0`), but we'll process the last row again anyway, so it should be OK to discard it here.

executor/src/witgen/machines/block_machine.rs

executor/src/witgen/vm_processor.rs

chriseth · 2024-09-26T12:50:30Z

executor/src/witgen/block_processor.rs

@@ -28,12 +30,21 @@ impl<'a, 'b, 'c, T: FieldElement, Q: QueryCallback<T>> BlockProcessor<'a, 'b, 'c
    pub fn new(
        row_offset: RowIndex,
        data: FinalizableData<T>,
+        publics: BTreeMap<&'a str, T>,


Might they not be part of the machine parts?

Thinking about it: No, because these are the already known values for the publics, right?

Maybe this can be combined with the values for challenges in the future...

I grouped the trace block and the publics into one type, the same type that is returned by `finish()`.
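A minimal sketch of grouping a trace block with its publics, under simplifying assumptions: `Row`, `UpdatedData`, `Generator` and `append_block` are hypothetical names, and `u64` stands in for field elements. The field names `block` and `publics` follow the `updated_data.block` / `updated_data.publics` accesses in the `generator.rs` diff, and the merge mirrors the pop-then-extend logic shown there.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a trace row (one value per column).
type Row = Vec<u64>;

/// A processed trace block together with the publics solved while processing it.
struct UpdatedData<'a> {
    block: Vec<Row>,
    publics: BTreeMap<&'a str, u64>,
}

/// Hypothetical, reduced generator state.
struct Generator<'a> {
    data: Vec<Row>,
    publics: BTreeMap<&'a str, u64>,
}

impl<'a> Generator<'a> {
    /// Appends a completed block. The first row of the new block repeats the
    /// last row of `self.data`, so that row is popped before extending.
    fn append_block(&mut self, updated: UpdatedData<'a>) {
        self.data.pop();
        self.data.extend(updated.block);
        self.publics.extend(updated.publics);
    }
}

fn main() {
    let mut generator = Generator { data: vec![vec![0], vec![1]], publics: BTreeMap::new() };
    let updated = UpdatedData {
        block: vec![vec![1], vec![2], vec![3]],
        publics: BTreeMap::from([("out", 3)]),
    };
    generator.append_block(updated);

    let expected: Vec<Row> = vec![vec![0], vec![1], vec![2], vec![3]];
    assert_eq!(generator.data, expected);
    assert_eq!(generator.publics["out"], 3);
}
```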

executor/src/witgen/block_processor.rs

chriseth · 2024-09-26T12:54:42Z

executor/src/witgen/processor.rs

@@ -393,6 +401,7 @@ Known values in current row (local: {row_index}, global {global_row_index}):
            &self.data[row_index],


Can we extract the creation of a row pair into a helper function?

I think I tried that a long time ago and failed because of lifetime issues. Will try again, but in a separate PR.

Yeah, the problem is that the `RowPair` contains references to fields of `Processor`, so as long as the row pair exists, `Processor` cannot be borrowed mutably. When the code is inlined, the compiler understands that only some fields are borrowed, not all of `&self`; but once it is in a helper function, there is no way to communicate that to the compiler.
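A minimal, self-contained illustration of this borrow-checker limitation, using hypothetical simplified `Processor` and `RowPair` types rather than the real ones. Building the pair inline, or through a free function that takes the individual fields, only borrows those fields, so another field can still be mutated; a `&self` method would borrow the whole struct. Whether the free-function workaround fits the real code (with more fields and lifetimes) is an open question.

```rust
/// Hypothetical simplified processor with separately borrowable fields.
struct Processor {
    data: Vec<u64>,
    fixed: Vec<u64>,
    updates: Vec<u64>,
}

/// A view holding shared references into two of `Processor`'s fields.
struct RowPair<'a> {
    current: &'a u64,
    fixed: &'a u64,
}

/// Free function taking the fields explicitly, so the caller's borrow checker
/// sees that only `data` and `fixed` are borrowed, not all of `Processor`.
fn row_pair<'a>(data: &'a [u64], fixed: &'a [u64], row: usize) -> RowPair<'a> {
    RowPair { current: &data[row], fixed: &fixed[row] }
}

impl Processor {
    fn process_row(&mut self, row: usize) {
        // Inline construction: disjoint field borrows, so `self.updates` can
        // still be mutated while `pair` is alive.
        let pair = RowPair { current: &self.data[row], fixed: &self.fixed[row] };
        self.updates.push(pair.current + pair.fixed);

        // Same via the free function.
        let pair = row_pair(&self.data, &self.fixed, row);
        self.updates.push(pair.current * pair.fixed);

        // A method `fn row_pair(&self, row: usize) -> RowPair<'_>` would borrow
        // all of `self`, so calling `self.updates.push(..)` while that pair is
        // alive would be rejected (E0502).
    }
}

fn main() {
    let mut p = Processor { data: vec![2, 3], fixed: vec![10, 20], updates: Vec::new() };
    p.process_row(1);
    assert_eq!(p.updates, vec![23, 60]);
}
```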

chriseth · 2024-09-26T13:01:36Z

I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?

chriseth · 2024-09-26T13:02:15Z

If it's not machine-specific, we could have a situation where two conflicting values for a public lead to an assertion failure (in the 'apply update' function) instead of a proper error that identities are conflicting.

georgwiese · 2024-10-01T10:07:34Z

I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?

In this current code, each machine solves for any publics that appear in its identities. They are never returned, and any conflicts (two machines determining different values for the same public) are not detected. Also, we don't fail if a public is not assigned a value by any machine.

This PR is more an intermediate step, implementing the solving within a machine. To implement the full thing, we have to answer whether a public belongs to only one machine. All practical examples I know of would be like that, but in theory you could access publics in different machines, which would put a constraint between the two machines and make the solving pretty complicated.

If yes, how do we detect it? AFAIK, publics currently live outside any namespace. Perhaps the easiest would be to check each machine's identities and fail if the assignment is not unique.
If no, I think the easiest would be to assert that all publics have a value and fail if two machines assign a different value.

chriseth

Looks good! Can you add a test for two machines accessing the same public but assigning two different values?

georgwiese · 2024-10-04T11:25:25Z

OK! I started an `executor/tests/witgen.rs` to test only witgen, without proof generation.

Schaeff

I think public inputs should be inside namespaces, in the future proof objects. Until then, I agree with the first option.

chriseth · 2024-10-22T13:59:21Z

executor/Cargo.toml

@@ -25,6 +25,7 @@ indicatif = "0.17.7"
 serde = { version = "1.0", default-features = false, features = ["alloc", "derive", "rc"] }

 [dev-dependencies]
+powdr-pipeline.workspace = true


Is this avoidable?

Yes, I think I could manually do all the steps that need to happen before witgen. I guess the advantage would be that we depend on fewer crates to build the tests (i.e., not the backend crate), but it would be a bit cumbersome to write and maintain.

Do you think it's worth it? If we run all tests anyway, the compilation time would be the same, no? In what scenario would it be faster?

chriseth · 2024-10-22T14:41:00Z

executor/tests/witgen.rs

+
+#[test]
+#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"]
+fn two_machines_conflicting_public() {


I see that this test is supposed to test witness generation, but shouldn't we just put it into the pipeline crate if it's not a unit test? That way we avoid the dependency.

Can you elaborate on which workflow would be faster then? I think it fits well in the executor crate...

chriseth · 2024-10-22T14:43:34Z

executor/src/witgen/generator.rs


        let eval_value = if eval_value.is_complete() {
            log::trace!("End processing VM '{}' (successfully)", self.name());
            // Remove the last row of the previous block, as it is the first row of the current
            // block.
            self.data.pop();
-            self.data.extend(block);
+            self.data.extend(updated_data.block);
+            self.publics.extend(updated_data.publics);


This is so horrible for performance...
At some point we need to make a big refactor and allow direct access to all those variables from within the processors (unless they are really isolated).

In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...

chriseth · 2024-10-22T14:46:32Z

executor/src/witgen/machines/machine_extractor.rs

@@ -32,6 +30,36 @@ pub struct ExtractionOutput<'a, T: FieldElement> {
    pub base_parts: MachineParts<'a, T>,
 }

+#[derive(Default)]


Can you move this (non-pub) item below the (pub) `split_out_machine` (which is the main entry point into this file)?

chriseth · 2024-10-22T14:47:12Z

executor/src/witgen/machines/machine_extractor.rs

+
+impl<'a> PublicsTracker<'a> {
+    /// Given a machine's identities, add all publics that are referenced by them.
+    /// Panics if a public is referenced by more than one machine.


Could it return an error instead?
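One possible shape of an error-returning variant, sketched under simplifying assumptions: each identity is reduced to the list of public names it references (the real method takes `&[&Identity<SelectedExpressions<Expression<T>>>]`), and `add_all` and the `seen` field are hypothetical names. The error text uses the same format as the `panic!` in the diff.

```rust
use std::collections::BTreeSet;

/// Sketch of a tracker recording which publics have been claimed by a machine.
#[derive(Default)]
struct PublicsTracker<'a> {
    seen: BTreeSet<&'a str>,
}

impl<'a> PublicsTracker<'a> {
    /// Adds all publics referenced by one machine's identities (here simplified
    /// to lists of public names). Returns an error instead of panicking if any
    /// of them was already referenced by another machine.
    fn add_all(&mut self, identities: &[Vec<&'a str>]) -> Result<(), String> {
        let referenced: BTreeSet<&'a str> = identities.iter().flatten().copied().collect();
        let intersection: BTreeSet<&'a str> =
            self.seen.intersection(&referenced).copied().collect();
        if !intersection.is_empty() {
            return Err(format!(
                "Publics are referenced by more than one machine: {intersection:?}"
            ));
        }
        self.seen.extend(referenced);
        Ok(())
    }
}

fn main() {
    let mut tracker = PublicsTracker::default();
    // Machine 1 references `:public`.
    tracker.add_all(&[vec!["public"]]).unwrap();
    // Machine 2 references it as well, which is reported as an error.
    let err = tracker.add_all(&[vec!["public"], vec![]]).unwrap_err();
    println!("{err}");
}
```

The caller (e.g. the machine extractor) could then propagate the error with `?` instead of aborting witgen.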

chriseth · 2024-10-22T14:48:50Z

executor/src/witgen/machines/machine_extractor.rs

+            .intersection(&referenced_publics)
+            .collect::<BTreeSet<_>>();
+        if !intersection.is_empty() {
+            panic!("Publics are referenced by more than one machine: {intersection:?}",);


Can you print their names?

They are, no? That's what's in intersection? (We still use names as IDs for public references...)

chriseth · 2024-10-22T14:50:17Z

executor/src/witgen/machines/machine_extractor.rs

+        &mut self,
+        identities: &[&'a powdr_ast::analyzed::Identity<SelectedExpressions<Expression<T>>>],
+    ) {
+        let referenced_publics = identities


I don't see a proper use-case, but actually we should only consider LHSs of lookups.

Ah, but all the identities here should be non-lookups, ok.

chriseth · 2024-10-22T15:10:17Z

executor/src/witgen/processor.rs

@@ -24,6 +24,27 @@ use super::{

 type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>;

+/// The data mutated by the processor


Is this distinctive enough from the MutableState?

Renamed it to SolverState, what do you think?

georgwiese

@chriseth Thanks for the review! I think I addressed your comments, except for the new test dependency. It's not clear to me yet what the best solution is. Would it be fixed if the test only depended on the steps before witness generation? Then I would prefer that over moving the test to the pipeline crate.

georgwiese · 2024-10-22T16:50:25Z

executor/src/witgen/generator.rs


        let eval_value = if eval_value.is_complete() {
            log::trace!("End processing VM '{}' (successfully)", self.name());
            // Remove the last row of the previous block, as it is the first row of the current
            // block.
            self.data.pop();
-            self.data.extend(block);
+            self.data.extend(updated_data.block);
+            self.publics.extend(updated_data.publics);


In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...

georgwiese · 2024-10-22T16:55:14Z

executor/src/witgen/machines/machine_extractor.rs

+            .intersection(&referenced_publics)
+            .collect::<BTreeSet<_>>();
+        if !intersection.is_empty() {
+            panic!("Publics are referenced by more than one machine: {intersection:?}",);


They are, no? That's what's in intersection? (We still use names as IDs for public references...)

georgwiese · 2024-10-22T17:05:06Z

executor/src/witgen/processor.rs

@@ -24,6 +24,27 @@ use super::{

 type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>;

+/// The data mutated by the processor


Renamed it to SolverState, what do you think?

georgwiese · 2024-10-22T17:09:01Z

executor/tests/witgen.rs

+
+#[test]
+#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"]
+fn two_machines_conflicting_public() {


Can you elaborate on which workflow would be faster then? I think it fits well in the executor crate...

georgwiese changed the base branch from main to introduce-algebraic-variable September 5, 2024 12:58

georgwiese mentioned this pull request Sep 5, 2024

[WIP] Witgen for public references #1650

Closed

georgwiese force-pushed the witgen-challenges2 branch from b411b48 to 4ce4dc1 September 5, 2024 14:42

georgwiese mentioned this pull request Sep 23, 2024

Introduce AlgebraicVariable #1755

Merged

georgwiese force-pushed the witgen-challenges2 branch from 4ce4dc1 to 7c4c4cf September 23, 2024 09:47

Base automatically changed from introduce-algebraic-variable to main September 23, 2024 15:07

georgwiese added 3 commits September 23, 2024 18:14

Continue

a176610

Implement solving for publics

0a04573

Add test

a19e3a4

georgwiese force-pushed the witgen-challenges2 branch from 3cf5f35 to a19e3a4 September 23, 2024 16:14

georgwiese added 4 commits September 26, 2024 13:04

Fix test

c484239

Merge branch 'main' of github.com:powdr-labs/powdr into witgen-challenges2

4453ee1

Undo change

0450717

Polish

87e004d

georgwiese changed the title ~~[WIP] Witgen for public references~~ Witgen for public references Sep 26, 2024

georgwiese marked this pull request as ready for review September 26, 2024 11:35

Schaeff reviewed Sep 26, 2024

chriseth reviewed Sep 26, 2024

chriseth reviewed Sep 26, 2024

georgwiese added 4 commits October 1, 2024 10:40

Merge branch 'main' of github.com:powdr-labs/powdr into witgen-challenges2

e252b7f

Introduce type for trace block + publics

85d09b8

Processor: Take mutable data as input

05375e5

Suggested comment change

2f1f10c

georgwiese requested review from Schaeff and chriseth October 1, 2024 10:07

chriseth reviewed Oct 1, 2024

georgwiese added 2 commits October 4, 2024 12:09

Add test

b3ea330

Refactor test

8838b2a

georgwiese added 3 commits October 7, 2024 10:03

Merge branch 'main' of github.com:powdr-labs/powdr into witgen-challenges2

283ffed

When assuming zero for unknown values, also return zero for publics

35398fe

Change expected error message

17ca708

georgwiese requested a review from chriseth October 7, 2024 10:47

Schaeff reviewed Oct 9, 2024

georgwiese added 3 commits October 21, 2024 19:10

Merge origin/main

fe1c574

Detect if a public is referenced by more than one machine

9447871

Format

b92b13b

chriseth reviewed Oct 22, 2024

Review feedback

7c2c8e9

georgwiese commented Oct 22, 2024
