-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Witgen for public references #1756
base: main
Are you sure you want to change the base?
Conversation
b411b48
to
4ce4dc1
Compare
4ce4dc1
to
7c4c4cf
Compare
(First 4 commits of #1650) This PR prepares witness generation for scalar publics (#1756). Scalar publics are similar to cells in the trace, but are global (i.e., independent on the row number). With this PR, affine expressions use a new `AlgebraicVariable` enum, that can be either a column reference (`&'a AlgebraicReference`, which was used previously), or a reference to a public.
3cf5f35
to
a19e3a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments but I need to have a closer look
executor/src/witgen/generator.rs
Outdated
|
||
let eval_value = if eval_value.is_complete() { | ||
log::trace!("End processing VM '{}' (successfully)", self.name()); | ||
// Remove the last row of the previous block, as it is the first row of the current | ||
// block. | ||
self.data.pop(); | ||
self.data.extend(block); | ||
self.publics.extend(publics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a public value be silently replaced here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because setting a value for a public that is already in the map is a hard failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as long as the set of publics is all publics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
@@ -181,7 +195,9 @@ impl<'a, T: FieldElement> Generator<'a, T> { | |||
); | |||
processor.solve(&mut sequence_iterator).unwrap(); | |||
|
|||
processor.finish().remove(1) | |||
// Ignore any updates to the publics at this point, as we'll re-visit the last row again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assume that publics are only present in the last row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This piece of code is just to initialize the first row from identities like pc' = (1 - first_step') * <...>
.
Since this is the very first thing that happens, we won't have any publics at that point. We could in theory derive a public value here (e.g. if there was a constraint like first_step' * (foo - :final_foo) = 0
, but we'll process the last row again anyway, so it should be OK to discard here.
@@ -28,12 +30,21 @@ impl<'a, 'b, 'c, T: FieldElement, Q: QueryCallback<T>> BlockProcessor<'a, 'b, 'c | |||
pub fn new( | |||
row_offset: RowIndex, | |||
data: FinalizableData<T>, | |||
publics: BTreeMap<&'a str, T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might they not be part of the machine parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it: No, because these are the already known values for the publics, right?
Maybe this can be combined with the values for challenges in the future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I grouped the trace block and publics, using the same type returned by finish()
.
@@ -393,6 +401,7 @@ Known values in current row (local: {row_index}, global {global_row_index}): | |||
&self.data[row_index], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract the creation of a row pair into a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried that a long time ago and failed because of lifetime issues. Will try again, but in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the problem is that the RowPair
contains references to fields of Processor
, so as long as the row pair exists, Processor
cannot be borrowed mutably. If inlined, the compiler understands that only some fields are borrowed, not all of &self
, but once this is in a helper function, there is no way to communicate that to the compiler.
I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific? |
If it's not machine-specific, we could have a situation where two conflicting values for a public lead to an assertion failure (in the 'apply update' function) instead of a proper error that identities are conflicting. |
In this current code, each machine solves for any publics that appears in its identities. They are never returned, and any conflicts (two machines determining a different value for the same public) are not detected. Also, we don't fail if a public is not assigned a value by any machine. This PR is more an intermediate step, implementing the solving within a machine. To implement the full thing, we have to answer whether a public belongs to only one machine. All practical examples I know of would be like that, but in theory you could access publics in different machines, which would put a constraint between the two machines and make the solving pretty complicated.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Can you add a test for two machines accessing the same public but assigning two different values?
OK! I started a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?
In this current code, each machine solves for any publics that appears in its identities. They are never returned, and any conflicts (two machines determining a different value for the same public) are not detected. Also, we don't fail if a public is not assigned a value by any machine.
This PR is more an intermediate step, implementing the solving within a machine. To implement the full thing, we have to answer whether a public belongs to only one machine. All practical examples I know of would be like that, but in theory you could access publics in different machines, which would put a constraint between the two machines and make the solving pretty complicated.
- If yes, how do we detect it? AFAIK, publics currently live outside any namespace. Perhaps the easiest would be to check each machine's identities and fail if the assignment is not unique.
- If no, I think the easiest would be to assert that all publics have a value and fail if two machines assign a different value.
I think public inputs should be inside namespaces, in the future proof objects. Until then, I agree with the first option.
@@ -25,6 +25,7 @@ indicatif = "0.17.7" | |||
serde = { version = "1.0", default-features = false, features = ["alloc", "derive", "rc"] } | |||
|
|||
[dev-dependencies] | |||
powdr-pipeline.workspace = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this avoidable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I could manually do all steps that need to happen before witgen. I guess the advantage would be that we depend on fewer crates to build the tests (= not the backend crate), but it would a bit cumbersome to write and maintain.
Do you think it's worth it? If we run all tests anyway, the compilation time would be the same, no? In what scenario would it be faster?
|
||
#[test] | ||
#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"] | ||
fn two_machines_conflicting_public() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this test is supposed to test witness generation, but shouldn't we just put it into the pipeline crate if it's not a unit test? That way we avoid the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate which workflow would be faster then? I think it does fit will in the executor
crate...
|
||
let eval_value = if eval_value.is_complete() { | ||
log::trace!("End processing VM '{}' (successfully)", self.name()); | ||
// Remove the last row of the previous block, as it is the first row of the current | ||
// block. | ||
self.data.pop(); | ||
self.data.extend(block); | ||
self.data.extend(updated_data.block); | ||
self.publics.extend(updated_data.publics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so horrible for performance...
At some point we need to make a big refactor and allow direct access to all those variables from within the processors (unless they are really isolated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...
@@ -32,6 +30,36 @@ pub struct ExtractionOutput<'a, T: FieldElement> { | |||
pub base_parts: MachineParts<'a, T>, | |||
} | |||
|
|||
#[derive(Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this (non-pub) below (pub) split_out_machine
(which is the main entry point into this file)?
|
||
impl<'a> PublicsTracker<'a> { | ||
/// Given a machine's identities, add all publics that are referenced by them. | ||
/// Panics if a public is referenced by more than one machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it return an error instead?
.intersection(&referenced_publics) | ||
.collect::<BTreeSet<_>>(); | ||
if !intersection.is_empty() { | ||
panic!("Publics are referenced by more than one machine: {intersection:?}",); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you print their names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, no? That's what's in intersection
? (We still use names as IDs for public references...)
&mut self, | ||
identities: &[&'a powdr_ast::analyzed::Identity<SelectedExpressions<Expression<T>>>], | ||
) { | ||
let referenced_publics = identities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a proper use-case, but actually we should only consider LHSs of lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but all the identities here should be non-lookups, ok.
@@ -24,6 +24,27 @@ use super::{ | |||
|
|||
type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>; | |||
|
|||
/// The data mutated by the processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this distinctive enough from the MutableState?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to SolverState
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chriseth Thanks for the review! Think I addressed your comments, except for the new test dependency. It's not clear to me what the best solution is yet. Would it be fixed if it only depended on the steps before witness generation? Then I would prefer to do that vs moving the test to the pipeline crate.
|
||
let eval_value = if eval_value.is_complete() { | ||
log::trace!("End processing VM '{}' (successfully)", self.name()); | ||
// Remove the last row of the previous block, as it is the first row of the current | ||
// block. | ||
self.data.pop(); | ||
self.data.extend(block); | ||
self.data.extend(updated_data.block); | ||
self.publics.extend(updated_data.publics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...
.intersection(&referenced_publics) | ||
.collect::<BTreeSet<_>>(); | ||
if !intersection.is_empty() { | ||
panic!("Publics are referenced by more than one machine: {intersection:?}",); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, no? That's what's in intersection
? (We still use names as IDs for public references...)
@@ -24,6 +24,27 @@ use super::{ | |||
|
|||
type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>; | |||
|
|||
/// The data mutated by the processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to SolverState
, what do you think?
|
||
#[test] | ||
#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"] | ||
fn two_machines_conflicting_public() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate which workflow would be faster then? I think it does fit will in the executor
crate...
Step towards #1633
This PR adds witness generation for any public that is referenced from an identity.
Note that publics and public references are now existing independently:
After #1633 is completed, publics will no longer be defined in terms of trace cells, so the values returned by witgen will be the ones that are returned to the verifier.
For now, the values are not returned yet (and different machines might find conflicting values for the same public). But the solving works, and I added a log message, e.g.: