Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Witgen for public references #1756

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Witgen for public references #1756

wants to merge 20 commits into from

Conversation

georgwiese
Copy link
Collaborator

@georgwiese georgwiese commented Sep 5, 2024

Step towards #1633

This PR adds witness generation for any public that is referenced from an identity.

Note that publics and public references are now existing independently:

  • A public is still defined as a pointer to a cell in the trace. The prover extracts the values from the trace and returns them to the verifier; witgen has nothing to do with them (except providing the values in the trace).
  • A public reference (i.e., a public that is referenced by a constraint) was previously unimplemented. Now, witgen would solve for this value. This value might not be the same as the value of the public being referenced! We don't check for consistency.

After #1633 is completed, publics will no longer be defined in terms of trace cells, so the values returned by witgen will be the ones that are returned to the verifier.

For now, the values are not returned yet (and different machines might find conflicting values for the same public). But the solving works, and I added a log message, e.g.:

$ cargo run pil test_data/pil/fibonacci_with_public.pil -o output -f
Writing output/fibonacci_with_public_analyzed.pil.
done.
Optimizing pil...
Removed 0 witness and 0 fixed columns. Total count now: 2 witness and 1 fixed columns.
Writing output/fibonacci_with_public_opt.pil.
Evaluating fixed columns...
Fixed column generation took 0.001645084s
Writing output/constants.bin.
Deducing witness columns...
Running main machine for 4 rows
[00:00:00 (ETA: 00:00:00)] ░░░░░░░░░░░░░░░░░░░░ 0% - Starting...                                                   
        => out (public) = 5
[00:00:00 (ETA: 00:00:00)] ████████████████████ 100% - Starting...                                                   
Witness generation took 0.00259025s
Writing output/commits.bin.

@georgwiese georgwiese changed the base branch from main to introduce-algebraic-variable September 5, 2024 12:58
github-merge-queue bot pushed a commit that referenced this pull request Sep 23, 2024
(First 4 commits of #1650)

This PR prepares witness generation for scalar publics (#1756). Scalar
publics are similar to cells in the trace, but are global (i.e.,
independent on the row number).

With this PR, affine expressions use a new `AlgebraicVariable` enum,
that can be either a column reference (`&'a AlgebraicReference`, which
was used previously), or a reference to a public.
Base automatically changed from introduce-algebraic-variable to main September 23, 2024 15:07
@georgwiese georgwiese changed the title [WIP] Witgen for public references Witgen for public references Sep 26, 2024
@georgwiese georgwiese marked this pull request as ready for review September 26, 2024 11:35
Copy link
Collaborator

@Schaeff Schaeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but I need to have a closer look


let eval_value = if eval_value.is_complete() {
log::trace!("End processing VM '{}' (successfully)", self.name());
// Remove the last row of the previous block, as it is the first row of the current
// block.
self.data.pop();
self.data.extend(block);
self.publics.extend(publics);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a public value be silently replaced here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because setting a value for a public that is already in the map is a hard failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as long as the set of publics is all publics.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

@@ -181,7 +195,9 @@ impl<'a, T: FieldElement> Generator<'a, T> {
);
processor.solve(&mut sequence_iterator).unwrap();

processor.finish().remove(1)
// Ignore any updates to the publics at this point, as we'll re-visit the last row again.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assume that publics are only present in the last row?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This piece of code is just to initialize the first row from identities like pc' = (1 - first_step') * <...>.

Since this is the very first thing that happens, we won't have any publics at that point. We could in theory derive a public value here (e.g. if there was a constraint like first_step' * (foo - :final_foo) = 0, but we'll process the last row again anyway, so it should be OK to discard here.

executor/src/witgen/machines/block_machine.rs Outdated Show resolved Hide resolved
executor/src/witgen/vm_processor.rs Outdated Show resolved Hide resolved
@@ -28,12 +30,21 @@ impl<'a, 'b, 'c, T: FieldElement, Q: QueryCallback<T>> BlockProcessor<'a, 'b, 'c
pub fn new(
row_offset: RowIndex,
data: FinalizableData<T>,
publics: BTreeMap<&'a str, T>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might they not be part of the machine parts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it: No, because these are the already known values for the publics, right?

Maybe this can be combined with the values for challenges in the future...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I grouped the trace block and publics, using the same type returned by finish().

@@ -393,6 +401,7 @@ Known values in current row (local: {row_index}, global {global_row_index}):
&self.data[row_index],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract the creation of a row pair into a helper function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried that a long time ago and failed because of lifetime issues. Will try again, but in a separate PR.

Copy link
Collaborator Author

@georgwiese georgwiese Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the problem is that the RowPair contains references to fields of Processor, so as long as the row pair exists, Processor cannot be borrowed mutably. If inlined, the compiler understands that only some fields are borrowed, not all of &self, but once this is in a helper function, there is no way to communicate that to the compiler.

@chriseth
Copy link
Member

I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?

@chriseth
Copy link
Member

If it's not machine-specific, we could have a situation where two conflicting values for a public lead to an assertion failure (in the 'apply update' function) instead of a proper error that identities are conflicting.

@georgwiese
Copy link
Collaborator Author

I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?

In this current code, each machine solves for any publics that appears in its identities. They are never returned, and any conflicts (two machines determining a different value for the same public) are not detected. Also, we don't fail if a public is not assigned a value by any machine.

This PR is more an intermediate step, implementing the solving within a machine. To implement the full thing, we have to answer whether a public belongs to only one machine. All practical examples I know of would be like that, but in theory you could access publics in different machines, which would put a constraint between the two machines and make the solving pretty complicated.

  • If yes, how do we detect it? AFAIK, publics currently live outside any namespace. Perhaps the easiest would be to check each machine's identities and fail if the assignment is not unique.
  • If no, I think the easiest would be to assert that all publics have a value and fail if two machines assign a different value.

Copy link
Member

@chriseth chriseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Can you add a test for two machines accessing the same public but assigning two different values?

@georgwiese
Copy link
Collaborator Author

OK! I started a executor/tests/witgen.rs to only test witgen, no proof generation.

Copy link
Collaborator

@Schaeff Schaeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was waiting for a change to the machine extractor. Where in the code is the set of publics actually determined? Or is it not machine-specific?

In this current code, each machine solves for any publics that appears in its identities. They are never returned, and any conflicts (two machines determining a different value for the same public) are not detected. Also, we don't fail if a public is not assigned a value by any machine.

This PR is more an intermediate step, implementing the solving within a machine. To implement the full thing, we have to answer whether a public belongs to only one machine. All practical examples I know of would be like that, but in theory you could access publics in different machines, which would put a constraint between the two machines and make the solving pretty complicated.

  • If yes, how do we detect it? AFAIK, publics currently live outside any namespace. Perhaps the easiest would be to check each machine's identities and fail if the assignment is not unique.
  • If no, I think the easiest would be to assert that all publics have a value and fail if two machines assign a different value.

I think public inputs should be inside namespaces, in the future proof objects. Until then, I agree with the first option.

@@ -25,6 +25,7 @@ indicatif = "0.17.7"
serde = { version = "1.0", default-features = false, features = ["alloc", "derive", "rc"] }

[dev-dependencies]
powdr-pipeline.workspace = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this avoidable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I could manually do all steps that need to happen before witgen. I guess the advantage would be that we depend on fewer crates to build the tests (= not the backend crate), but it would a bit cumbersome to write and maintain.

Do you think it's worth it? If we run all tests anyway, the compilation time would be the same, no? In what scenario would it be faster?


#[test]
#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"]
fn two_machines_conflicting_public() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this test is supposed to test witness generation, but shouldn't we just put it into the pipeline crate if it's not a unit test? That way we avoid the dependency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate which workflow would be faster then? I think it does fit will in the executor crate...


let eval_value = if eval_value.is_complete() {
log::trace!("End processing VM '{}' (successfully)", self.name());
// Remove the last row of the previous block, as it is the first row of the current
// block.
self.data.pop();
self.data.extend(block);
self.data.extend(updated_data.block);
self.publics.extend(updated_data.publics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so horrible for performance...
At some point we need to make a big refactor and allow direct access to all those variables from within the processors (unless they are really isolated).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...

@@ -32,6 +30,36 @@ pub struct ExtractionOutput<'a, T: FieldElement> {
pub base_parts: MachineParts<'a, T>,
}

#[derive(Default)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this (non-pub) below (pub) split_out_machine (which is the main entry point into this file)?


impl<'a> PublicsTracker<'a> {
/// Given a machine's identities, add all publics that are referenced by them.
/// Panics if a public is referenced by more than one machine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it return an error instead?

.intersection(&referenced_publics)
.collect::<BTreeSet<_>>();
if !intersection.is_empty() {
panic!("Publics are referenced by more than one machine: {intersection:?}",);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print their names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, no? That's what's in intersection? (We still use names as IDs for public references...)

&mut self,
identities: &[&'a powdr_ast::analyzed::Identity<SelectedExpressions<Expression<T>>>],
) {
let referenced_publics = identities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a proper use-case, but actually we should only consider LHSs of lookups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but all the identities here should be non-lookups, ok.

@@ -24,6 +24,27 @@ use super::{

type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>;

/// The data mutated by the processor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this distinctive enough from the MutableState?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to SolverState, what do you think?

Copy link
Collaborator Author

@georgwiese georgwiese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriseth Thanks for the review! Think I addressed your comments, except for the new test dependency. It's not clear to me what the best solution is yet. Would it be fixed if it only depended on the steps before witness generation? Then I would prefer to do that vs moving the test to the pipeline crate.


let eval_value = if eval_value.is_complete() {
log::trace!("End processing VM '{}' (successfully)", self.name());
// Remove the last row of the previous block, as it is the first row of the current
// block.
self.data.pop();
self.data.extend(block);
self.data.extend(updated_data.block);
self.publics.extend(updated_data.publics);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, this is only run at the end of a call to a secondary VM, right? Given that the list of publics will be small (assuming a succinct verifier), I'd expect this to be negligible...

.intersection(&referenced_publics)
.collect::<BTreeSet<_>>();
if !intersection.is_empty() {
panic!("Publics are referenced by more than one machine: {intersection:?}",);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, no? That's what's in intersection? (We still use names as IDs for public references...)

@@ -24,6 +24,27 @@ use super::{

type Left<'a, T> = Vec<AffineExpression<AlgebraicVariable<'a>, T>>;

/// The data mutated by the processor
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to SolverState, what do you think?


#[test]
#[should_panic = "Publics are referenced by more than one machine: {\"public\"}"]
fn two_machines_conflicting_public() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate which workflow would be faster then? I think it does fit will in the executor crate...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants