Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for generic data directory #335

Open
wants to merge 6 commits into
base: dev
Choose a base branch
from
Open

Conversation

winston-h-zhang
Copy link
Member

@winston-h-zhang winston-h-zhang commented Feb 20, 2024

This PR implements a minimal generic data directory for arecibo to read/write from. It is similar the the data directory used in Lurk. Besides the immediate need to have some way of managing witness/cross-term vectors for our MSM data analysis, there has been a need for this type of infrastructure for a while.

See, for example, microsoft/Nova#270, which can build upon this PR. There is also argumentcomputer/lurk-beta#1086, which may need to move some of Lurk's caching logic into arecibo.

This PR doesn't intend to address all the issues of these PRs, but it would be nice to have the first increment of work streamlined into the codebase, so that we can slowly build up to a robust solution.

Copy link
Member

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to solve an issue that does not exist.

So this is solving an issue that isn't specified. Could we write it down and open that issue, or more clearly explain what problem this tries to solve in the PR description?

Moreover, on the shape of the addition: this is writing the witness and cross-term by inserting itself in the RecursiveSNARK. This is obviously invasive not only from a protocol standpoint (which might be justified by unspecified goals), but also from a code organization standpoint: we would now need to maintain writing configuration logic in the SuperNova RecursiveSNARK, at least, and possibly in the CompressedSNARKs (x 4).

src/lib.rs Outdated
@@ -489,6 +502,13 @@ where
T: r1cs::default_T::<Dual<E1>>(r1cs_secondary.num_cons),
};

let config = RecursiveSNARKConfig {
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the configuration override? does my Raspberry pi (arm7) not have a hard drive?

Copy link
Member

@wwared wwared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not commenting on whether this PR solves the issues around loading/writing data in the way we want to or not, but wanted to point out a few things regarding this code regardless of that:

  • Since we call set or get_or_init from within the prove_step function (by calling write_data()), isn't it possible that ARECIBO_CONFIG could be initialized from different threads? (even possibly from different threads proving different, unrelated things) It might be better to use OnceLock instead if that's the case (OnceCell is not thread-safe) (it should not be a big deal to change because it may only block during initialization)
  • The cfg overrides feel weird to me, shouldn't it be cfg(target_arch = "wasm32") instead of cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))?
  • I feel like it'd be simpler if the data mod was present on every arch, but for the arches we disable it (wasm) just replace with dummy functions instead. This prevents the #[cfg] directives from polluting the callers of the data module. One way this could be implemented is moving the real implementation to a sub-module, and re-exporting inside data with #[cfg] checks
  • Usability-wise, it would be good to support overriding the ~/.arecibo_data path with an env var like ARECIBO_DATA_DIR or similar
  • I know it's meant to be extended in the future, but it's weird how write_data is duplicated on the global, per-process DataConfig stored in ARECIBO_CONFIG as well as in the RecursiveSNARKConfig present in every RecursiveSNARK. I think for now it would be better to not have the RecursiveSNARKConfig at all and keep configuration (enabling/disabling the write data) on the data module, possibly controlled by other env vars

@winston-h-zhang
Copy link
Member Author

Thank you for all the comments! I've written a starting issue here: #371. It's more of a set of questions that I have for how to proceed, rather than what I think needs to be done, so I would be happy to discuss and create a plan for what's next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants