Where to Store Data Artifacts for Pythia? #20
Comments
That's a great question. We have the pythia-datasets repo/package, which we've used to house example data for the Foundations tutorials. That's certainly a possibility. The datasets themselves are stored within the repo (which works OK for small files), and they can be accessed from notebooks through a lightweight API based on pooch; see this Foundations example. We're exploring the idea of requesting Pythia storage buckets via Open Storage Network to host larger ARCO datasets that we can build significant Cookbook content on, but that's still just an idea at this point. @ktyle this would be great to put on the agenda for the next IWG meeting.
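For readers unfamiliar with the pattern mentioned above, here is a minimal stdlib-only sketch of the registry-plus-checksum idea that pooch-based loaders build on. It is not the pythia-datasets API: the function names `sha256_of` and `locate` and the registry layout are hypothetical, and a real loader such as pooch would also download a missing file rather than raise.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def locate(name, registry, cache_dir):
    """Return the cached path for `name` after verifying its checksum.

    `registry` maps file names to expected SHA-256 digests (hypothetical layout).
    """
    if name not in registry:
        raise KeyError(f"{name!r} is not listed in the registry")
    path = Path(cache_dir) / name
    if not path.exists():
        # A pooch-style loader would fetch the file from a base URL here.
        raise FileNotFoundError(f"{name!r} is not in the cache at {cache_dir}")
    if sha256_of(path) != registry[name]:
        raise ValueError(f"checksum mismatch for {name!r}")
    return path
```

The checksum step is what lets a notebook trust a small file stored in a repo or bucket without bundling it in the notebook itself.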
Yes, will put on the agenda. @norlandrhagen this could also be a good candidate for a Pangeo Forge recipe.
Thanks for the reply @brian-rose and @ktyle. Would love to hear what comes out of next week's meeting. @ktyle, definitely a good call! I'm not super savvy on the details, but I think the current storage for Pangeo-Forge is tied to the current NSF grant, which ends in roughly a year. Also, the
@norlandrhagen I wondered how long the Pangeo Forge storage will remain available. Good to know. Our next infrastructure working group meeting is Mon 3/6 (next week is our Education working group; these two working groups meet on alternate Mondays). 😄
We are going to push for some better guidance on this prior to our summer 2023 hackathon.
Thanks for letting me hijack this issue for a more general answer. At today's IWG meeting we agreed on our recommendations for cookbook data artifacts, in loose order of preference:
We'll leave this issue open for discussion until we document these suggestions with more instruction across the project, e.g. in the cookbook contributor's guide.
More general discussion of the data storage issue: ProjectPythia/cookbook-gallery#155
@norlandrhagen are you still interested in hosting some data artifacts from the kerchunk cookbook? If so, please let me know the approximate total size. I can then advise on how best to transfer your data to Project Pythia's object store hosted on Jetstream2.
Hey @ktyle! I think the only real artifact we have stored is a ~272 MB Kerchunk parquet reference.
@norlandrhagen is it
Hi there,
I have a general Pythia infrastructure question. In some sections of the Kerchunk-Cookbook, we would like to demonstrate how to open up a pre-generated Kerchunk reference file for a large virtual dataset. We can host this reference .json or .parquet on a CarbonPlan cloud account for now, but we were wondering if there is a preferred location to host these artifacts. Ideally, it seems that these could live in some Pythia bucket, so they are connected to the project. Not sure if there are any resources to host these, but just wondering what everyone's thoughts were. The file size should be quite small, probably one or two files in the 10's to 100's MB range.
Thanks!
cc @maxrjones @brian-rose
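For context, opening a hosted Kerchunk reference from a notebook might look roughly like the sketch below. It assumes an environment similar to the Kerchunk cookbook's (xarray, zarr, and fsspec installed, plus s3fs if the underlying chunks live on S3). The helper names and the example URL are hypothetical; only the `reference://` filesystem options (`fo`, `remote_protocol`, `remote_options`) come from fsspec itself. A parquet reference may need slightly different handling.

```python
def reference_storage_options(reference_url, remote_protocol="s3", anon=True):
    """Build fsspec 'reference' filesystem options for a Kerchunk reference.

    reference_url: where the reference JSON is hosted (hypothetical example below).
    remote_protocol: protocol of the store holding the original data chunks.
    anon: whether to access that store without credentials.
    """
    return {
        "fo": reference_url,
        "remote_protocol": remote_protocol,
        "remote_options": {"anon": anon},
    }


def open_reference_dataset(reference_url, **kwargs):
    """Open the virtual dataset described by a Kerchunk reference file."""
    # Imported lazily so the options helper works without xarray installed.
    import xarray as xr

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": reference_storage_options(reference_url, **kwargs),
        },
    )


# Hypothetical usage; the URL is a placeholder, not a real Pythia bucket:
# ds = open_reference_dataset("https://example.org/pythia/reference.json")
```

Because only the small reference file needs hosting, the bucket question here concerns tens to hundreds of MB, not the full dataset.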