[proposal] add scientific filesystem (scif) #58

Closed

vsoch
Contributor

@vsoch vsoch commented Apr 3, 2019

This is a proposal to make the Scientific Filesystem specification a part of opencontainers to encourage consistency, transparency, programmatic accessibility, and modularity of applications installed in containers. Please see the full proposal for background and external links, and please post any questions here. Thanks!

Signed-off-by: Vanessa Sochat vsochat@stanford.edu

@caniszczyk
Contributor

RFC @opencontainers/tob

@vsoch
Contributor Author

vsoch commented Apr 18, 2019

I can add more comment / details here if needed - I was planning on pinging the list (via email) in a week or two.

@vsoch
Contributor Author

vsoch commented Apr 18, 2019

Here are some more notes! I wrote them up after the meeting today - most are redundant with the proposal doc here, but hey, redundancy is a good thing! Note that there is a specification, a published paper for it, a Python client, containers that come with it ready to use, and (under development) a library implemented in Go for all these Go-based container technologies.

What is the problem that SCIF solves?

There is a clear issue with discoverability of containers. Namely, aside from running a container or finding an external documentation base, we cannot easily answer:

  • what entrypoints does my container have that the creator intended me to use?
  • what are environments and metadata specific to those entrypoints?

And further, this information is not programmatically accessible. To a registry that wants to digest a container, the container is a black box.

What is the Scientific Filesystem?

The scientific filesystem (scif) addresses these issues. It's an organizational and interaction specification for (internal) modules inside containers. It makes it possible for a single container to have more than one module (called an "app") with an entrypoint, environment, metadata, runscript/help, install commands, and files.
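
To make "programmatically accessible" concrete, here is a minimal sketch of how a tool could enumerate the apps in a container by walking the filesystem. It is not the reference client. The layout it assumes (a base directory containing apps/<name>/scif/runscript and apps/<name>/scif/labels.json) and the function name discover_apps are assumptions made for illustration; check the specification for the authoritative layout.

```python
import json
import tempfile
from pathlib import Path


def discover_apps(base):
    """Return {app_name: {"runscript": text or None, "labels": dict}}.

    Assumes (hypothetically) that each app lives in base/apps/<name>/ and
    keeps its metadata in a "scif" subfolder.
    """
    found = {}
    apps_dir = Path(base) / "apps"
    if not apps_dir.is_dir():
        return found
    for app_dir in sorted(p for p in apps_dir.iterdir() if p.is_dir()):
        meta = app_dir / "scif"
        runscript = meta / "runscript"
        labels = meta / "labels.json"
        found[app_dir.name] = {
            "runscript": runscript.read_text() if runscript.is_file() else None,
            "labels": json.loads(labels.read_text()) if labels.is_file() else {},
        }
    return found


# Build a throwaway tree so the sketch runs anywhere; inside a container the
# base directory would be wherever the apps were installed.
with tempfile.TemporaryDirectory() as base:
    meta = Path(base) / "apps" / "hello" / "scif"
    meta.mkdir(parents=True)
    (meta / "runscript").write_text('echo "hello"\n')
    (meta / "labels.json").write_text(json.dumps({"maintainer": "example"}))
    demo = discover_apps(base)
    print(demo)
```

The point of the sketch is that a registry or test harness needs no knowledge of the container beyond the layout convention itself.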

How is it created?

An organizational standard is useless unless it makes life easier for the user. This is why the implementations provide the user with what's called a scif recipe (e.g., recipe.scif), in which the user writes sections of text that the client uses to generate the organization and the subsequent interaction. The recipe is then installed inside of a container using the client, and the client "scif" becomes the entrypoint to control further interaction. These would be the commands to install it in a Docker container:

RUN scif install recipe.scif
ENTRYPOINT ["scif"]
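
For a feel of what a client does with a recipe, here is a rough sketch of splitting recipe text into per-app sections. It assumes section headers of the form "%app<section> <appname>" (for example %apprun, %appenv, %apphelp); the recipe content, the regular expression, and the parse_recipe helper are all hypothetical and are not taken from the actual Python client.

```python
import re
from collections import defaultdict

# Hypothetical recipe text, assuming "%app<section> <appname>" headers.
RECIPE = """\
%apprun hello
    echo "Hello, $NAME"
%appenv hello
    NAME=world
%apphelp hello
    Prints a greeting.
%apprun goodbye
    echo "Goodbye"
"""

# Captures the section name (e.g. "run") and the app name (e.g. "hello").
SECTION = re.compile(r"^%app(\w+)\s+(\S+)\s*$")


def parse_recipe(text):
    """Group the lines of a recipe by app name, then by section name."""
    apps = defaultdict(lambda: defaultdict(list))
    current = None  # list that collects the lines of the section being read
    for line in text.splitlines():
        stripped = line.strip()
        match = SECTION.match(stripped)
        if match:
            section, app = match.groups()
            current = apps[app][section]
        elif current is not None and stripped:
            current.append(stripped)
    return {app: dict(sections) for app, sections in apps.items()}


apps = parse_recipe(RECIPE)
for name, sections in sorted(apps.items()):
    print(name, sorted(sections))
```

Each app ends up with its own entrypoint, environment, and help text, which is the information that would otherwise only live in external documentation.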

Why does this belong with OCI?

We need to spend more time on discoverability of containers. A registry needs to be able to interact with a container, and have the container spit out its guts. The rationale for SCIF for scientific applications (needing more than one exposed / discoverable application in a single container, or running a container with a different environment context) is obvious. However, a larger vision is providing a means for a "semantic API for web units" (@vbatts).

If you don't feel like reading, just try it out! Examples for Docker and Singularity

https://sci-f.github.io/tutorial-really-quick-start

And a more complex genomic container with snakemake:

https://github.com/sci-f/snakemake.scif

@estesp
Contributor

estesp commented Apr 24, 2019

@vsoch thanks for putting together this well-thought-out proposal. And sorry for the lack of time spent reading and providing some comments so far. I finally had time to digest the proposal given we didn't have an OCI call today :)

So far my biggest unanswered question is how this makes sense in the OCI given the current scope and charter of the OCI. This statement has no bearing on whether this is something valuable, and as stated in the proposal's answer, something that the container ecosystem should or could be "spending more time on." It is simply a reasonable question to ask given the very limited scope in which the OCI was founded and created. My personal opinion is that this, and other worthy efforts, that sit above the standard container runtime and image spec, would first need to propose changing the scope of the OCI and get ratification of that scope change before we decide whether a particular project makes sense to add here.

Another way to potentially answer this is to ask: without SCIF in the OCI, what "inoperability" would happen across the container ecosystem because that piece is not there? With image, distribution, and runtime spec today, those answers are fairly clear to me. Everything else seems to sit above that until we find a particular layer being re-implemented in non-portable ways across the ecosystem that should be standardized to keep an "open container" ecosystem, which is the charter and purpose initially set for the OCI. This is only my opinion and I would love to hear other views from OCI TOB members.

@vsoch
Contributor Author

vsoch commented Apr 25, 2019

Thanks @estesp. It never occurred to me that an internal organization and specification for container interaction (during runtime) would be out of scope. To quickly share the prominent statement about OCI:

The Open Container Initiative (OCI) is a lightweight, open governance structure (project), formed under the auspices of the Linux Foundation, for the express purpose of creating open industry standards around container formats and runtime.

I would argue that SCIF is closer to this spirit than the distribution specification, which outlines a registry interaction. SCIF is on a similar foot, but instead of serving content from an external resource, it specifies how to serve (and interact with) content internal to the container at runtime. I don't see that OCI needs a change of scope, but would be interested to hear others' feedback.

Without SCIF in the OCI, containers will largely remain black boxes whose guts a user has no way to expose, aside from really well done documentation and entry point help. There would be no programmatic way to extract this information, or to reliably run tests across the installed software in a container (without knowing what is there first).

@vsoch
Contributor Author

vsoch commented Apr 25, 2019

What is important is that this issue is addressed, and I think that OCI is the right place to do it.

@crosbymichael
Member

-1 from me.

I don't see how this fits into the OCI. I looked at the GitHub repos and docs and this is nowhere near mature enough to consider or have traction with users.

I believe this is too immature at this point in the SCIF project for standardization talks.

@vsoch
Contributor Author

vsoch commented Apr 25, 2019

What are these criteria for being "mature?" A lot of the specs here came directly from company decisions, or "let's just move the repo from there to here" sorts of deals. If the criterion is to go out, start a company, and establish a user base of many thousands that use SCIF, that is a severely limiting factor for someone that wants to contribute that doesn't work in industry. What am I empowered to do? I'll mention again that the spec itself was developed collaboratively with scientific container users. It went through peer review and is published. There are now Python and GoLang clients, along with native integration in Singularity. Scientific users build it into containers, and some have included it in a publication. Yes, it's likely not production level to the point of having been maintained by 10s to 100s of people at a company, because I can't do that. But it's a solid idea and specification and it solves a problem that is essential for scientific reproducibility. I've done everything that I'm empowered to do as a single person and I'm bringing it here to take it to the next level, and give attention to the issue and encourage others to use it. There is no other project that is attempting to solve this problem, and so it seems irrational to turn it away. However, if you tell me that the criteria for maturity require some level of enterprise use, then I cannot comply, and I probably don't belong here trying to participate in this community at all.

@vsoch
Contributor Author

vsoch commented Apr 25, 2019

And here is a quick glance of the SCIF apps that have been discovered in Singularity containers on Singularity Hub:

https://singularity-hub.org/apps

It hasn't been promoted / advertised, it's just a part of the Singularity docs that offers a way to create multiple entrypoints, environments, etc., and from what I can tell, people that find those docs seem to like it.

@dmcgowan
Member

What are these criteria for being "mature?"

It is related to adoption, not the number of maintainers or where it came from. OCI helps the ecosystem agree on something that has multiple implementations and is in the best interest of the community to interoperate. The need to be in OCI can only be clear after a specification is widely used in the ecosystem and multiple implementations/use cases have been worked on. What this level of maturity is could certainly be up for debate and hasn't been quantified into criteria.

so it seems irrational to turn it away

Not seeing it as ready is not the same as being turned away. OCI doesn't have a history of incubating newer specifications that the industry could use, but rather standardizing what the industry is already using. We have seen this in the distribution specification as well, where there is a desire to have new functionality, but OCI is not the best place to define that functionality, only to standardize what has been proven to work and be useful.

@vsoch
Contributor Author

vsoch commented Apr 25, 2019

To prompt some discussion - @dmcgowan and I had a short correspondence on slack, and I have some questions for the community here.

OCI is definitely industry oriented - it's clear that the specifications included are mature, where mature is defined as adoption in the community, and while I'd like to think that this is easy to do in academia, it's a lot harder than if you are in industry and it gets delivered with a product. So let's step back.

The first suggestion by @dmcgowan was for clarification, and I agree. There aren't any clear criteria about what constitutes mature. For example, how were these proposals mature, to create general repos for testing? As an academic / non industry person, my understanding of the community was that the goals were to develop specifications to guide good practice for container running, storage, interaction, etc. Most of the discussion I've seen around artifacts, registries, and other specifications comes down to points in a spec.md, and they are just ideas. For example, we understand that content types are useful to know, but there isn't a mature developed thing existing out there that would make that idea mature. But it's a great idea. It deserves support. I had thought that contributing was not about presenting a cleanly finished platter, but a solid idea and (to go further) implementations to say "Hey, this works, and it's useful, now let's shine it up." OCI is a group of projects. My understanding was that the projects are living, and that this community would be the place to bring new ideas that can improve working in this container space. If my understanding is wrong, this needs to be made explicit.

I've been trying to contribute having that understanding, and if OCI is not a place for ideas, or innovation, or including non-corporate developers, then it feels like a community of privilege that I'm not intended to be a part of. That's okay, but this again needs to be made explicit.

The next point of discussion is about where. If a project like SCIF doesn't belong with OCI, then where does it belong? It seems like industry steers the boat and everyone else follows, but everyone else doesn't have much power to draw the map or turn the wheel.

Thanks for everyone's thoughtful responses. These points are important.

@dmcgowan
Member

The next point of discussion is about where. If a project like SCIF doesn't belong with OCI, then where does it belong?

That is a good question and I think a fair point. OCI is mostly specification, then a few implementations. CNCF is mostly projects with 1 specification? I think CNCF is better equipped for taking good project ideas and giving them a platform with CNCF sandboxes than OCI is. Even if this project were added to OCI, OCI is not set up to help projects mature. Look at the projects you linked to as an example, I am not sure that being OCI repositories made them ubiquitous. All the other specifications were already heavily used so OCI was more a stamp of maturity and consensus amongst the industry.

@vsoch
Contributor Author

vsoch commented Apr 26, 2019

CNCF is also industry oriented: it assumes that I'm a company and asks for "trademark transfer."

Require adherence to CNCF IP Policy (including trademark transferred)

see here.

There is no trademark, only zuul. :)

In all seriousness, both of these communities seem to be about politics and reducing competition between companies. There is no (traditionally) open source community or platform to support projects like this.

@estesp
Contributor

estesp commented Apr 26, 2019

Just some added thoughts from me, specific to this comment from @vsoch :

I've been trying to contribute having that understanding, and if OCI is not a place for ideas, or innovation, or including non-corporate developers, then it feels like a community of privilege that I'm not intended to be a part of. That's okay, but this again needs to be made explicit.

Possibly we have done a poor job of reiterating why the OCI exists. If in any way it seems like a group that stifles innovation, independent developers, or is closed to new ideas, it could feel that way because it was created for a very narrow purpose at a time when there were serious doubts that we could get all the industry players in the container space to cooperate and agree on a common idea of "what is a container?"

Everything else--including many innovative and important projects--is outside this narrow focus. Stating that says nothing about any exclusivity other than by design of the OCI purpose. Who gets to be in OCI is not a discussion of worthiness or even importance to the ecosystem as so many very important pieces of the container ecosystem are already outside the OCI and have no plans of trying to join the OCI. For instance, if containers didn't have networking functionality, I assume 99% of container-native software projects would go find another software isolation technology. Therefore we can state that networking is critically important to the container ecosystem, but yet is fully outside the scope of the OCI and not even a discussion point about whether we should attack networking as a project in the OCI.

Specific to SCIF, you have mentioned this solves an important problem in the ecosystem. If that is so, then your project should and can stand on its own. Jess Frazelle created "genuinetools" for example, and has no intention of bringing them to the OCI or CNCF--their importance and usefulness stand on their own, and many of those projects have more "stars" than any OCI project repository.

Given this, I'm still trying to understand how being in the OCI would help SCIF fit the purpose you and others believe it was created for. If it were standardized would more people use it? They wouldn't be forced to use it as OCI is a set of optional specifications. I think SCIF can and should seek to find its purpose aside from the OCI for now. That's a personal opinion, and happy to hear other views from other TOB members.

@dmcgowan
Member

-1

I think discussion for this has already moved on to other forums, but we shouldn't just leave this open here.

@vsoch
Contributor Author

vsoch commented Jul 25, 2019

Yeah ok. I better understand the purpose of OCI and SCIF doesn’t fit.

@vsoch vsoch closed this Jul 25, 2019