Sources -> canonical representation -> build #24

Open
VannTen opened this issue Oct 11, 2022 · 1 comment
Labels
kind/documentation Categorizes issue or PR as related to documentation. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. wg/cre Issues or PRs related to the Custom Runtime Environment (fka Custom Notebook Image) ODH feature.

Comments

@VannTen
Member

VannTen commented Oct 11, 2022

Summarizing today's CNBi meeting, plus an attempt to lay down my thoughts:

Whether we use PackageList or GitRepository (or other sources in the future),
the building part of the CNBi pipeline should be the same.

Therefore, it would be beneficial to have an intermediate representation,
sharing the image building steps.

The canonical representation would need to be defined more precisely, but for
now I propose this definition:

"A list of packages, each with an exact version". That can be stored as a file or as
JSON data.
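
To make the proposed definition concrete, here is a minimal sketch of such a list and its two storage forms (a file and JSON data). The `PinnedPackage` name, the field names, and the `name==version` file layout are assumptions made for illustration, not a format agreed on in this issue.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True, order=True)
class PinnedPackage:
    """One entry of the (hypothetical) canonical representation."""
    name: str
    version: str  # an exact version, never a range or specifier


def to_json(packages):
    """Serialize the list as JSON data, sorted for stable output."""
    return json.dumps([asdict(p) for p in sorted(packages)], indent=2)


def from_json(text):
    """Load the list back from its JSON form."""
    return [PinnedPackage(**entry) for entry in json.loads(text)]


def to_requirements(packages):
    """Serialize the same list as a file of 'name==version' lines."""
    return "".join(f"{p.name}=={p.version}\n" for p in sorted(packages))


packages = [PinnedPackage("pandas", "1.5.0"), PinnedPackage("numpy", "1.23.4")]
print(to_json(packages))
print(to_requirements(packages))
```

Either form is small enough to fit in any of the storage options discussed below, which is part of why the choice of storage can stay open.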

Using a git repository for storing this has some drawbacks:

  • it needs a git server
  • more things to do for the pipeline: committing, pushing (possibly handling
    credentials)

Meanwhile, it's not obvious what advantages it provides compared to a
"dumber" storage (could be: Tekton workspaces, persistent or not, a K8S ConfigMap,
a CustomResource), if we only store that kind of data.

Simplified flowchart:

flowchart LR
    subgraph Sources
        direction TB
        g[GitRepository]
        p[PackageList]
        o[Others]
    end
    g & p & o --> CR[canonical representation] 
    CR & B[BaseImage] --> F[Final notebook image]

I think that kind of design would make it easier to be flexible regarding
the sources we accept.
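
As a rough sketch of that flexibility: each source kind only needs a function that produces the canonical list, and the build step never looks at the source itself. The `SOURCES` registry, the `spec` dictionary layout, and the `build` return value are all hypothetical, chosen only to illustrate the shape of the design.

```python
from typing import Callable, Dict, List, Tuple

Pinned = List[Tuple[str, str]]  # [(name, exact version), ...]

# Hypothetical registry: source kind -> function producing the canonical list
SOURCES: Dict[str, Callable[[dict], Pinned]] = {}


def source(kind):
    """Decorator registering a converter for one kind of source."""
    def register(func):
        SOURCES[kind] = func
        return func
    return register


@source("PackageList")
def from_package_list(spec):
    # Assumes every entry already has the form "name==version".
    pinned = []
    for line in spec["packages"]:
        name, _, version = line.partition("==")
        pinned.append((name.strip(), version.strip()))
    return pinned


def build(spec, base_image):
    """Shared build step: it only sees the canonical list and the base image."""
    canonical = SOURCES[spec["kind"]](spec)
    # Placeholder for the real image build; returns a description instead.
    return {"base": base_image, "install": [f"{n}=={v}" for n, v in canonical]}
```

Supporting GitRepository or another source would then mean registering one more converter, with `build` left untouched.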


Notes on the longer term:

I see this canonical representation as heavily linked to micropipenv, and more
generally to the current discussion in the Python community (see this
thread on defining a "standard lock file" or something similar). In fact, it was
studying micropipenv's problem space (which is precisely the point of the ongoing
upstream discussion, from my POV) that led me to write this issue.

Ideally, our canonical representation would be that "standard lock file".
However, we will probably deliver an MVP of CNBi before that standard is defined,
let alone supported, and we don't have a time machine, so if we go in that
direction we need to have our own format.

One part of the translation to a "canonical representation" would need to be
performed, I think, by micropipenv. I'm going to assume the role of maintainer for
it, so it could evolve in that direction to suit our needs
(and provide a possible path for the Python standard discussion).

@codificat @goern

@goern goern added kind/documentation Categorizes issue or PR as related to documentation. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Oct 17, 2022
@VannTen
Member Author

VannTen commented Oct 17, 2022

Some more thoughts on this:

While working on the PackageList pipeline, it occurred to me that other sources
can be transformed into it relatively easily:

GitRepository

flowchart LR
    GitRepository -->|clones| Source
    Source -->|builds| Package
    Source -->|list dependencies| dependencies
    Package & dependencies --> PackageList

(in that scenario Package is stored on some volume / or a local index)

One caveat for this one though: getting the dependencies might depend on the
Python package manager used (poetry/pipenv/pdm/pip-tools).
But we need to be aware of that anyway.
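
One possible way to handle that caveat is to look at which lock file the cloned repository contains. The sketch below is an assumption about how detection could work, not something the pipeline does today; the file names are the conventional ones for each tool, and `requirements.txt` is only a hint for pip-tools, since plain pip projects use it too.

```python
from pathlib import Path

# Conventional lock/requirements file names, checked in this order
LOCK_FILES = [
    ("pipenv", "Pipfile.lock"),
    ("poetry", "poetry.lock"),
    ("pdm", "pdm.lock"),
    ("pip-tools", "requirements.txt"),
]


def detect_package_manager(repo_dir):
    """Return the first package manager whose lock file exists, else None."""
    root = Path(repo_dir)
    for manager, filename in LOCK_FILES:
        if (root / filename).is_file():
            return manager
    return None
```

A repository with no recognized file returns `None`, which the pipeline would have to treat as an unsupported source or fall back on some default.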

From notebook:

flowchart LR
    Notebook -->|parse imports and map to packages| PackageList
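
A minimal sketch of the "parse imports and map to packages" step, assuming the notebook is a standard `.ipynb` JSON document. The `IMPORT_TO_PACKAGE` table is a hypothetical, hand-maintained mapping; the sketch does not filter out standard-library modules, and it yields package names only, so exact versions would still have to be resolved afterwards.

```python
import ast
import json

# Hypothetical mapping for import names that differ from the distribution
# name on the index; anything not listed is assumed to be identical.
IMPORT_TO_PACKAGE = {"sklearn": "scikit-learn", "yaml": "PyYAML", "PIL": "Pillow"}


def imported_modules(notebook_json):
    """Collect top-level module names imported by the notebook's code cells."""
    modules = set()
    for cell in json.loads(notebook_json).get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # e.g. cells using IPython magics are skipped entirely
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def to_packages(modules):
    """Map import names to package names, sorted for stable output."""
    return sorted(IMPORT_TO_PACKAGE.get(m, m) for m in modules)
```

Skipping cells that fail to parse keeps the sketch short but can miss imports; a real implementation would probably strip magics line by line instead.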

Taking all that into account, we could have something like:

flowchart LR
    GitRepository --> pipenv & poetry & pdm & other --> PackageList
    Notebook --> PackageList

(this is only for sources)
More steps, but less work, ideally.

@codificat codificat added the wg/cre Issues or PRs related to the Custom Runtime Environment (fka Custom Notebook Image) ODH feature. label Jan 30, 2023

3 participants