Model training versioning support in Jaseci/JSORC #1092

Open
ypkang opened this issue Apr 26, 2023 · 0 comments
Labels
  • ai_kit: Related to jaseci ai kit
  • consumer depending: A consumer of Jaseci is depending on this issue.
  • core: Related to jaseci core and serv
  • P1: Issue require timely attention, or feature that is high importance.


ypkang commented Apr 26, 2023

Objective:

  • As developers train and fine-tune models via jac, the jaseci/jsorc infrastructure should automatically manage model versioning and maintain consistent behavior across any pod/server restart.

Solution/Design:

  • TL;DR: we will implement a very lightweight MLflow-esque system in jaseci ai kit and jsorc.
  • On the jaseci ai kit module side
    • Automatically track the version history of trained models (with a default retention range, e.g. the last 10 training iterations). With every train action, the user will receive a UUID pointing to that version of the model.
    • The user can also explicitly save a trained version. This will be retained forever.
    • All trained model versions are saved in the PV (.jaseci/ directory).
    • We also save a table of all the version UUIDs, each pointing to its model path. We will start with this and later expand it to include other meta information such as accuracies, training loss, etc., so we will need to design a format that is extensible.
    • We also save the UUID of the latest version, or the last active version.
    • In the SETUP action, we will try to load the latest UUID version first; if it doesn't exist, load the default.
  • On the jaseci/jsorc side
    • For now, we will need to make sure that the auto-reload logic of jaseci actions invokes setup(), so that when a jaseci pod restarts, any previously loaded and trained local module will also load the latest version pointed to by the latest UUID.
    • In the future, JSORC will handle this.
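
To make the design above more concrete, here is a minimal sketch of what the version table could look like. Everything in it is an assumption for illustration and not existing jaseci ai kit code: the `ModelRegistry` class, the `versions.json` file name, the `pinned` flag for explicitly saved versions, and the free-form `meta` dict that leaves room for accuracies or training loss. The `root` directory stands in for a location under the PV (.jaseci/ directory).

```python
import json
import uuid
from pathlib import Path

RETENTION = 10  # default retention range from the issue (last 10 training iterations)


class ModelRegistry:
    """Hypothetical version table; not part of jaseci ai kit."""

    def __init__(self, root, retention=RETENTION):
        self.root = Path(root)  # assumed to live under the PV, e.g. .jaseci/
        self.root.mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "versions.json"  # assumed file name
        self.retention = retention
        if self.index_path.exists():
            self.index = json.loads(self.index_path.read_text())
        else:
            self.index = {"latest": None, "versions": []}

    def _flush(self):
        # Persist the table so it survives a pod/server restart.
        self.index_path.write_text(json.dumps(self.index, indent=2))

    def record(self, weights: bytes, pinned=False, **meta):
        """Store one trained version and return its UUID."""
        version_id = str(uuid.uuid4())
        path = self.root / f"{version_id}.bin"
        path.write_bytes(weights)
        self.index["versions"].append(
            {"id": version_id, "path": str(path), "pinned": pinned, "meta": meta}
        )
        self.index["latest"] = version_id
        self._prune()
        self._flush()
        return version_id

    def pin(self, version_id):
        """Explicitly save a version so that pruning never removes it."""
        for entry in self.index["versions"]:
            if entry["id"] == version_id:
                entry["pinned"] = True
                self._flush()
                return
        raise KeyError(version_id)

    def _prune(self):
        # Drop the oldest unpinned versions beyond the retention range.
        unpinned = [v for v in self.index["versions"] if not v["pinned"]]
        excess = unpinned[: max(0, len(unpinned) - self.retention)]
        for entry in excess:
            Path(entry["path"]).unlink(missing_ok=True)
            self.index["versions"].remove(entry)

    def load_latest(self, default: bytes) -> bytes:
        """What a setup action could do: latest version if present, else default."""
        by_id = {v["id"]: v for v in self.index["versions"]}
        entry = by_id.get(self.index["latest"])
        if entry and Path(entry["path"]).exists():
            return Path(entry["path"]).read_bytes()
        return default
```

In this sketch, a train action would call `record(...)` and hand the returned UUID back to the user, while `setup()` would build a `ModelRegistry` over the same directory and call `load_latest(default)`. Because the table is re-read from disk in the constructor, the same call also covers the pod restart case, provided the auto-reload logic invokes `setup()`. Keeping per-version metadata in an open-ended dict is one possible way to keep the format extensible.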
ypkang added the core, P1, ai_kit, and consumer depending labels Apr 28, 2023