DVC integration to track data artifacts & support for MLOps integration #250

Open
wants to merge 2 commits into
base: main

xiang-wuu

DVC (Data Version Control) is a data and ML experiment management tool that builds on the DevOps and version-control toolset everyone is already familiar with (Git, IDEs, CI/CD, etc.). There are several use cases for integrating DVC into ML/DL codebases. Git is not made for keeping training and validation logs from lengthy, complex experiments in a GitHub repository alongside the trained models, and managing binary artifacts there is still very challenging because Git/GitHub does not encourage storing them. DVC makes these things much less of a hassle:

  1. Easy versioning and tracking of all training experiments.
  2. Test/validation logs can be tracked as data artifacts.
  3. Pre-trained model weights can be maintained as binary artifacts in remote storage backends.
  4. DVC is storage-agnostic: multiple storage backends such as S3 buckets, Azure, SSH, Google Drive, etc. can be used to store and track all binary artifacts.
  5. Binary artifacts maintained with DVC can be kept in sync with the Git version of the codebase.
  6. DVC is easy to use and feels similar to Git, as most of its commands mirror Git commands.
  7. Complex ML and data pipelines can be managed for all running experiments.
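
To make points 3 to 6 concrete, here is a minimal sketch of the Git-like workflow, written as a small Python helper that only assembles the `git`/`dvc` command lines. The artifact path `weights/model.pt`, the remote name `storage`, and the bucket URL are hypothetical placeholders, not paths from this repository; the helper functions themselves are illustrative and not part of DVC.

```python
import shutil
import subprocess
from pathlib import PurePosixPath


def dvc_tracking_commands(artifact, remote_url, remote_name="storage"):
    """Return the git/DVC commands that put one binary artifact under DVC.

    `artifact`, `remote_url` and `remote_name` are illustrative placeholders.
    """
    path = PurePosixPath(artifact)
    pointer = f"{path}.dvc"                    # small text file that Git tracks
    ignore = str(path.parent / ".gitignore")   # written by `dvc add` next to the artifact
    return [
        ["dvc", "init"],
        ["dvc", "remote", "add", "-d", remote_name, remote_url],
        ["dvc", "add", str(path)],
        ["git", "add", pointer, ignore, ".dvc/config"],
        ["git", "commit", "-m", f"Track {path.name} with DVC"],
        ["dvc", "push"],                       # upload the binary to the remote
    ]


def restore_commands(rev):
    """Return the commands that sync code and artifacts to one Git revision."""
    return [
        ["git", "checkout", rev],  # restores the code and the .dvc pointer files
        ["dvc", "pull"],           # fetches the binaries those pointers refer to
    ]


def run(commands):
    """Execute the commands in order; requires git and dvc on PATH."""
    for tool in ("git", "dvc"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of executing it, so the sketch is safe to run anywhere.
    for cmd in dvc_tracking_commands("weights/model.pt", "s3://example-bucket/dvc-store"):
        print(" ".join(cmd))
```

Only the small `.dvc` pointer file and the DVC config end up in Git; the binary itself goes to the configured remote, which is what keeps the repository light.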

I can update this PR further if the author is interested in integrating DVC into this codebase.
Thank You!

@xiang-wuu
Author

@Chilicyy could you please give an update on this?

@zidanexu zidanexu closed this Sep 21, 2022
@zidanexu zidanexu reopened this Sep 21, 2022
2 participants