feature: minio support in fiab #163

Merged (1 commit) on Jul 1, 2022

@myungjin (Contributor)

In fiab, mlflow is configured without an object store, so artifacts
(e.g., models) are stored locally within a container and are lost once
training is over. This PR adds support for minio, an open-source object
store, so that artifacts are now saved in an object store. In addition,
minio's endpoint is exposed via ingress so that workers outside the
cluster can save artifacts. Note that TLS is not enabled for minio,
because the self-signed certificate used in fiab would likely cause
certificate verification errors in mlflow's client.

A caveat is that no persistent volume is configured in this minio
setup, so artifacts will be lost if the minio pod crashes or terminates.
The minio support is intended as a reference to help users configure
minio or AWS S3 correctly for production.
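A minimal sketch of what a worker outside the cluster might set before logging artifacts, assuming it relies on mlflow's standard S3 artifact support (mlflow reads MLFLOW_S3_ENDPOINT_URL for a custom S3-compatible endpoint, and boto3 reads the usual AWS credential variables). The helper name minio_artifact_env, the credentials, and the use of http://minio.flame.test as the endpoint are illustrative assumptions, not values defined by this PR. Because TLS is not enabled for minio in fiab, the endpoint here is plain http://.

```python
import os
from typing import Dict


def minio_artifact_env(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Return the environment variables an mlflow client uses to store
    s3:// artifacts on an S3-compatible server such as minio.

    Hypothetical helper: endpoint and credentials are placeholders, not
    values defined by this PR.
    """
    if not endpoint.startswith(("http://", "https://")):
        # boto3 needs a full URL; in this fiab setup TLS is off, so http://.
        raise ValueError(f"endpoint must include a scheme: {endpoint!r}")
    return {
        # mlflow passes this to boto3 as the custom S3 endpoint (minio).
        "MLFLOW_S3_ENDPOINT_URL": endpoint,
        # Standard AWS credential variables read by boto3.
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


if __name__ == "__main__":
    # Example values only; replace with the credentials of your minio deployment.
    env = minio_artifact_env("http://minio.flame.test", "EXAMPLE_KEY", "EXAMPLE_SECRET")
    # Set these before the worker calls mlflow so artifact uploads go to minio.
    os.environ.update(env)
    print(sorted(env))
```

Keeping these values in the environment rather than in code means the same worker can target minio in fiab or AWS S3 in production; for AWS S3 the custom endpoint variable would simply be left unset.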

@myungjin linked an issue on Jun 30, 2022 that may be closed by this pull request
@codecov-commenter

Codecov Report

Merging #163 (2a12fe2) into main (77ed492) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #163   +/-   ##
=======================================
  Coverage   21.29%   21.29%           
=======================================
  Files          34       34           
  Lines        1503     1503           
=======================================
  Hits          320      320           
  Misses       1173     1173           
  Partials       10       10           


@GaoxiangLuo (Collaborator) left a comment

I can see that a new pod named minio is initialized. When running the MNIST example, the full artifact path starts with s3://mlruns/XX, as expected. If I log in to the minio pod, where are the artifacts stored? Also, should we be able to open http://minio.flame.test?

@myungjin (Contributor, Author)

@GaoxiangLuo, you should be able to download the artifacts from the mlflow UI.
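The artifact path reported above, s3://mlruns/XX, follows the usual s3:// layout: the first component is the bucket (here mlruns) and the rest is the object key prefix inside it. As a small sketch, assuming that standard layout, the hypothetical helper split_s3_uri (not part of flame or mlflow) splits such a URI, which can help when browsing the bucket with an S3-compatible client instead of the mlflow UI. Where minio keeps these objects on the pod's filesystem depends on its data directory, which this PR does not state.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// artifact URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The host part is the bucket; the path, minus its leading "/", is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Hypothetical run path; the review only shows the prefix s3://mlruns/XX.
    print(split_s3_uri("s3://mlruns/1/abc123/artifacts"))
    # prints ('mlruns', '1/abc123/artifacts')
```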


@GaoxiangLuo reopened this on Jun 30, 2022
@GaoxiangLuo (Collaborator) left a comment

lgtm

@myungjin merged commit 28ffca1 into cisco-open:main on Jul 1, 2022
@myungjin deleted the minio_support branch on July 1, 2022 at 18:51
Development
Successfully merging this pull request may close these issues:
[Feature] object storage support for mlflow

3 participants