Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin MLFlow #945

Merged
merged 12 commits into from
Jun 12, 2024
Merged

plugin MLFlow #945

merged 12 commits into from
Jun 12, 2024

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Jun 10, 2024

The MLFlow plugin for Hamilton includes two sets of features:

  • Save and load machine learning models with the MLFlowModelSaver and MLFlowModelLoader materializers
  • Automatically track data pipeline results in MLFlow with the MLFlowTracker.

Changes

  • hamilton/plugins/mlflow_extensions.py contains the materializers
  • hamilton/plugins/h_mlflow.py contains the tracker
  • examples/mlflow/tutorial.ipynb is a tutorial notebook

How I tested this

  • tests/plugins/test_mlflow_extensions.py tests the materializers
  • currently no tests for the tracker

TODO / ideas

  • display_all_functions() and visualize_execution() from the HamiltonTracker to store in the MLFlow tracking server
  • strong coupling between MLFlow experiments and runs with Hamilton UI projects and runs. The MLFlow UI allows for markdown fields which could contain a link to the Hamilton UI.
  • log "datasets" used for tracking in the MLFlow UI. Accepts a "digest" which is equivalent to our fingerprinting concept
  • log model input signatures
  • add example on how to add hyperparameter tuning with nested runs
  • explore the MLFlow features for LLMs and evaluation

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 70e620b in 1 minute and 16 seconds

More details
  • Looked at 734 lines of code in 7 files
  • Skipped 1 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_q2KEKG2CuxHXzKwa


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

hamilton/plugins/h_mlflow.py Show resolved Hide resolved
hamilton/plugins/mlflow_extensions.py Show resolved Hide resolved
hamilton/plugins/mlflow_extensions.py Outdated Show resolved Hide resolved
@zilto zilto closed this Jun 11, 2024
@zilto zilto reopened this Jun 11, 2024
@zilto zilto marked this pull request as draft June 11, 2024 01:03
@zilto zilto marked this pull request as ready for review June 11, 2024 01:04
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only big one is Any typing. Would be nice to constrain it, but if it's too hard I guess that's the trade-off.

Otherwise minor things.


This pairs nicely with the `HamiltonTracker` and the [Hamilton UI](https://hamilton.dagworks.io/en/latest/hamilton-ui/ui/) which gives you execution observability.

We're looking forward to better link Hamilton "projects" with MLFlow "experiments" and runs from both projects.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We're looking forward to better link Hamilton "projects" with MLFlow "experiments" and runs from both projects.
We're looking forward to the better linking of Hamilton "projects" with MLFlow "experiments" and runs from both projects.

examples/mlflow/README.md Show resolved Hide resolved
examples/mlflow/README.md Outdated Show resolved Hide resolved
hamilton/plugins/mlflow_extensions.py Outdated Show resolved Hide resolved
hamilton/plugins/mlflow_extensions.py Show resolved Hide resolved
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, overall looks good. Only thing I'm iffy about is a lot of non-transparent coupling between the materializer types and MLFlow. Can we add a docs section about this?

Also, can we add this to the docs? We have docstrings + everything. This should at least go in reference

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error -- rerun? MlflowException: Invalid experiment name: Ellipsis. Expects a string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added code snippets at the end for demonstration purposes. I turned them into markdown snippets

run_description: Optional[str] = None,
log_system_metrics: bool = False,
):
"""Configure the MLFlow client and experiment for the lifetime of the tracker
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show usage in docstring? This should go in docs.

# special case for matplotlib and plotly
# log materialized figure. Allows great degree of control over rendering format
# and also save interactive plotly visualization as HTML
elif node_tags["hamilton.data_saver.sink"] in ["plt", "plotly"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add todo -- we'll want to add more things (datasets, etc...) that can come from this tag. Wil want to be a dict rather than a if/elif statement

hamilton/plugins/h_mlflow.py Show resolved Hide resolved
model_name: Optional[str] = None
version: Optional[Union[str, int]] = None
version_alias: Optional[str] = None
flavor: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"flavor" isn't a great name -- is there more typing (E.G. a Literal[])?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its just mlflow's name then that makes sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the MLFlow terminology, didn't reinvent the wheel here

Copy link
Collaborator Author

@zilto zilto Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each flavor is a library / backend and each have multiple types. We shouldn't explicitly handle them and let that be delegated to MLFlow

image

Copy link
Contributor

ellipsis-dev bot commented Jun 11, 2024

Skipped PR review on f73d81e because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.


Generated with ❤️ by ellipsis.dev

ellipsis-dev[bot]

This comment was marked as resolved.

ellipsis-dev[bot]

This comment was marked as resolved.

@zilto zilto mentioned this pull request Jun 12, 2024
7 tasks
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zilto can you write a nice squash merge commit -- i.e. something to explain any choices/things you didn't add support for but could, etc.

@zilto zilto merged commit 881010e into main Jun 12, 2024
23 checks passed
@zilto zilto deleted the plugin/mlflow branch June 12, 2024 23:44
@zilto zilto mentioned this pull request Jun 12, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants