HFModelPusher component proposal #174
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
#### SIG TFX-Addons | ||
# Project Proposal | ||
|
||
--- | ||
|
||
**Your name:** Chansung Park | ||
|
||
**Your email:** deep.diver.csp@gmail.com | ||
|
||
**Your company/organization:** Individual ([ML GDE](https://developers.google.com/community/experts/directory/profile/profile-chansung-park)) | ||
|
||
**Project name:** HuggingFace Model Pusher | ||
|
||
## Project Description | ||
HuggingFace Model Pusher (`HFModelPusher`) pushes a blessed model to the [HuggingFace Model Hub](https://huggingface.co/models). | ||
|
||
## Project Category | ||
Component | ||
|
||
## Project Use-Case(s) | ||
The HuggingFace Model Hub lets us host [Git-LFS](https://git-lfs.github.com)-enabled repositories in public or private mode. Supported models hosted on the Hub can be loaded and used directly with the APIs provided by the [transformers](https://huggingface.co/docs/transformers/index) package. The Hub is not limited to those models, however; it can host arbitrary types of models too. | ||
|
||
The HuggingFace Model Hub also makes it easy to manage model versions, especially for those familiar with Git. | ||
|
||
## Project Implementation | ||
`HFModelPusher` is a class-based TFX component that inherits from the standard TFX `Pusher` component. | ||
|
||
It takes the following inputs: | ||
It might be nice to include a README and/or a model_card_metadata config as inputs for additional documentation and discoverability.
Thank you for the suggestion! @sayakpaul and I had the same thought. Specifically, it would be great to upload a model card generated by By the way, your suggestion on the model_card_metadata config sounds good too! But it has a lot of information to fill in, so it would be inappropriate for a TFX component to fill it in automatically. Do you have any idea how to make this easier for users?
Similar to how MCT fills in information, it may be good to automatically fill in some of this by adding Statistics and such as inputs to the component. See the existing ModelCardGenerator component for reference -> https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/tfx/executor.py
so, taking outputs from
Doubt. Even if the
I think
Or it would be easier to just put the HTML contents generated by MCT into the markdown model card in the HuggingFace Model Repo. WDYT?
I think we would want to avoid the code for turning an HTML page into a separate markdown file for the Hugging Face Hub README. IMO, we could develop separate utilities for this purpose:
Right. Since this component could become overcomplicated if we include the Model Card generation feature, I thought of two possible solutions:
Model Card Toolkit can also create markdown or any arbitrary file types since you can have custom templates and filenames, so it wouldn't require converting HTML to markdown. Anyway, I'm in favor of your and @sayakpaul's original idea to upgrade |
||
``` | ||
HFModelPusher( | ||
username: str, | ||
huggingface_access_token: str, | ||
repo_name: Optional[str] = None, | ||
model: Optional[types.Channel] = None, | ||
model_blessing: Optional[types.Channel] = None, | ||
) | ||
``` | ||
- `username` : username of the HuggingFace user (can be an individual user or an organization) | ||
- `huggingface_access_token` : access token of the HuggingFace user | ||
- `repo_name` : name of the repository to push the current version of the model to. Defaults to the TFX pipeline name | ||
- `model` : the model artifact from the upstream TFX component such as `Trainer` | ||
- `model_blessing` : the blessing artifact from the upstream TFX component such as `Evaluator` | ||
|
||
It gives the following outputs: | ||
- `pushed` : integer value denoting whether the model was pushed. It is set to 0 when the input model is not blessed, and to 1 when the model is successfully pushed | ||
- `pushed_version` : string value indicating the current model version. It is derived from the `time.time()` function in Python's standard library | ||
- `repo_id` : ID of the repository the model is pushed to. It follows the format f"{username}/{repo_name}" | ||
- `branch` : name of the branch the model is pushed to. It is automatically set to the same value as `pushed_version` | ||
- `commit_id` : ID of the commit in the repository history (the branch name alone could be sufficient to retrieve a certain version of the model) | ||
- `repo_url` : repository URL. It is something like f"https://huggingface.co/{repo_id}/tree/{branch}" | ||
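
As a rough illustration of how the string-valued outputs above relate to one another, here is a minimal sketch. The helper name `build_push_outputs` and the `now` parameter are hypothetical and exist only for this example. Truncating `time.time()` to whole seconds is an assumption, since the proposal only says the version is decided by `time.time()`. The `/tree/` segment reflects how the Hub web UI addresses branches.

```python
import time
from typing import Dict, Optional


def build_push_outputs(
    username: str, repo_name: str, now: Optional[float] = None
) -> Dict[str, str]:
    """Derives the string outputs of a push (hypothetical helper)."""
    # Assumption: the version is the Unix timestamp truncated to seconds.
    timestamp = time.time() if now is None else now
    pushed_version = str(int(timestamp))

    repo_id = f"{username}/{repo_name}"
    # The branch name is the same value as the pushed version.
    branch = pushed_version
    # The Hub web UI serves branch views under /tree/<branch>.
    repo_url = f"https://huggingface.co/{repo_id}/tree/{branch}"

    return {
        "pushed_version": pushed_version,
        "repo_id": repo_id,
        "branch": branch,
        "repo_url": repo_url,
    }


if __name__ == "__main__":
    print(build_push_outputs("some-user", "my-pipeline"))
```

Passing `now` explicitly makes the helper deterministic, which is convenient in unit tests. `commit_id` is left out because it is only known after the push has happened.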
|
||
The behaviour of the component: | ||
1. Pushes the model when the `model` is blessed, or when the `model_blessing` parameter is set to `None`. This behaviour is inherited from the standard `Pusher` component | ||
2. Creates a HuggingFace Hub Repository object using the `huggingface-hub` package. If the repository already exists, it is cloned instead | ||
3. Checks out a new branch named after `pushed_version`. Since the model is pushed for experimental purposes, it is useful to track model versions in separate branches (when the model is ready to be made public, one can manually merge the right version (branch) into the main branch) | ||
4. Copies all the model-related files produced by an upstream component such as `Trainer` into a temporary directory on the local file system. The files could be stored in a GCS bucket, so the `tf.io.gfile` module is a good choice since it handles files in a location-agnostic manner (GCS or local) | ||
5. Adds and commits the current state | ||
6. Pushes the commit to the remote HuggingFace Model Repository | ||
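
The steps above can be sketched as follows. This is not the proposed implementation: it swaps the `huggingface-hub` Repository object for plain `git` commands and `tf.io.gfile` for `shutil`, so that the example needs only the standard library. The function name `push_model`, its parameters, and the injectable `run` callable are all hypothetical. Authentication, repository creation, and Git-LFS tracking are left out.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def _run_git(cmd: List[str]) -> None:
    """Default runner: executes the command and raises on failure."""
    subprocess.run(cmd, check=True)


def push_model(
    model_dir: str,
    remote_url: str,
    branch: str,
    blessed: Optional[bool],
    run: Callable[[List[str]], object] = _run_git,
) -> int:
    """Returns 1 if the model was pushed and 0 if it was not blessed."""
    # Step 1: push when blessed, or when no blessing is given (None).
    if blessed is False:
        return 0

    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        # Step 2: clone the existing repository (stand-in for Repository).
        run(["git", "clone", remote_url, str(work)])
        # Step 3: create a branch named after the pushed version.
        run(["git", "-C", str(work), "checkout", "-b", branch])
        # Step 4: copy the model files in (stand-in for tf.io.gfile).
        shutil.copytree(model_dir, work, dirs_exist_ok=True)
        # Step 5: add and commit the current state.
        run(["git", "-C", str(work), "add", "--all"])
        run(["git", "-C", str(work), "commit", "-m", f"Add model version {branch}"])
        # Step 6: push the new branch to the remote.
        run(["git", "-C", str(work), "push", "origin", branch])
    return 1
```

Injecting `run` keeps the step ordering testable without network access. For example, passing `run=commands.append` with a list named `commands` records the git invocations instead of executing them.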
|
||
|
||
## Project Dependencies | ||
- [tfx](https://pypi.org/project/tfx/) | ||
- [huggingface-hub](https://pypi.org/project/huggingface-hub/) | ||
|
||
## Project Team | ||
- Chansung Park, @deep-diver, deep.diver.csp@gmail.com | ||
- Sayak Paul, @sayakpaul, spsayakpaul@gmail.com | ||
|
||
# Note | ||
Please be aware of the processes and requirements which are outlined here: | ||
|
||
* [SIG-TFX-Addons](https://github.com/tensorflow/tfx-addons) | ||
* [Contributing Guidelines](https://github.com/tensorflow/tfx-addons/blob/main/CONTRIBUTING.md) | ||
* [TensorFlow Code of Conduct](https://github.com/tensorflow/tfx-addons/blob/main/CODE_OF_CONDUCT.md) |
it may not necessarily need to inherit either btw
Right, I agree.
It does not need to. But having it inherited from `Pusher` will be beneficial. I guess that's what @deep-diver meant.