HFModelPusher component proposal #174

deep-diver · 2022-08-23T04:10:44Z

Wrote a HFModelPusher component proposal (cc: @sayakpaul as the co-developer)

github-actions · 2022-08-23T04:10:58Z

Thanks for the PR! 🚀

Instructions: Approve using /lgtm and mark for automatic merge by using /merge.

proposals/20220823-huggingface_model_pusher.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

deep-diver · 2022-08-23T07:23:39Z

addressed the comments.

need reviews from @casassg, @rcrowe-google

sayakpaul

LGTM.

codesue · 2022-08-23T15:31:46Z

proposals/20220823-huggingface_model_pusher.md

+## Project Implementation
+HFModelPusher is a class-based TFX component, and it inherits from TFX standard `Pusher` component.
+
+It takes the following inputs:


It might be nice to include a README and/or a model_card_metadata config as inputs for additional documentation and discoverability.

@codesue

Thank you for the suggestion! me and @sayakpaul had the same thought.

Specifically, it would be great to upload a model card generated by Evaluator and model-card-toolkit. However the model-card-toolkit is a on-going project to be ported into TFX Add-on, so I thought maybe upgrade HFModelPusher when model-card-toolkit is completed.

By the way, your suggestion on the model_card_metadata config sounds good too! But, it has many many information to fill in, so it would be inappropriate for a TFX component to fill in automatically. Do you have any idea how to make things easier for users to use this?

Similar to how mct fills information, it may be good to automatically fill some of this by adding Statistics and such as inputs to the component. See the existing ModeCardGenerator component for reference -> https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/tfx/executor.py

so, taking outputs from Evaluator and StatisticsGen optionally.

if both Artifacts exist, create a ModelCardToolkit

then fill some information from it

Doubt.

Even if the Evaluator and StatisticsGen output artifacts do not exist we'll create a model card with general info, right?

I think modelcards package is useful for this purpose.

@sayakpaul @codesue @casassg

Or it would be easier to just put HTML contents generated by MCT into the markdown model card in HuggingFace Mode Repo. WDYT?

I think we would want to avoid the code for turning an HTML page into a separate markdown file for the Hugging Face Hub README. IMO, we could develop separate utilities for this purpose:

Have a template for the model card (TFX and HF have their own formats, it seems, but there's overlap in information).

Have a utility that generates HTML page from the populated template text

Have utility that generates a markdown file from the populated template text

Right.

Since this component could be over complicated if we include Model Card generation feature, I thought two possible solutions:

Create another custom TFX evaluation component for HuggingFace Model Card / or HF Model Card Generator

Or just run evaluate inside this component(HFModelPusher) over the test dataset and fill evaluation results into the Model Card

Or leverage the existing Evaluator standard TFX component

Model Card Toolkit can also create markdown or any arbitrary file types since you can have custom templates and filenames, so it wouldn't require converting HTML to markdown. Anyway, I'm in favor of your and @sayakpaul's original idea to upgrade HFModelPusher to include a model card later in order to prevent increasing scope and to avoid adding dependencies and unknowns.

casassg

@rcrowe-google is the one who needs to validate from Google side as well

casassg · 2022-08-23T16:24:00Z

proposals/20220823-huggingface_model_pusher.md

+HuggingFace Model Hub is easy to manage model versions, especially for those familiar with Git.
+
+## Project Implementation
+HFModelPusher is a class-based TFX component, and it inherits from TFX standard `Pusher` component.


it may not necessarily need to inherit either btw

Right, i agree

It does not need to. But having it inherited from Pusher will be beneficial. I guess that's what @deep-diver meant.

rcrowe-google

/lgtm

sayakpaul · 2022-08-25T00:34:13Z

/lgtm

casassg · 2022-08-25T00:37:29Z

it will need a /merge from Robert as he is the sole owner XD

codesue · 2022-08-25T01:15:14Z

/lgtm

deep-diver · 2022-08-26T02:40:40Z

@casassg

Once this PR is merged, I will make another PR to update this proposal to include HuggingFace Space Deployment feature. Or should I do it now? I want to do it on a separate PR to clearly discuss different concerns.

casassg · 2022-08-26T23:34:38Z

I think that's fine. I can't merge this due to Robert being owner of the proposals himself (I could merge it but prefer to let him do this as I dont want to force merge either) You may want to tag him on slack next week to double check.

That said, while merging proposal blocks merging the code, you can start opening PRs w the code itself (as long as we merge those after we merge this we should be fine)

deep-diver · 2022-08-29T02:30:17Z

then I will wait until @rcrowe-google merges this PR :)

casassg · 2022-08-30T18:33:37Z

Given Robert is out for a couple days, and given he approved this. I will manually merge, to help work continue on this

rcrowe-google · 2022-08-31T18:17:48Z

Sorry I missed this, and thanks for merging!

* write HFModelPusher component proposal * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * commit_id in the outputs, more desc on the branching Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* initial move of the existing mct code * Added missing file * Added (unreviewed) MCT example * updated example * extended doc * add census helper files * Create 20220513-PandasTransform.md First draft of project proposal. * Update 20220513-PandasTransform.md Oops, forgot to add the dependencies! * Update 20220513-PandasTransform.md Adding a note about statistics and schema for the changed dataset. * Rcrowe fixing filename (#140) * Create 20220513-pandas_transform.md * Delete 20220513-PandasTransform.md * Update README.md (#142) Fixing broken links on PyPI page. * fix bad link in xgboost_evaluator * Update CODEOWNERS * Cherry pick bot to backport changes (#144) * Cherry pick bot to backport changes * Update RELEASE.md * validate user is release manager * Update examples for feature selection component (#116) * Updated module file for compatibility * Updated module file for compatibility * Update iris_module_file.py * Created Palmer Penguins example using Colaboratory * Created Pima Indians Diabetes example pipeline using Colaboratory * Created Iris example pipeline using Colaboratory * Deleted outdated file + replaced with new examples * Deleted outdated file * Fix bad link in xgboost_evaluator (#145) Commit 93fd006 didn't correctly fix the link. * Increase supported version from TFX <1.8 to <1.9 (#147) * include TFX 1.8 * increase CI * switch cache key to constaint first * remove shared cache * fix exit handler * fix patch issues * reformating yapf * Use feast from master git to avoid pip infinite backtrack (#152) * pin pip * Update ci.yml * use requirements max and min instead * add missing letter * fix using -r option * ammend triggers * add tensorflow to the mix * tf 2.6 * move constraints to version file * pin api-core * add protobuf * protobuf 3.19 * add mlp-sdk * add extras for api-core * add it to requirements instead * move feast dep * use feast repo instead of release * remove mlpipeline-sdk * Updated readme with module file guidelines and example (#151) * bump feast version support (#154) * Add feature_selection to included pkg (#161) * add feature_selection to included pkg * upgrade tfx to avoid beam issue in sklearn example * fix due to upgrade of tfma * fix missing 1 types * format inline * update extractor test * remove the header generator * break down CI by component * use file.json * use python file * use correct base_dir * touch for feast_examplegen * ensure it works fine * run if certain dangerous files change * removed non used files and update CONTRIBUTING * improve a bit how variable is set * revert change in feastexamplegen * remove non used variable in version.py * rename job phase * cancel if new ones * add some extra space * add note on inspiration * PandasTransform - Ready for review (#155) * Initial commit * Fixing lint * More lint * Yet more lint * Oops, whitespace * Unused imports * More cleanup * Misc cleanup * Fixed a couple of things * Fixing astype * Disabling unnecessary-comprehension * Oops, wrong comprehension * Trying to improve version handling * Order of imports * Removing reimport * Adding version check to tests * Missed the beam pipeline on the min * Indents on the test params * Lint wants the else formatted differently * Adding an example and update to pydoc for component * Adding README and release notes * Added beam_pipeline_args, updated README * Trying to fix lint * Adding dependencies * Lint, maybe yapf fixes * Trying to fix formatting * Capping Pandas version, updating CODEOWNERS * Various updates pre-merge * support tfx 1.9 (#162) * small fix for filter projects (#164) * small fix for filter projects * Update ci.yml * Update README to add missing component (#165) * Update README.md * fix pandas_transform * Added release notes for feature selection component (#150) * Update CONTRIBUTING.md * Create README.md Adding a readme to the example * Create requirements.txt Adding requirements.txt to the example. Only referencing the component, since that will keep all other dependencies in one place. * Added tests for feature selection component (#149) * Initial commit with working tests * cleaned and ensured the file is working with pre-commit * Testing checkpoint * Added support for module_file * Improved documentation * Fixed test * Fixed pre-commit errors * Added data files for component_test.py * Added tests for artifact count by type * Fixed minor bug * Added test to check if correct features are being selected * Update dependencies for feature_selection * Update tfx_addons/version.py * Update tfx_addons/version.py * Update tfx_addons/version.py Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * Improve examples CI to automatically pick up projects (#166) * make ci_examples run only when needed * remove non used init file * only run those examples that have test files * address comments and improve documentation * add concurrency for ci-examples * CI: Run all if event name is push (#167) * if event name is push run all projects * remove non-used RUN_ALL_FILE * run ci-examples if changes in filter_examples * add comment on why env name * add better logging * Move main to 0.4 after 0.3 release * Update RELEASE.md (#170) * update firebase publisher proposal (#171) * update firebase publisher proposal update firebase publisher proposal with more detailed descriptions * remove vague word temporary * replace custom_config with concrete parameters * Update tfx_addons/firebase_publisher/README.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * clarify usage of SavedModel Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * HFModelPusher component proposal (#174) * write HFModelPusher component proposal * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * commit_id in the outputs, more desc on the branching Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * fix: missing option --hook-type (#177) fix: missing option `--hook-type` in "Install pre-commit hooks for push hooks: `pre-commit install --hook-type pre-push`" * Scaling Sampler by using row level probability (#163) * initial probabilistic sampling * using sampled in batches * use toDict * use calculation for sampling by class * implement using sampling at individual level * fixup readme * improve executor docs * use patch to avoid flaky tests * Add FirebasePublisher Implementation (#175) * add empty __init__.py * add component.py * add component spec * add executor.py * unncessary codes from custom_config * initial implementation done * fix typo * re-organize the runner * pre-commit * pre-commit * add docstring for the component * add basic unit tests * add basic unit tests * change private to public * change function call name * pass pre-commit * Update tfx_addons/firebase_publisher/runner.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/component.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/component.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/executor.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/runner.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/runner.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update tfx_addons/firebase_publisher/runner.py Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * remove mis-placed docstring in the component * fix full-step consistency * rename model_exist -> is_model_present * rename upload_model -> upload_model_to_gcs * more descriptions on update_model * update log message on model_create * fix case inconsistency * fix pre-commit * add copyright * add import module in __init__.py * add firebase_publisher version info * fix typo * remove ValueError in docstring * update docstring * fix: pre-commit * fix: pre-commit * update function name in the test module * add copyrights * reduce version requirement * remove version info in __init__.py * add constraint for firebase in CI Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * update HFModelPusher proposal * add space_url in the output * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Update proposals/20220823-huggingface_model_pusher.md Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Archive penguins_sklearn example (#184) * Update README.md Archived and pointed to example in main TFX repo. * Update penguin_pipeline_sklearn_gcp.py Commenting out to avoid issues while archived. * Update penguin_pipeline_sklearn_gcp_test.py Commenting out to avoid CI issues while archived. * Update penguin_pipeline_sklearn_local.py Commenting out to avoid CI issues while archived. * Update penguin_pipeline_sklearn_local_e2e_test.py Commenting out to avoid CI issues while archived. * Update penguin_pipeline_sklearn_local.py * Update penguin_utils_sklearn.py Commented out to avoid CI issues while archived * Update sklearn_predict_extractor.py Commented out to avoid CI issues while archived. * Update sklearn_predict_extractor_test.py Commented out to avoid CI issues while archived. * Update requirements.txt Commenting out to avoid CI issues while archived * Update Dockerfile Commenting out to avoid CI issues while archived * Update penguin_utils_sklearn.py Lint * Update sklearn_predict_extractor.py Lint * Update sklearn_predict_extractor.py Lint * Update penguin_pipeline_sklearn_gcp.py Lint * Update penguin_pipeline_sklearn_gcp_test.py Lint * Update penguin_pipeline_sklearn_local.py Lint * Update penguin_pipeline_sklearn_local_e2e_test.py Lint * Update penguin_utils_sklearn.py Lint * Update sklearn_predict_extractor_test.py Lint * Update sklearn_predict_extractor_test.py Lint * Update penguin_utils_sklearn.py Lint * Update sklearn_predict_extractor.py Lint * Update sklearn_predict_extractor_test.py Lint * Update penguin_pipeline_sklearn_gcp_test.py * Update penguin_pipeline_sklearn_local_e2e_test.py Null test to satisfy CI * Update sklearn_predict_extractor_test.py Null test to satisfy CI * Update penguin_pipeline_sklearn_gcp_test.py Lint * Update penguin_pipeline_sklearn_local_e2e_test.py Lint * Update sklearn_predict_extractor_test.py Lint * Update penguin_pipeline_sklearn_local_e2e_test.py Lint * Update penguin_pipeline_sklearn_gcp_test.py Lint * Update penguin_pipeline_sklearn_gcp_test.py Lint * Update penguin_pipeline_sklearn_local_e2e_test.py Lint * Update sklearn_predict_extractor_test.py Lint * remove files and include reference to existing code Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * remove old data (#185) * PyTorch TFX proposal * clean up of CODEOWNERS (#188) * Support TFX 1.10 (#187) * Initial release candidate testing for tfx==1.10.0rc0 * remove suffux from test_utils * add missing extra line * use tfx 1.10 constraint * add firebase_publisher to readme * Add HF pusher owners (#192) * Add HuggingFace Pusher Implementation (#191) * add hfpusher implementation * update version.py * add test for runner.py * linting * add dependency * add pytest ignore protected access * add one more test * fix: test failure * copy proposa as README * add git-lfs dependency * document about RuntimeError when git-lfs is not installed * add _is_git_lfs_installed * add test cases for executor * change _executor -> exe * yapf * Update tfx_addons/huggingface_pusher/README.md Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * Update tfx_addons/huggingface_pusher/executor_test.py Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * add huggingface document reference for git-lfs * space_config example added to component_test * yapf * add optional decrypt features * add decrypt_fn in README Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> * initial move of the existing mct code * Added missing file * Added (unreviewed) MCT example * updated example * extended doc * add census helper files * updated mct version dependency * yapf + isort updates * updated gitignore to ignore notebook generated files * isort updated * removed notebook example generated files * removed model_card_pb2.SensitiveData() and model_card_pb2.ConfidenceInterval() * updated formatting * updated example notebook * updated example notebook for the MCT component * updated example notebook for the MCT component * set the code owners for the MCT component * Update RELEASE.md * Update RELEASE.md * added tfxtest * added tfxtest * updated .gitignore * fix typo in CODEOWNERS * Update .gitignore --------- Co-authored-by: Robert Crowe <robertcrowe@google.com> Co-authored-by: Gerard Casas Saez <gcasassaez@twitter.com> Co-authored-by: Kshitijaa Jaglan <29124655+deutranium@users.noreply.github.com> Co-authored-by: Jeroen Van Goey <jeroen.vangoey@gmail.com> Co-authored-by: Chansung Park <deep.diver.csp@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

write HFModelPusher component proposal

56e5faf

deep-diver requested a review from rcrowe-google as a code owner August 23, 2022 04:10

github-actions bot added the needs-lgtm label Aug 23, 2022

deep-diver mentioned this pull request Aug 23, 2022

Integration with HuggingFace Hub #172

Closed

sayakpaul suggested changes Aug 23, 2022

View reviewed changes

deep-diver and others added 8 commits August 23, 2022 15:34

Update proposals/20220823-huggingface_model_pusher.md

4b5d16c

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

4f58c4f

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

7889018

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

22d60a0

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

37e8e8d

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

c7abcb2

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Update proposals/20220823-huggingface_model_pusher.md

b1b79c7

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

commit_id in the outputs, more desc on the branching

2586813

sayakpaul approved these changes Aug 23, 2022

View reviewed changes

codesue reviewed Aug 23, 2022

View reviewed changes

casassg reviewed Aug 23, 2022

View reviewed changes

rcrowe-google approved these changes Aug 24, 2022

View reviewed changes

github-actions bot added lgtm needs-merge and removed needs-lgtm labels Aug 25, 2022

gante mentioned this pull request Aug 25, 2022

HFSpacePusher component proposal #176

Closed

casassg merged commit 8950838 into tensorflow:main Aug 30, 2022

deep-diver mentioned this pull request Sep 1, 2022

(proposal update) Add HuggingFace Space pushing capability to the HFModelPusher #178

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HFModelPusher component proposal #174

HFModelPusher component proposal #174

deep-diver commented Aug 23, 2022 •

edited

Loading

github-actions bot commented Aug 23, 2022

deep-diver commented Aug 23, 2022

sayakpaul left a comment

codesue Aug 23, 2022

deep-diver Aug 23, 2022

casassg Aug 23, 2022

deep-diver Aug 23, 2022

sayakpaul Aug 24, 2022

deep-diver Aug 24, 2022

deep-diver Aug 24, 2022

sayakpaul Aug 24, 2022

deep-diver Aug 24, 2022 •

edited

Loading

codesue Aug 25, 2022

casassg left a comment

casassg Aug 23, 2022

deep-diver Aug 23, 2022

sayakpaul Aug 24, 2022

rcrowe-google left a comment

sayakpaul commented Aug 25, 2022

casassg commented Aug 25, 2022

codesue commented Aug 25, 2022

deep-diver commented Aug 26, 2022

casassg commented Aug 26, 2022 •

edited

Loading

deep-diver commented Aug 29, 2022

casassg commented Aug 30, 2022

rcrowe-google commented Aug 31, 2022

HFModelPusher component proposal #174

HFModelPusher component proposal #174

Conversation

deep-diver commented Aug 23, 2022 • edited Loading

github-actions bot commented Aug 23, 2022

deep-diver commented Aug 23, 2022

sayakpaul left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deep-diver Aug 24, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

casassg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rcrowe-google left a comment

Choose a reason for hiding this comment

sayakpaul commented Aug 25, 2022

casassg commented Aug 25, 2022

codesue commented Aug 25, 2022

deep-diver commented Aug 26, 2022

casassg commented Aug 26, 2022 • edited Loading

deep-diver commented Aug 29, 2022

casassg commented Aug 30, 2022

rcrowe-google commented Aug 31, 2022

deep-diver commented Aug 23, 2022 •

edited

Loading

deep-diver Aug 24, 2022 •

edited

Loading

casassg commented Aug 26, 2022 •

edited

Loading