Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default Model.onnx_model to target onnx.model.tar.gz #355

Merged
merged 8 commits into from
Aug 15, 2023
Merged

Conversation

bfineran
Copy link
Contributor

currently, sparsezoo does not have great support for onnx models saved with external data out of the box. especially for user friendly and deepsparse compatible flows such as Model.onnx_model.path.

Better support will be added in a future backend update - to fix this now, this PR updates the default target for Model.onnx_model to be the model.onnx.tar.gz where possible. A helper class is added to enforce that when a user tries to access the onnx model path, the compressed model is downloaded and unzipped and the reference to the model path points only to the extracted model.onnx file where applicable

note - any users who have already tried to download models with external data will need to clear them from cache

test_plan:
Manually verified with both an external model and a legacy model (see below) - would be good to add these as unit tests, however the download times for the external models can take a while as they are >2Gb. Existing functionality should be thoroughly covered in sparsezoo, deepsparse, and sparseml testing

from sparsezoo import Model
import onnx

large_model_zoo = Model("zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/quantW8A8-none")
small_model_zoo = Model("zoo:cv/classification/vgg-19/pytorch/sparseml/imagenet/pruned-moderate")

large_model_onnx = onnx.load(large_model_zoo.onnx_model.path)
small_model_onnx = onnx.load(small_model_zoo.onnx_model.path)

@bfineran bfineran self-assigned this Aug 11, 2023
Benjamin and others added 5 commits August 11, 2023 13:28
super().path call updated to super.path()
This change fixes failing test
`tests/sparsezoo/objects/test_directory.py ...F....`
…creation

Add model.onnx.tar.gz to expected files
Update test_unzipping to force unzip, as the parent directory is already present
rahul-tuli
rahul-tuli previously approved these changes Aug 15, 2023
@rahul-tuli rahul-tuli self-requested a review August 15, 2023 13:05
@@ -280,16 +280,17 @@ class OnnxGz(Directory):
"""
Special class to handle onnx.model.tar.gz files.
Desired behavior is that all information about files included in the tarball are
available however, when the file.path is accessed, it will point only to the
`model.onnx` as this is the expected behavior for loading, additionally,
available however, when the file's `path` property is accessed, it will point only
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annoying, but apostrophes ' are not well supported by Sphinx, they tend to parse these as unclosed strings

files, directory_class=OnnxGz, display_name="model.onnx.tar.gz"
)
self.onnx_model: File = (
self._file_from_files(files, display_name="model.onnx.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be "model.onnx" instead of "model.onnx." ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@bfineran
Copy link
Contributor Author

minor updates post approval, merging

@bfineran bfineran merged commit 4577acf into main Aug 15, 2023
4 checks passed
@bfineran bfineran deleted the onnx-gz-default branch August 15, 2023 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants