[BugFix] Path not expanded #418

rahul-tuli · 2024-02-01T18:58:20Z

Deployment tar not found bug

When a model is downloaded through the Python API with a
download_path that contains the home-directory shorthand ~,
unpacking the downloaded tar files fails with a
file-not-found error.

python local/scripts/deployment_dir_bug.py --small-model 
Downloading (…)training/config.json: 100%|██████████████████████████████████████| 0.98k/0.98k [00:00<00:00, 377kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 240/240 [00:00<00:00, 95.5kB/s]
Downloading (…)/training/merges.txt: 100%|███████████████████████████████████████| 446k/446k [00:00<00:00, 8.83MB/s]
Downloading (…)g/model_nocache.onnx: 100%|███████████████████████████████████████| 496M/496M [00:43<00:00, 12.0MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████| 90.0/90.0 [00:00<00:00, 18.9kB/s]
Downloading (…)/training/vocab.json: 100%|███████████████████████████████████████| 779k/779k [00:00<00:00, 10.6MB/s]
Downloading (…)ining/tokenizer.json: 100%|█████████████████████████████████████| 2.02M/2.02M [00:00<00:00, 10.7MB/s]
Downloading (…)el/deployment.tar.gz: 100%|███████████████████████████████████████| 265M/265M [00:23<00:00, 12.0MB/s]
[Errno 2] No such file or directory: '~/test-models/small-model/deployment.tar.gz'
Traceback (most recent call last):
  File "/home/rahul/projects/sparsezoo/src/sparsezoo/objects/directory.py", line 190, in download
    target_directory.unzip()
  File "/home/rahul/projects/sparsezoo/src/sparsezoo/objects/directory.py", line 306, in unzip
    tar = tarfile.open(self._path, "r")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tarfile.py", line 1804, in open
    return func(name, "r", fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tarfile.py", line 1870, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '~/test-models/small-model/deployment.tar.gz'

Trying attempt 1 of 1.
Download retry failed...

Issue

The ~ in the download path is not expanded to the user's home
directory. The literal path reaches tarfile.open, which does not
expand ~, so the archive that was just downloaded cannot be found.
This is a bug in the sparsezoo Python API.
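
As an illustration of the kind of normalization that avoids this class of error, here is a minimal sketch. The helper name normalize_download_path is hypothetical and is not taken from the sparsezoo code base or from the change in this PR; it only shows the standard-library calls that turn a path containing ~ into an absolute one.

```python
import os
from pathlib import Path
from typing import Optional, Union


def normalize_download_path(
    download_path: Optional[Union[str, Path]],
) -> Optional[str]:
    """Hypothetical helper: expand ~ and make the path absolute.

    Returns None unchanged so that a caller can still fall back to
    its own default location.
    """
    if download_path is None:
        return None
    # expanduser replaces a leading ~ with the user's home directory;
    # abspath then resolves the result against the current directory.
    return os.path.abspath(os.path.expanduser(str(download_path)))


if __name__ == "__main__":
    print(normalize_download_path("~/test-models/small-model"))
```

Normalizing once, at the point where the path enters the API, means every later consumer (download, unzip, listing) sees the same absolute path.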

Test Script

# deployment_dir_bug.py

import argparse
from sparsezoo import Model


def parse_args():
    parser = argparse.ArgumentParser(description='Download models.')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--big-model', action='store_true', help='Download big model')
    group.add_argument('--small-model', action='store_true', help='Download small model')
    parser.add_argument('--download-path', type=str, required=False, help='Path to download the model', default=None)
    return parser.parse_args()

def main():
    args = parse_args()
    if args.big_model:
        stub = "zoo:llama2-7b-ultrachat200k_llama2_pretrain-pruned80"
        potential_download_path = "~/test-models/big-model"
    else:
        stub = "zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized"
        potential_download_path = "~/test-models/small-model"
    
    download_path = args.download_path or potential_download_path
    sparsezoo_model = Model(stub, download_path=download_path)
    downloaded_path = sparsezoo_model.download()
    print(f"Downloaded Model contents to {downloaded_path=}")
    print(f"Sparsezoo Model: {sparsezoo_model=}")


if __name__ == "__main__":
    main()

Steps to Reproduce

Invoke the script with the --small-model flag; the error shown above should appear.
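
The same failure can be reproduced without downloading a model. The sketch below is an assumption-laden stand-in for the script above: it builds a tiny deployment.tar.gz in a temporary directory, points HOME (and USERPROFILE, for Windows) at that directory, and then opens the archive through a ~ path with and without expansion. The function names open_tar and reproduce are hypothetical, and it assumes the current working directory has no directory literally named ~.

```python
import os
import tarfile
import tempfile
from typing import Tuple


def open_tar(path: str, expand: bool) -> bool:
    """Return True if the archive at `path` can be opened for reading."""
    if expand:
        path = os.path.expanduser(path)
    try:
        with tarfile.open(path, "r"):
            return True
    except FileNotFoundError:
        # tarfile.open passes the name to open() as-is; ~ is not expanded.
        return False


def reproduce() -> Tuple[bool, bool]:
    """Return (opened_without_expansion, opened_with_expansion)."""
    with tempfile.TemporaryDirectory() as fake_home:
        # Build a small stand-in for the downloaded deployment archive.
        payload = os.path.join(fake_home, "config.json")
        with open(payload, "w") as handle:
            handle.write("{}")
        archive = os.path.join(fake_home, "deployment.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(payload, arcname="config.json")

        # Temporarily make ~ resolve to the temporary directory.
        saved = {key: os.environ.get(key) for key in ("HOME", "USERPROFILE")}
        os.environ["HOME"] = fake_home
        os.environ["USERPROFILE"] = fake_home
        try:
            raw = open_tar("~/deployment.tar.gz", expand=False)
            expanded = open_tar("~/deployment.tar.gz", expand=True)
        finally:
            for key, value in saved.items():
                if value is None:
                    os.environ.pop(key, None)
                else:
                    os.environ[key] = value
        return raw, expanded


if __name__ == "__main__":
    print(reproduce())
```

The unexpanded path fails with the same FileNotFoundError as in the traceback, while the expanded path opens the archive.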

After this PR

With this change, the ~ in the download path is resolved and the deployment tar is found:

python local/scripts/deployment_dir_bug.py --small-model 
Downloading (…)training/config.json: 100%|██████████████████████████████████████| 0.98k/0.98k [00:00<00:00, 382kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 240/240 [00:00<00:00, 73.7kB/s]
Downloading (…)/training/merges.txt: 100%|███████████████████████████████████████| 446k/446k [00:00<00:00, 7.24MB/s]
Downloading (…)g/model_nocache.onnx: 100%|███████████████████████████████████████| 496M/496M [00:44<00:00, 11.6MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████| 90.0/90.0 [00:00<00:00, 21.1kB/s]
Downloading (…)/training/vocab.json: 100%|███████████████████████████████████████| 779k/779k [00:00<00:00, 9.05MB/s]
Downloading (…)ining/tokenizer.json: 100%|█████████████████████████████████████| 2.02M/2.02M [00:00<00:00, 10.7MB/s]
Downloading (…)el/deployment.tar.gz: 100%|███████████████████████████████████████| 265M/265M [00:23<00:00, 12.1MB/s]
Downloading (…)small-model/model.md: 100%|██████████████████████████████████████| 0.99k/0.99k [00:00<00:00, 218kB/s]
Downloading (…)el/model.onnx.tar.gz: 100%|███████████████████████████████████████| 264M/264M [00:23<00:00, 11.7MB/s]
Downloaded Model contents to downloaded_path=False
Sparsezoo Model: sparsezoo_model=Model(stub=zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized)

$ tree ~/test-models
/home/rahul/test-models
└── small-model
    ├── deployment
    │   ├── config.json
    │   ├── merges.txt
    │   ├── model.onnx
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   ├── tokenizer.json
    │   └── vocab.json
    ├── deployment.tar.gz
    ├── model.md
    ├── model.onnx
    ├── model.onnx.tar.gz
    └── training
        ├── config.json
        ├── merges.txt
        ├── model_nocache.onnx
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.json

3 directories, 18 files

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1206405267063552


Resolve Path

ce72eb5

rahul-tuli requested review from Satrat, bfineran, dsikka, horheynm and dbogunowicz February 1, 2024 18:59

rahul-tuli self-assigned this Feb 1, 2024

rahul-tuli added the bug Something isn't working label Feb 1, 2024

rahul-tuli mentioned this pull request Feb 1, 2024

[Cherry pick] path resolution bug #419

Merged

bfineran approved these changes Feb 1, 2024


bfineran merged commit 6e0d12b into main Feb 1, 2024
4 checks passed

bfineran deleted the bugfix-resolve-download-path branch February 1, 2024 20:52
