Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By default avoiding generating files in temp directory #1609

Merged
merged 26 commits into from
Jun 19, 2023

Conversation

dbogunowicz
Copy link
Contributor

@dbogunowicz dbogunowicz commented Jun 6, 2023

Changes

  1. Changed the conftests so that after running any test target, the total size of files created after running the test suite can be: zero mb for the root directory, and up to 1mb for the temp directory.
  2. Audited the most important paths for /src and /tests
  3. For onnx and python tests reduced the number of data created in the /tmp from 200mb and 800mb respectively down to practically zero mb.

Audit Plan

Auditing the pathways that contain keywords: tempfile and download to make sure that we are not being sloppy with saving or downloading files to the temporary directory.

Auditing the /src

find . -name "*.py" -type f -exec grep -l "download" {} +

./sparseml/onnx/benchmark/info.py - validated
./sparseml/transformers/masked_language_modeling.py - validated
./sparseml/transformers/text_classification.py - validated
./sparseml/transformers/question_answering.py - validated
./sparseml/transformers/token_classification.py - validated
./sparseml/tensorflow_v1/datasets/classification/imagenette.py - validated
./sparseml/tensorflow_v1/datasets/classification/cifar.py - validated
./sparseml/optim/helpers.py - validated
./sparseml/openpifpaf/train.py - validated
./sparseml/pytorch/datasets/detection/coco.py - validated
./sparseml/pytorch/datasets/detection/voc.py - validated
./sparseml/pytorch/datasets/classification/mnist.py - validated
./sparseml/pytorch/datasets/classification/imagenette.py - validated
./sparseml/pytorch/datasets/classification/cifar.py - validated
./sparseml/pytorch/models/external/torchvision.py - validated
./sparseml/pytorch/models/registry.py - validated 
./sparseml/pytorch/torchvision/export_onnx.py - validated
./sparseml/pytorch/torchvision/train.py - validated
./sparseml/pytorch/image_classification/lr_analysis.py - validated
./sparseml/pytorch/image_classification/pr_sensitivity.py - validated
./sparseml/pytorch/image_classification/train.py - validated
./sparseml/pytorch/image_classification/export.py - validated
./sparseml/pytorch/image_classification/utils/helpers.py - validated
./sparseml/pytorch/image_classification/utils/trainer.py - validated
./sparseml/pytorch/utils/helpers.py - validated
./sparseml/pytorch/utils/model.py - validated
./sparseml/benchmark/info.py - validated
./sparseml/yolov8/trainers.py - **validated (allow temp files to be saved (it makes sense))**
./sparseml/keras/datasets/classification/imagenette.py - validated
./sparseml/keras/models/registry.py - validated
./sparseml/utils/datasets/imagenette.py - validated
./sparseml/yolact/scripts.py - validated
./sparseml/yolact/__init__.py - validated
find . -name "*.py" -type f -exec grep -l "tempfile" {} +

./sparseml/onnx/utils/model.py - **validated (I think it should be removed, outdated piece of code))**

Auditing the /tests

Note: I consider saving to tempfile benign if it is either done within a context (using a context manager), or using the tmp_path fixture or factory within the pytest framework.

 find . -name "*.py" -type f -exec grep -l "tempfile" {} +

./integrations/transformers/test_transformers.py - validated 
./integrations/image_classification/test_image_classification.py - validated
./integrations/yolov5/test_yolov5.py - validated
./sparseml/onnx/optim/quantization/helpers.py - validated
./sparseml/onnx/optim/quantization/test_quantize_model_post_training.py - validated
./sparseml/onnx/benchmark/test_info.py - **validated (not fixed)**
./sparseml/onnx/utils/test_data.py - validated
./sparseml/framework/test_info.py - validated
./sparseml/sparsification/test_info.py - validated
./sparseml/optim/test_analyzer.py - validated
./sparseml/optim/test_sensitivity.py - validated
./sparseml/pytorch/sparsification/test_modifier_thinning.py - validated
./sparseml/pytorch/utils/test_helpers.py - validated
./sparseml/pytorch/utils/test_exporter.py - validated
./sparseml/keras/optim/test_mask_pruning.py - validated
./sparseml/yolov5/helpers.py - validated

ubuntu@quad-mle-1:~/damian/sparseml/tests$ find . -name "*.py" -type f -exec grep -l "download" {} +
./integrations/image_classification/args.py - validated
./integrations/image_classification/test_image_classification.py - validated
./sparseml/onnx/benchmark/test_info.py - validated
./sparseml/transformers/utils/test_helpers.py - validated
./sparseml/pytorch/test_torch_to_onnx_exporter.py - validate
./sparseml/pytorch/image_classification/utils/test_helpers.py - validated
./sparseml/yolov5/helpers.py - validated

@dbogunowicz dbogunowicz changed the title Feature/damian/avoid tmp files By default avoiding generating files in temp directory Jun 6, 2023
bfineran
bfineran previously approved these changes Jun 9, 2023
Copy link
Member

@bfineran bfineran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - should be noted for testing creating tempfiles is ok

@dbogunowicz
Copy link
Contributor Author

@bfineran I think that there is a risk (currently low but will grow over time as we work with larger and larger models) of saving very large files to /tmp. This is something that QA already reported. I think that saving files to /tmp is fine, as long as they also get properly cleaned up. We have been a bit sloppy at concluding our unit tests with appropriate cleanups in the past; this PR is a partial remedy to that sloppiness (fun fact, the test that was writing the biggest amount of mbytes to /tmp and did not clean them up was written by the user d.bogunowicz ;))

KSGulin
KSGulin previously approved these changes Jun 12, 2023
Copy link
Contributor

@KSGulin KSGulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cleanup and updated test conditions!

@dbogunowicz dbogunowicz merged commit c255e81 into main Jun 19, 2023
10 checks passed
@dbogunowicz dbogunowicz deleted the feature/damian/avoid_tmp_files branch June 19, 2023 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants