Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce install size of DIANNA #538

Merged
merged 20 commits into from
Apr 11, 2023
Merged

Reduce install size of DIANNA #538

merged 20 commits into from
Apr 11, 2023

Conversation

stefsmeets
Copy link
Contributor

@stefsmeets stefsmeets commented Mar 28, 2023

In this PR I tried to reduce the install size of dianna. Turns out this is tricky because of hidden dependencies in tensorflow. There are a bunch of changes that I think are a step in the right direction, but I could not remove the dependency on tensorflow/pytorch unfortunately. For example, packages like onnx_tf have undeclared dependencies on tensorflow.

I sorted out the dependencies so that at least dianna can be installed with the least amount of packages. The dashboard and tutorials are now separate "extras" in the pip install (pip install .[tutorials] and pip install .[dashboard].

My main goal is to make the CI a bit more efficient. It seems that this is mainly an issue on linux, which ships with a ton of gpu libraries (~2.8 GB) by default. On windows (~750 MB)/Mac (~850 MB) the install is much more efficient.

I switched out pytorch for the cpu version and tensorflow for tensorflow-cpu (thanks @WillemSpek for the suggestion!). On the CI this makes the installs significantly lighter. The linux cache is down to around 1 GB, and total run time is about 2 minutes. Way better than over 10 to 15 minutes we had a month ago! 🚀

This is also useful locally, to install dianna like so:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install tensorflow-cpu
pip install -e .[dev]

On my linux install this helps reduce my venv size from ~5 GB (😬) to a managable 1 GB (still big, but what can you do 🤷‍♂️).

Closes #510

Todo

  • Test whether cache is smaller (push empty commit)

@stefsmeets stefsmeets changed the title Dianna diet Reduce install size of DIANNA Mar 28, 2023
@stefsmeets stefsmeets marked this pull request as ready for review March 29, 2023 08:10
@stefsmeets
Copy link
Contributor Author

Hi @cwmeijer would you mind reviewing this?

I get these errors on the CI for 2 notebooks and I'm not quite sure why this is happening:

FAILED tutorials/conversion_onnx/keras2onnx.ipynb:: - FileNotFoundError: [Errno 2] No such file or directory: 'mysavedmodel.onnx'
FAILED tutorials/conversion_onnx/tensorflow2onnx.ipynb:: - FileNotFoundError: [Errno 2] No such file or directory: 'mobilenet_graph.onnx'

Any idea?

@cwmeijer
Copy link
Contributor

cwmeijer commented Apr 4, 2023

Hey @stefsmeets,
Neither of those file ring a bell to me. I don't remember seeing them in our repo. We could ask the team today. I also see that this has only been a problem while checking in your last commit (which of course had nothing to do with that). Also, the main branch is passing fine. I'm so puzzled. I pressed 'rerun failed jobs' for now. Maybe it was some fluke.

@stefsmeets
Copy link
Contributor Author

stefsmeets commented Apr 4, 2023

My suspicion is that this fail happens somewhere in the cells that are supposed to write a file. The file does not get written to the expected location, but it does not raise an error so pytest thinks it is OK.

The failing notebooks then depend on those files to exist, but they cannot find it. They have not been generated. I will have a look later today.

@stefsmeets
Copy link
Contributor Author

My bad, I accidentally removed tf2onnx as a dependency 😅
The notebooks spawn a subprocess to call tf2onnx.convert, so this does not actually trigger an exception that pytest can act on.

@stefsmeets
Copy link
Contributor Author

Hi @cwmeijer tests are passing now, do you want to have another look? :-)

Copy link
Contributor

@cwmeijer cwmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Great improvements again!
I added a few comments. Only 1 is asking for an actual change though.

tests/utils.py Show resolved Hide resolved
tests/utils.py Show resolved Hide resolved
Copy link
Contributor

@cwmeijer cwmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha great! Nice solution using the multiline string! Much nicer than it was before even.
I say merge it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Remove dependency on pytorch / tensorflow for default install of DIANNA
2 participants