instable GitHub action cache #1721

Closed
Borda opened this issue May 3, 2020 · 3 comments · Fixed by #1725
Labels
bug (Something isn't working), ci (Continuous Integration), help wanted (Open to be worked on)

Comments

@Borda
Member

Borda commented May 3, 2020

🐛 Bug

There is an issue with GitHub Actions caching: CI runs that use Horovod fail randomly.

To Reproduce

#1709 (comment)

@Borda added the bug, help wanted, and ci labels on May 3, 2020
@tgaddair
Contributor

tgaddair commented May 3, 2020

If the goal is to keep the cache to speed things up, my suggestion would be to add a couple of checks to the "Install dependencies" step in the workflow:

  1. Check the version of torch before installing requirements.txt and save it in a Bash variable.
  2. Install requirements.txt.
  3. Check the version of torch afterwards and save it in a second Bash variable.
  4. If the torch version before != the version after, uninstall Horovod.
  5. Install requirements-extra.txt.

This way, we should be able to leverage the cache to speed things up when nothing changes, without running into incompatibilities when the torch version is upgraded.
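As a rough illustration, the five steps could be sketched in Bash along the following lines. This is only a sketch under stated assumptions, not the change that was eventually merged: the function names `torch_version`, `torch_changed`, and `install_dependencies` are hypothetical, and it assumes `python` and `pip` on the runner point at the environment being cached.

```shell
#!/usr/bin/env bash
# Sketch of the proposed "Install dependencies" step.
# All function names below are hypothetical and used for illustration only.

# Print the installed torch version, or "none" if torch cannot be imported.
torch_version() {
  python -c "import torch; print(torch.__version__)" 2>/dev/null || echo "none"
}

# Succeed (exit status 0) when the two version strings differ.
torch_changed() {
  [ "$1" != "$2" ]
}

# Steps 1-5 from the list above. Wrapped in a function so that nothing is
# installed until it is called explicitly.
install_dependencies() {
  local before after
  before="$(torch_version)"                # 1. torch version before
  pip install -r requirements.txt          # 2. base requirements
  after="$(torch_version)"                 # 3. torch version after
  if torch_changed "$before" "$after"; then
    pip uninstall -y horovod               # 4. remove Horovod built against the old torch
  fi
  pip install -r requirements-extra.txt    # 5. extra requirements
}
```

In a workflow, the step would source or inline these definitions and then call `install_dependencies`. Comparing plain version strings keeps the check cheap: when the cache is intact and torch is unchanged, step 4 is skipped and the cached Horovod build is reused.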

I can put together a PR for this.

@Borda
Member Author

Borda commented May 3, 2020

@tgaddair PR would be great!
I am a bit suspicious, since you mentioned that the cache is not loaded properly, so I am thinking about opening an issue with https://github.com/actions/cache

@Borda
Member Author

Borda commented May 4, 2020

I see the issue now: the minimum required version of Horovod is the same as the current release, so the copy loaded from the cache already satisfies the requirement, even with --upgrade.
https://github.com/PyTorchLightning/pytorch-lightning/pull/1709/checks?check_run_id=641571597

2 participants