Adding CI for checking the build and the built image #10

Open
wants to merge 1 commit into
base: main
@tueda tueda commented Aug 14, 2024

Adding CI for the Docker image will help identify which parts work and which do not. This patch adds a CI workflow consisting of:

  • a job that builds the image and
  • jobs that execute tutorial notebooks using Papermill within Docker containers based on the built image.

The notebook jobs use a separate reusable workflow and a shell script invoked inside the Docker container. Output notebooks and log files (*.log) are saved as artifacts. Intermediate results are saved as well, so that dependent jobs can use them.
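As a rough sketch of what one notebook job boils down to (this is not the shell script from this patch), the Papermill call can be assembled as follows. The notebook path, the output directory and the helper name papermill_command are hypothetical; --execution-timeout and --log-output are Papermill CLI options, and 3600 seconds corresponds to the 1-hour per-cell timeout mentioned in the remarks.

```python
import shlex
from pathlib import PurePosixPath


def papermill_command(notebook, output_dir, cell_timeout_s=3600):
    """Build (but do not run) the papermill CLI call for one notebook.

    Paths are treated as POSIX paths because the command is meant to run
    inside a Linux-based Docker container.
    """
    notebook = PurePosixPath(notebook)
    output = PurePosixPath(output_dir) / notebook.name
    return [
        "papermill",
        str(notebook),
        str(output),
        # Per-cell timeout in seconds (assumed default: 1 hour).
        "--execution-timeout", str(cell_timeout_s),
        # Echo cell output to the console so it can be redirected to a *.log file.
        "--log-output",
    ]


if __name__ == "__main__":
    # Hypothetical notebook name, for illustration only.
    cmd = papermill_command("notebooks/example.ipynb", "out")
    print(shlex.join(cmd))
```

Inside the container, the resulting list could be passed to subprocess.run with the console output redirected to a log file, so that both the executed notebook and the log end up in the artifact directory.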

Some remarks:

  • The appendix notebooks (a1-a6) do not appear to be well maintained. I am also unsure about their dependency tree, that is, which notebook requires the output of which others (expressed by the needs and outputs parameters of each job). As a result, they currently fail to run.
  • Currently, 4a fails; this will be fixed in another PR.
  • The 4a notebook on GitHub does not include the SCANDAL limit (it printed "no SCANDAL"). I therefore do not list 3c as a dependency of 4a, although it should be fine to add it so that the limit is generated.
  • 3a fails intermittently because training freezes (you may need to re-run failed jobs). A possible cause is the num_workers argument passed to DataLoader, which is set to 8 by default and triggers the following warning:
    UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    
  • I set a timeout of 1 hour for each notebook cell (the default value of the timeout parameter of the reusable workflow) to catch the freeze described above and to still save the failed notebooks. This timeout may need to be extended if some cell takes longer than that.
  • It would be nice to generate a Job Summary that includes plots (as images) extracted from the notebooks, once GitHub supports web access to individual files in artifacts.
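Regarding the num_workers warning for 3a, one possible mitigation is to cap the requested worker count at the number of CPUs usable by the process. The sketch below only illustrates that idea; it is not part of this patch, the helper names are hypothetical, and whether capping actually prevents the freeze is an untested assumption.

```python
import os


def available_cpus():
    """Number of CPUs this process may use (affinity-aware where supported)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def capped_num_workers(requested=8, available=None):
    """Return a worker count that never exceeds the usable CPUs.

    `requested` defaults to 8, the value reported in the warning above.
    """
    if available is None:
        available = available_cpus()
    return max(0, min(requested, available))


if __name__ == "__main__":
    # On a runner with 4 usable CPUs, a request for 8 workers is reduced to 4.
    print(capped_num_workers(8, available=4))
```

The returned value would then be passed as the num_workers argument of torch.utils.data.DataLoader in place of the fixed 8.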

See also a workflow run on my branch.
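To make the dependency question from the remarks more concrete, the needs relationships between notebook jobs can be checked offline before they are written into the workflow. The following is a minimal sketch using Python's standard graphlib module; the NEEDS map is purely illustrative (only the optional 3c to 4a edge is taken from the remarks above), and both helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# Illustrative "needs" map: job -> set of jobs whose outputs it consumes.
# Only the optional 3c -> 4a edge comes from the remarks; the real
# dependency tree (in particular for a1-a6) is not known here.
NEEDS = {
    "3a": set(),
    "3c": set(),
    "4a": {"3c"},
}


def run_order(needs):
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(needs).static_order())


def undefined_jobs(needs):
    """Jobs that are listed as a dependency but never defined as a job."""
    referenced = set().union(*needs.values()) if needs else set()
    return referenced - set(needs)


if __name__ == "__main__":
    print(run_order(NEEDS))
    print(undefined_jobs(NEEDS))
```

A check like this would flag a job that needs an output no other job produces, which is the kind of uncertainty described for the appendix notebooks.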
