robert-s-lee/grid-ray

An example of a model that uses Ray, run on Grid.ai. The examples below show how to:

Get started with Development Setup

  • Set up the development environment
# Grid.ai minimum is python=3.8
conda create --name ray python=3.8
conda activate ray
# Python modules required
cat >requirements.txt <<EOF
ray
ray[tune]
ray[default]
pandas
tabulate
tensorboardX
EOF
# Install Python modules for the experiment
pip install --ignore-requires-python -v -r requirements.txt
# Install the Python module for Grid.ai
pip install lightning-grid --upgrade
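The setup above notes a python=3.8 minimum. A minimal sketch of how that could be checked after activating the environment, assuming GNU `sort -V` is available; `version_at_least` is a hypothetical helper, not part of conda or the Grid.ai CLI:

```shell
#!/usr/bin/env bash
# Sketch: confirm the active interpreter meets the python=3.8 minimum.
# version_at_least is a hypothetical helper introduced for illustration.

# Succeeds when version $2 is greater than or equal to version $1.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
  local minimum="$1" current="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Report on whichever `python` is first on PATH (the conda env, once activated).
if command -v python >/dev/null 2>&1; then
  current=$(python -c 'import sys; print(".".join(map(str, sys.version_info[:3])))' 2>/dev/null) || current=""
  if [ -n "$current" ] && version_at_least 3.8 "$current"; then
    echo "python $current satisfies the 3.8 minimum"
  else
    echo "python ${current:-unknown} does not satisfy the 3.8 minimum" >&2
  fi
fi
```

Comparing with `sort -V` avoids the common mistake of treating 3.10 as older than 3.8 in a plain string comparison.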

Unit test by running experiment locally

python ray-tune-quickstart.py

Run on Grid.ai Cloud with zero code modification

  • Log in to Grid.ai
grid login
  • Run using the default Grid.ai container. Use the CLI below or click the Grid.ai Run badge.
grid run ray-tune-quickstart.py
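The local test and the cloud run use the same script, so the two steps can be chained. A minimal sketch, assuming `python` and `grid` are both on the PATH; `test_then_submit` is a hypothetical wrapper, not a Grid.ai command:

```shell
#!/usr/bin/env bash
# Sketch: submit to Grid.ai only when the local run succeeds.
# test_then_submit is a hypothetical wrapper introduced for illustration.

test_then_submit() {
  local script="$1"
  # Run the experiment locally first; a non-zero exit status blocks the submit.
  if python "$script"; then
    grid run "$script"
  else
    echo "local run of $script failed; not submitting to Grid.ai" >&2
    return 1
  fi
}

# Example usage:
#   test_then_submit ray-tune-quickstart.py
```

This keeps a failing script from consuming cloud time, at the cost of running the experiment twice.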

Advanced Dockerfile usage on Grid.ai

To run a GitHub-hosted model in a customized container, pass the --dockerfile gridray.dockerfile flag.

  • Run by manually specifying the Dockerfile. Use the CLI below.
grid run --dockerfile gridray.dockerfile --name ray-dk-$(date '+%m%d-%H%M%S') ray-tune-quickstart.py
  • Use a spot instance and override the Run name with ray-sp-dk-MMDD-HHMMSS for easier searching later. Use the CLI below.
grid run --dockerfile gridray.dockerfile --use_spot --name ray-sp-dk-$(date '+%m%d-%H%M%S') ray-tune-quickstart.py
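The commands above build each Run name from a prefix plus a MMDD-HHMMSS timestamp. A small sketch that factors this pattern into one place; `run_name` is a hypothetical helper, not part of the Grid.ai CLI:

```shell
#!/usr/bin/env bash
# Sketch: build a timestamped Run name such as ray-dk-0720-105640.
# run_name is a hypothetical helper introduced for illustration.

run_name() {
  local prefix="$1"
  # %m%d-%H%M%S gives month, day, hour, minute, second in local time.
  printf '%s-%s\n' "$prefix" "$(date '+%m%d-%H%M%S')"
}

# Example usage:
#   grid run --dockerfile gridray.dockerfile --name "$(run_name ray-dk)" ray-tune-quickstart.py
#   grid run --dockerfile gridray.dockerfile --use_spot --name "$(run_name ray-sp-dk)" ray-tune-quickstart.py
```

The timestamp carries no year, so names can repeat across years.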

Use Grid.ai when the model is not on GitHub

Runs submitted with --localdir cannot use the Grid.ai cloning feature.

  • Let Grid.ai build the container
grid run --name ray-local-$(date '+%m%d-%H%M%S') --localdir ray-tune-quickstart.py
  • Specify the container with a Dockerfile
grid run --dockerfile gridray.dockerfile --use_spot --name ray-sp-dk-lc-$(date '+%m%d-%H%M%S') --localdir ray-tune-quickstart.py
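Whether to pass --localdir depends on whether the code is available from a remote repository. A rough sketch of that decision, assuming that a missing `origin` remote means the model is not on GitHub. This is only a heuristic: an `origin` remote may point somewhere other than GitHub, or may lack the latest commits. `localdir_flag` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: emit --localdir only when the current directory has no "origin" remote.
# localdir_flag is a hypothetical helper; the "origin means GitHub" rule is an assumption.

localdir_flag() {
  # `git remote get-url origin` fails outside a repository or when no origin is set.
  if ! git remote get-url origin >/dev/null 2>&1; then
    echo "--localdir"
  fi
}

# Example usage (left unquoted on purpose so an empty result expands to nothing):
#   grid run --name "ray-local-$(date '+%m%d-%H%M%S')" $(localdir_flag) ray-tune-quickstart.py
```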

Troubleshooting Tips

  • Review grid history
grid history | grep -e Run -e ray -e $(date '+%Y-%m-%d')
┃ Run                              ┃               Created At ┃ Experiments ┃ Failed ┃ Stopped ┃ Completed ┃
│ ray-sp-dk-lc-0720-105956         │ 2021-07-20 15:00:09+0000 │           1 │      0 │       0 │         1 │
│ ray-local-0720-105916            │ 2021-07-20 14:59:30+0000 │           1 │      0 │       0 │         1 │
│ ray-sp-dk-0720-105713            │ 2021-07-20 14:57:25+0000 │           1 │      0 │       0 │         1 │
│ ray-dk-0720-105640               │ 2021-07-20 14:56:53+0000 │           1 │      0 │       0 │         1 │
│ fervent-tamarin-146              │ 2021-07-20 14:55:39+0000 │           1 │      0 │       0 │         1 │
  • Review grid status
for run in $(grid history | grep -e Run -e ray -e "$(date '+%Y-%m-%d')" | awk -F'│' '{print $2}'); do
  echo "$run"
  grid status "$run"
done
ray-sp-dk-lc-0720-105956
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Experiment                    ┃                 Command ┃    Status ┃    Duration ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ ray-sp-dk-lc-0720-105956-exp0 │ ray-tune-quickstart.py] │ succeeded │ 0d-00:01:28 │
└───────────────────────────────┴─────────────────────────┴───────────┴─────────────┘
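The status loop depends on pulling Run names out of the `grid history` table. A minimal sketch of that extraction, assuming data rows begin with the light box character `│` and the header row with the heavy `┃`, as in the sample output above; `extract_run_names` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: print the Run column from `grid history` table text read on stdin.
# extract_run_names is a hypothetical helper; the row format is assumed from the sample output.

extract_run_names() {
  # With default whitespace splitting, a data row starts with the field "│"
  # and the Run name is the second field. Header and border rows are skipped.
  awk '$1 == "│" { print $2 }'
}

# Example usage:
#   for run in $(grid history | extract_run_names); do
#     echo "$run"
#     grid status "$run"
#   done
```

Matching on the first field skips the header row without relying on a custom field separator.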
