4. Training a Model

Ivan Zvonkov edited this page Nov 6, 2023 · 3 revisions

The following instructions explain how to train and evaluate a region-specific, double-headed LSTM crop/non-crop model. Once trained, the model can be used to generate a cropland mask for a region of interest (ROI).

Prerequisite: Adding labeled data for region of interest

Instructions

1. Specifying ROI Bounding Box

An ROI bounding box is necessary to show the model which region to focus on during training. Specifically, the bounding box causes training data points inside the ROI (local points) to be weighted more heavily than data points outside the ROI.
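The weighting idea described above can be illustrated with a minimal sketch. It is not the repository's implementation: the `BoundingBox` class, its field names, the `sample_weights` helper, and the weight values 2.0 and 1.0 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical ROI container; the field names are assumptions."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # A point is "local" if it falls inside the box (edges included).
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def sample_weights(points, bbox, local_weight=2.0, global_weight=1.0):
    """Return one weight per (lat, lon) point, larger for points inside the ROI.

    The weight values are illustrative assumptions, not project defaults.
    """
    return [local_weight if bbox.contains(lat, lon) else global_weight
            for lat, lon in points]


# Made-up coordinates: one point inside the ROI, two outside.
roi = BoundingBox(min_lat=-1.0, max_lat=1.0, min_lon=30.0, max_lon=32.0)
points = [(0.0, 31.0), (5.0, 31.0), (0.5, 40.0)]
weights = sample_weights(points, roi)
```

Here only the first point lies inside the box, so it alone receives the larger weight. How the actual training code applies the distinction between local and global points may differ.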

1a. Draw the bounding box in Google Earth Engine: script

1b. Paste the generated BBox string into src/bboxes (example) and commit the change to GitHub.

2. Training the model

Navigate to the GitHub train action and click the Run workflow button. Specify the required arguments:

  • Model name: name of the model following the convention
  • Evaluation dataset(s): name of the dataset(s) (in datasets.py) which contain(s) evaluation points
  • Bounding box name: name of BBox specified in step 1.

Common optional model arguments include:

  • --skip_era5: Trains model without the use of ERA5 precipitation and temperature data
  • --start_month November: Trains model using a November-November crop growing season (if not specified, February-February is used)

Other arguments can be found in train.py
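To make the effect of --start_month concrete, the sketch below computes a 12-month season window from a start month. It assumes the season runs from the first day of the start month to the first day of the same month one year later; the exact boundaries used in train.py are not stated here, and the `season_window` helper is hypothetical.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def season_window(start_year: int, start_month: str = "February"):
    """Return (start, end) dates of a 12-month season.

    Assumption: the window starts on day 1 of `start_month` and ends on
    day 1 of the same month in the following year.
    """
    month = MONTHS.index(start_month) + 1  # raises ValueError for unknown names
    return date(start_year, month, 1), date(start_year + 1, month, 1)


default_season = season_window(2022)               # February-February default
november_season = season_window(2022, "November")  # --start_month November
```

With the default, the window covers February 2022 to February 2023. Passing "November" shifts the same 12-month span to November 2022 through November 2023.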

Once the arguments are specified, click Run workflow to begin model training.


Alternative: Training using GitHub Command Line

  1. To train a model from the master branch and create a PR with the new model: gh workflow run train.yml -f MODEL_NAME=...
  2. To train a model from an existing branch and push the new model to that branch: gh workflow run train.yml --ref branch-name -f MODEL_NAME=...
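For scripted use, the two commands above can be assembled programmatically. The sketch below only builds and prints the command; it does not run it. `build_train_command` is a hypothetical helper, and `my-model-name` is a placeholder value. MODEL_NAME is the only input name taken from this page; check train.yml for the names of the other workflow inputs.

```python
import shlex


def build_train_command(fields, ref=None):
    """Build the argument list for `gh workflow run train.yml`.

    `fields` maps workflow input names to values; each is passed with -f.
    `ref` is an optional branch name passed with --ref.
    """
    cmd = ["gh", "workflow", "run", "train.yml"]
    if ref is not None:
        cmd += ["--ref", ref]
    for key, value in fields.items():
        cmd += ["-f", f"{key}={value}"]
    return cmd


# Placeholder model name; replace it with one that follows the naming convention.
from_master = build_train_command({"MODEL_NAME": "my-model-name"})
from_branch = build_train_command({"MODEL_NAME": "my-model-name"}, ref="branch-name")
print(shlex.join(from_branch))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the GitHub CLI is installed and authenticated.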

Logs of the Train GitHub Action can be viewed by clicking the link to the run and then the square labeled train. This shows the current status of the run.

Once at the Train model step, live model training and validation curves can be viewed on Weights and Biases: https://wandb.ai/nasa-harvest/crop-mask

Once the training run is complete, a Pull Request is opened automatically with the model metrics and wandb logs in data/models.json.

Additional Information about Model

Hannah Kerner, Gabriel Tseng, Inbal Becker-Reshef, Catherine Nakalembe, Brian Barker, Blake Munshell, Madhava Paliyam, and Mehdi Hosseini. 2020. Rapid Response Crop Maps in Data Sparse Regions. KDD ’20: ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshops, August 22–27, 2020, San Diego, CA. Link