
04_Creation of `config` files


Writing the config file is the hardest and most crucial part of the workflow

Configuration (config) files are in JSON format and are where all relevant parameters are set. Mastery of this process is a large part of being able to use Gym.

It is critical that you retain your config file: it is the key to fully reconstructing a model, and to future operators understanding the choices made during training and model construction.

There is one config file that controls both the dataset creation process and the model training process. It therefore contains a list (in any order) of parameter:value pairs used by either make_nd_dataset.py or train_model.py. You must run both scripts in a typical model build; the first prepares the dataset in the specific format the second expects.

When the model is used in implementation mode, by calling seg_images_in_folder.py, the config file is used again, this time to prepare the imagery into the format the trained model expects.

The model consists of the config file, the .h5 weights file, and the code that executes the model. Inside this repository, that code is seg_images_in_folder.py, which is just one relatively simple implementation of a trained model or an ensemble of trained models. Some implementation choices are provided, such as test-time augmentation (or not) and adaptive versus simple thresholding. Further implementation options may be added in the future. However, implementation is often tied to domain- or task-specific outcomes, so finding the best implementation choices for a given task is left to the user. Implementation choices should be evaluated on a hold-out or test set that is different from the train and validation sets and that ideally encompasses a wider variability of imagery. In short, implementation is its own art form. Masters of the Doodleverse apply their own intuition about how best to implement a model or models to achieve domain-specific goals on custom imagery. We at Doodleverse HQ will continue to provide case-study implementations, both here and in resultant published scholarly works.

Sample Config File Name: config.json

Example contents of a config file:

{
    "TARGET_SIZE": [768,768],
    "MODEL": "resunet",
    "NCLASSES": 4,
    "BATCH_SIZE": 7,
    "N_DATA_BANDS": 3,
    "DO_TRAIN": true,
    "PATIENCE": 10,
    "MAX_EPOCHS": 100,
    "VALIDATION_SPLIT": 0.2,
    "FILTERS":8,
    "KERNEL":7,
    "STRIDE":1,
    "LOSS": "dice",
    "DROPOUT":0.1,
    "DROPOUT_CHANGE_PER_LAYER":0.0,
    "DROPOUT_TYPE":"standard",
    "USE_DROPOUT_ON_UPSAMPLING":false,
    "ROOT_STRING": "hatteras_l8_aug_768",
    "FILTER_VALUE": 3,
    "DOPLOT": true,
    "USEMASK": false,
    "RAMPUP_EPOCHS": 10,
    "SUSTAIN_EPOCHS": 0.0,
    "EXP_DECAY": 0.9,
    "START_LR":  1e-7,
    "MIN_LR": 1e-7,
    "MAX_LR": 1e-4,
    "AUG_ROT": 0,
    "AUG_ZOOM": 0.05,
    "AUG_WIDTHSHIFT": 0.05,
    "AUG_HEIGHTSHIFT": 0.05,
    "AUG_HFLIP": false,
    "AUG_VFLIP": false,
    "AUG_LOOPS": 1,
    "AUG_COPIES": 3,
    "TESTTIMEAUG": false,
    "SET_GPU": "0",
    "SET_PCI_BUS_ID": true
    "TESTTIMEAUG": true,
    "WRITE_MODELMETADATA": true,
    "OTSU_THRESHOLD": true,
    "INITIAL_EPOCH": 0,
    "CLEAR_MEMORY": true,
    "LOAD_DATA_WITH_CPU": false
    "REMAP_CLASSES": {"0": 0, "1": 0, "2": 0, "3":1, "4":0, "5":0, "6":0,"7":0,"8":0, "9":0}
  }

Notice the last entry does NOT have a comma. It does not matter what order the variables are specified in, but you must use the variable names exactly as described here. A description of each variable is provided below.
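For orientation, here is a minimal sketch in Python (not Gym's exact code) of how such a file is read by the scripts: it is parsed with the standard json module into a dictionary of parameters.

import json

# Minimal sketch (not Gym's exact code): parse the config into a dictionary.
with open("config.json") as f:
    config = json.load(f)

# json.load raises json.JSONDecodeError if the file is malformed, e.g. a
# trailing comma after the last entry or a missing comma between entries.
print(config["TARGET_SIZE"])  # e.g. [768, 768]
print(config["MODEL"])        # e.g. "resunet"

Because the file is plain JSON, any JSON linter can also be used to catch syntax errors before running the scripts.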

Model Description configs:

  • TARGET_SIZE: list of integer image dimensions to write to the dataset and use to build and use models. This does not have to match the dimensions of the sample imagery (it would typically be significantly smaller due to memory constraints) but it should ideally have the same aspect ratio. The target size must be compatible with the cardinality of the model. Use a TARGET_SIZE that makes sense for your problem, that conforms roughly with the dimensions of the imagery and labels you have for model training, and that fits in available GPU memory. You might be very surprised at the accuracy and utility of models trained with significantly downsized imagery.
  • MODEL : (string) specify which model you want to use, options are "unet","resunet", "simple_unet", "simple_resunet", "segformer", and "satunet".
  • NCLASSES: (integer) number of classes (1 = binary, e.g. water/no water). For multiclass segmentations, enumerate the number of classes not including a null class. For example, for 4 classes, use NCLASSES=4
  • BATCH_SIZE: (integer) number of images to use in a batch. Larger batch sizes are typically better, but use more memory
  • N_DATA_BANDS: (integer) number of input image bands. Typically 3 (for an RGB image, for example) or 4 (e.g. near-IR or DEM, or other relevant raster info you have at coincident resolution and coverage). Currently cannot be more than 4.
  • DO_TRAIN: (bool) true to retrain model from scratch. Otherwise, the program will use existing model weights and evaluate the model based on the validation set

Model Training configs:

  • PATIENCE: (integer) the number of epochs with no improvement in validation loss to wait before exiting model training
  • MAX_EPOCHS: (integer) the maximum number of epochs to train the model over. Early stopping should ensure this maximum is never reached
  • VALIDATION_SPLIT: (float) the proportion of the dataset to use for validation. The rest will be used for model training. Typically in the range 0.5 -- 0.9 for model training on large datasets
  • LOSS: one of cat (categorical cross-entropy), dice (Dice loss), hinge (hinge loss), or kld (Kullback-Leibler divergence)
  • INITIAL_EPOCH: (integer) the starting epoch of the model run. This is usually (and defaults to) 0. It is >0 when training is interrupted at a known epoch, N. Model training can be restarted from epoch N by specifying INITIAL_EPOCH: N
  • LOSS_WEIGHTS: (bool or list) Only applied when Dice loss is used. If true, loss weights are computed and applied per-class based on the validation dataset. If false, no loss weights are computed and applied. Otherwise, LOSS_WEIGHTS may be a list of floats or integers that are relative weightings of each class
  • CLEAR_MEMORY: (bool) If true, will clear the Keras session and carry out garbage collection at the end of each training epoch. This can mitigate memory leaks that have been observed on large datasets
  • LOAD_DATA_WITH_CPU: (bool) If true, will use the CPU for the tf.data pipeline, which may also help with memory leaks or other memory issues

Model Architecture configs:

  • FILTERS : (integer) number of initial filters per convolutional block, doubled every layer
  • KERNEL : (integer) the size of the Conv kernel
  • STRIDE : (integer) the Conv stride
  • DROPOUT : (float) the fraction of dropout.
  • DROPOUT_CHANGE_PER_LAYER : (float) the amount by which dropout changes per layer (by addition on encoder layers, subtraction on decoder layers)
  • DROPOUT_TYPE : (string) "standard" or "spatial"
  • USE_DROPOUT_ON_UPSAMPLING : (bool) if True, dropout is used on upsampling, otherwise it is not

General configs:

  • ROOT_STRING: (string) the prefix used when writing data for use with the model, e.g. "coastal_5class_"
  • FILTER_VALUE: (integer) radius of disk used to apply median filter, if > 1
  • DOPLOT: (bool) true to make plots
  • USEMASK: (bool) true if the files use 'mask' instead of 'label' in the folder/filename. If false, 'label' is assumed
  • SET_GPU: (string) for machines with multiple GPUs, this sets the GPU to use (note that GPU count begins with 0). Use "-1" to specify use of the CPU instead of a GPU. Use a comma-separated string listing multiple GPUs for multi-GPU training, e.g. "0,1" (2 GPUs) or "0,1,2" (3 GPUs). A sketch of how these values map to CUDA environment variables is given after this list
  • SET_PCI_BUS_ID: (bool) true to reorder the GPUs listed by keras to the same order as the PCI slots on the motherboard. Defaults to true which should be fine for most users. Use false to troubleshoot GPUs
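The following is a sketch of how SET_GPU and SET_PCI_BUS_ID correspond to the standard CUDA environment variables; this is an illustration of the mechanism, and the exact code inside Gym may differ.

import os

# Sketch: map the config values onto the standard CUDA environment variables.
SET_GPU = "0"          # from the config file; "-1" means CPU only
SET_PCI_BUS_ID = True

if SET_PCI_BUS_ID:
    # enumerate GPUs in the same order as the PCI slots on the motherboard
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = SET_GPU  # e.g. "0,1" for two GPUs

Note that these environment variables must be set before TensorFlow is imported in order to take effect.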

Learning rate scheduler configs (used in `train_model.py`):

The model training script uses a learning rate scheduler to cycle through a range of learning rates at every training epoch using a prescribed function. Model training can sometimes be sensitive to the specification of these parameters, especially MAX_LR, so be prepared to try a few values if the model is not performing optimally. A sketch of the schedule these parameters describe is given after the list below.

  • RAMPUP_EPOCHS: (integer) The number of epochs to increase from START_LR to MAX_LR
  • SUSTAIN_EPOCHS: (float) The number of epochs to remain at MAX_LR
  • EXP_DECAY: (float) The rate of decay in the learning rate from MAX_LR
  • START_LR: (float) The starting learning rate
  • MIN_LR: (float) The minimum learning rate, usually equals START_LR, must be < MAX_LR
  • MAX_LR: (float) The maximum learning rate, must be > MIN_LR
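The following Python sketch shows the kind of ramp-up / sustain / exponential-decay function these six parameters describe. This is an assumed form for illustration; see train_model.py for the exact schedule Gym uses.

# Sketch of the assumed learning rate schedule, using the example values above.
START_LR, MIN_LR, MAX_LR = 1e-7, 1e-7, 1e-4
RAMPUP_EPOCHS, SUSTAIN_EPOCHS, EXP_DECAY = 10, 0, 0.9

def lrfn(epoch):
    if epoch < RAMPUP_EPOCHS:
        # linear ramp from START_LR up to MAX_LR
        return (MAX_LR - START_LR) / RAMPUP_EPOCHS * epoch + START_LR
    elif epoch < RAMPUP_EPOCHS + SUSTAIN_EPOCHS:
        # hold at MAX_LR
        return MAX_LR
    # exponential decay from MAX_LR back towards MIN_LR
    return (MAX_LR - MIN_LR) * EXP_DECAY ** (epoch - RAMPUP_EPOCHS - SUSTAIN_EPOCHS) + MIN_LR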

Dataset creation and image augmentation configs (used in `make_nd_dataset.py`):

This program is structured to carry out the augmentation of labeled training/validation datasets. make_nd_dataset.py first generates a new set of augmented imagery and encodes those data (only) into datasets. The model is therefore trained using the augmented data only; they are split into train and validation subsets. The original imagery is therefore free to be used as a 'hold-out' test set to further evaluate the performance of the model. Augmentation is designed to regularize the model (i.e. prevent it from overfitting) by transforming image and label pairs in random ways within limits. Those limits are set using the parameters below. (NOTE: if you want to use non-augmented imagery in Gym, one easy way to do so is to set all augmentations to zero or false (AUG_ROT, AUG_ZOOM, AUG_WIDTHSHIFT, AUG_HEIGHTSHIFT, AUG_HFLIP, AUG_VFLIP) and set AUG_COPIES to 1.)

  • AUG_ROT: (integer) the maximum amount of random image rotation in degrees, typically <10
  • AUG_ZOOM: (float) the maximum amount of random image zoom as a proportion, typically <.2
  • AUG_WIDTHSHIFT: (float) the maximum amount of random horizontal shift, typically <.2
  • AUG_HEIGHTSHIFT: (float) the maximum amount of random vertical shift, typically <.2
  • AUG_HFLIP: (bool) true to randomly horizontally flip the image
  • AUG_VFLIP: (bool) true to randomly vertically flip the image
  • AUG_LOOPS: (integer) number of batches to use for augmented imagery generation (>=2)
  • AUG_COPIES: (integer) number of augmented datasets to create. Each dataset will contain the same number of samples as in the original image set, typically 2--10
  • REMAP_CLASSES: (dict; optional) A dictionary of values in the data and the values you'd like to replace them with. For example, {"0": 0, "1": 0, "2": 0, "3":1, "4":1} says "recode zeros, ones, and twos as zeros, and threes and fours as ones". Used to reclassify data on the fly without writing new files to disk. NOTE: Gym uses padding while making datasets, and when you use REMAP_CLASSES you must account for this padding in two ways. First, the padding around the imagery is the "0" class, so a four class problem has classes 0 through 4 (where 0 is the padding). Second, you must now include padding as one of the classes, so NCLASSES must increase by one to account for padding.
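As an illustration, here is a hypothetical numpy snippet (not Gym's exact code) showing the effect of a REMAP_CLASSES dictionary on a label array:

import numpy as np

# Sketch: rewrite every pixel whose value matches a key in the dictionary.
remap = {"0": 0, "1": 0, "2": 0, "3": 1, "4": 1}
label = np.array([[0, 1, 2],
                  [3, 4, 3]])
remapped = np.zeros_like(label)
for src, dst in remap.items():
    remapped[label == int(src)] = dst
# remapped is now [[0, 0, 0], [1, 1, 1]]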

Prediction configs (used in `seg_images_in_folder.py`):

  • TESTTIMEAUG: (bool; optional) when true, implement Test-Time Augmentation, specifically vertical flip, horizontal flip, and vertical+horizontal flip. These three augmentations (plus the prediction on the normal orientation) all yield softmax scores for each pixel, which are summed before applying argmax to determine the class for each pixel (i.e., 'soft voting'). A sketch of this procedure is given after this list.
  • WRITE_MODELMETADATA: (bool; optional) when true, one npz file will be created per image prediction, containing metadata concerning how the model was implemented
  • OTSU_THRESHOLD: (bool; optional) when true, and when NCLASSES=2 (only), uses the per-image Otsu threshold method on the softmax scores to create the binary label, instead of the default decision boundary of 0.5
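To make the TESTTIMEAUG soft-voting procedure concrete, here is a hypothetical sketch (not Gym's exact code), where model.predict is assumed to return per-pixel softmax scores of shape (1, H, W, NCLASSES):

import numpy as np

def tta_predict(model, image):
    # The four orientations: original, vertical flip, horizontal flip, both.
    flips = [
        lambda x: x,
        lambda x: np.flip(x, axis=0),
        lambda x: np.flip(x, axis=1),
        lambda x: np.flip(x, axis=(0, 1)),
    ]
    scores = None
    for f in flips:
        pred = model.predict(f(image)[np.newaxis, ...])[0]
        pred = f(pred)  # flips are involutions, so this undoes the flip
        scores = pred if scores is None else scores + pred
    # Sum the softmax scores, then take argmax: 'soft voting'.
    return np.argmax(scores, axis=-1)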