A neural gym for training deep learning models to carry out geoscientific image segmentation. Works best with labels generated using https://github.com/Doodleverse/dash_doodler

📦 Segmentation Gym 💪


🌟 Highlights

  • Gym is for training, evaluating, and deploying deep learning models for image segmentation
  • We take transferability seriously; Gym is designed to be a "one stop shop" for image segmentation on N-D imagery (i.e. any number of coincident bands). It is tailored to Earth Observation and aerial remote sensing imagery.
  • Gym encodes relatively powerful models like UNets, and provides lots of ways to manipulate data, model training, and model architectures that should yield good results with some informed experimentation
  • Gym works seamlessly with Doodler, a human-in-the-loop labeling tool
  • Gym implements models based on the U-Net. Despite being one of the "original" deep learning segmentation models (dating to 2015), U-Nets have proven themselves enormously flexible for a wide range of image segmentation tasks and spatial regression tasks in the natural sciences. So we expect these models, and, perhaps more importantly, the training and implementation of those models in an end-to-end pipeline, to work for a very wide variety of cases; a minimal architecture sketch follows this list. Additional models may be added later.
  • You can read more about the models here, but be warned! We at Doodleverse HQ have discovered - often the hard way - that success is more about the data than the model. Gym helps you wrangle and tame your data, and makes your data work hard for you (nothing fancy, we just use augmentation)
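
For readers unfamiliar with the architecture, below is a generic, minimal U-Net sketch in Keras, for intuition only; it is not Gym's implementation (see the wiki for the models Gym actually provides).

```python
# A generic, minimal U-Net sketch (for intuition only; not Gym's code).
# Encoder-decoder with skip connections: the defining feature of the U-Net.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU activations."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def mini_unet(input_shape=(512, 512, 3), num_classes=2):
    inputs = layers.Input(shape=input_shape)
    # Encoder: convolve, keep the feature map for a skip connection, downsample
    c1 = conv_block(inputs, 16); p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 32);     p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 64)      # bottleneck
    # Decoder: upsample and concatenate the matching encoder feature map
    u2 = layers.Concatenate()([layers.UpSampling2D()(c3), c2]); c4 = conv_block(u2, 32)
    u1 = layers.Concatenate()([layers.UpSampling2D()(c4), c1]); c5 = conv_block(u1, 16)
    # Per-pixel class probabilities
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c5)
    return tf.keras.Model(inputs, outputs)

model = mini_unet()
model.summary()
```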

ℹ️ Overview

We are building a toolbox for segmenting imagery with a variety of supervised deep-learning models. Current work is focused on building a family of U-Net models. We have built an end-to-end workflow that facilitates:

  • Preprocessing of imagery for deep learning model training and prediction, such as image padding and/or resizing
  • Coupling of N-dimensional imagery, perhaps stored across multiple files, with corresponding integer label images
  • Use of an existing (i.e. pre-trained) model to segment new imagery (by using provided code and model weights)
  • Use of images and corresponding label images, or 'labels', to develop a 'model-ready' dataset. A model-ready dataset is a set of images and corresponding labels in a serial binary archive format (we use .npz) that contains all your data for model training and validation, and that can be unpacked directly as TensorFlow tensors (see the sketch after this list). We initially used the tfrecord format, but abandoned that approach because of its relative complexity, and because the npz format is more familiar to Earth scientists who code with Python
  • Training a new model from scratch using this new dataset
  • Evaluating the model against a validation subset
  • Applying the model (or ensemble of models) to sample imagery, i.e. model deployment
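
As a rough illustration of "unpacked directly as TensorFlow tensors", here is a minimal sketch of opening a model-ready .npz archive with NumPy and converting its contents to tensors. The file name is a placeholder, and the assumption that the archive holds one image array and one label array is ours; inspect your own files to see exactly what make_dataset.py wrote.

```python
# Minimal sketch (not Gym's internal code): open one model-ready .npz archive
# and hand its contents to TensorFlow. The array layout is an assumption;
# inspect data.files on your own archives to see what Gym actually stored.
import numpy as np
import tensorflow as tf

with np.load("example_pair.npz") as data:
    print(data.files)                # names of the arrays in the archive
    image = data[data.files[0]]      # e.g. an (H, W, N_bands) image array
    label = data[data.files[1]]      # e.g. an (H, W) integer (or one-hot) label

# Convert to tensors, ready to batch with tf.data
image_t = tf.convert_to_tensor(image, dtype=tf.float32)
label_t = tf.convert_to_tensor(label, dtype=tf.uint8)
print(image_t.shape, label_t.shape)
```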

We have tested on a variety of Earth and environmental imagery of coastal, river, and other natural environments. However, we expect the toolbox to be useful for all types of imagery when properly applied.

This toolbox is designed to work seamlessly with Doodler, a human-in-the-loop labeling tool that will help you make training data for Gym. It would also work on any imagery in jpg or png format that has corresponding 2D greyscale integer label images (jpg or png), however acquired.
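
If you are bringing your own label images, a quick check like the one below can confirm they meet the 2D greyscale integer requirement before you build a dataset. This is only a suggestion (not a Gym script); the file names are placeholders and Pillow is used purely for illustration.

```python
# Quick sanity check (a suggestion, not part of Gym): verify that a label
# image is a single-band integer raster matching its image's dimensions.
import numpy as np
from PIL import Image

image = np.array(Image.open("site001.jpg"))        # placeholder file names
label = np.array(Image.open("site001_label.png"))

assert label.ndim == 2, "label must be a single-band (greyscale) image"
assert label.shape == image.shape[:2], "label and image dimensions must match"
print("integer classes present in this label:", np.unique(label))
```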

✍️ Authors

Package maintainers:

Contributions:

🚀 Usage

This toolbox is designed for 1-, 3-, or 4-band imagery, and supports both binary (one class of interest and a null class) and multiclass (several classes of interest) segmentation.

We recommend a six-part workflow:

  1. Download & Install Gym
  2. Decide on which data to use and move them into the appropriate part of the Gym directory structure. (We recommend that you first use the included data as a test of Gym on your machine. After you have confirmed that this works, you can import your own data, or make new data using Doodler)
  3. Write a config file for your data. You will need to make some decisions about the model and hyperparameters (a hypothetical config sketch follows this list).
  4. Run make_dataset.py to augment and package your images into npz files for training the model.
  5. Run train_model.py to train a segmentation model.
  6. Run seg_images_in_folder.py to segment images with your newly trained model, or ensemble_seg_images_in_folder.py to point more than one trained model at the same imagery and ensemble the model outputs
  • Here at Doodleverse HQ we advocate training models on the augmented data encoded in the datasets, so that the original data serve as a hold-out or test set. This is ideal because, although the validation dataset (drawn from augmented data) is not used to adjust model weights, it does influence model training by triggering early stopping when validation loss stops improving. Testing on an untransformed set is also a further check of, and reassurance about, model performance and evaluation metrics

  • Doodleverse HQ also advocates the use of ensemble models where possible, which requires training multiple models, each with its own config file and model weights file
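
The sketch below shows the kind of decisions step 3 involves. It is a hypothetical example only: we assume a JSON config file here, and the key names and values are illustrative assumptions, not Gym's actual schema. Follow the configuration guide on the wiki for the real format Gym expects.

```python
# Hypothetical config sketch only: key names are illustrative assumptions,
# not Gym's actual schema (see the wiki's configuration guide for that).
import json

config = {
    "TARGET_SIZE": [512, 512],  # size images are padded/resized to
    "N_DATA_BANDS": 3,          # 1, 3, or 4 coincident bands
    "NCLASSES": 2,              # 2 = binary; >2 = multiclass
    "BATCH_SIZE": 6,
    "MAX_EPOCHS": 100,
}

with open("my_config.json", "w") as f:
    json.dump(config, f, indent=2)
```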

⬇️ Installation

We advise creating a new conda environment to run the program.

  1. Clone the repo:
git clone --depth 1 https://github.com/Doodleverse/segmentation_gym.git

(--depth 1 means "give me only the present code, not the whole history of git commits" - this saves disk space and time)

  2. Create a conda environment called gym
conda env create --file install/gym.yml
conda activate gym

If you get errors associated with loading the model weights, you may need to:

pip install "h5py==2.10.0" --force-reinstall

and just ignore any errors.

Also, tensorflow version 2.2.0 or higher is now required, which means you may need to

pip install tensorflow-gpu==2.2.0 --user

and just ignore any errors. When you run any script, the tensorflow version should be printed to screen.
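
As a further sanity check (a suggestion, not part of Gym), you can confirm the TensorFlow version and GPU visibility yourself before training:

```python
# Print the installed TensorFlow version and any GPUs TensorFlow can see.
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))
```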


How to use

Check out the wiki for a guide on how to use Gym.

  1. Organize your files according to this guide
  2. Create a configuration file according to this guide
  3. Create a model-ready dataset from your pairs of images and labels. We hope you find this guide helpful
  4. Train and evaluate an image segmentation model according to this guide
  5. Deploy / evaluate the model on unseen sample imagery (more detail coming soon)

Test Dataset

A test dataset, including a set of images/labels, model config files, and a dataset and models created with Gym, is available here and described on the Zenodo page.

💭 Feedback and Contributing

Please read our code of conduct.

Please contribute to the Discussions tab - we welcome your ideas and feedback.

We also invite all to open issues for bugs and feature requests using the Issues tab.
