RolnickLab/FoMo-Bench

If you use this work, please consider citing:

@article{bountos2023fomo,
  title={FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models},
  author={Bountos, Nikolaos Ioannis and Ouaknine, Arthur and Rolnick, David},
  journal={arXiv preprint arXiv:2312.10114},
  year={2023}
}


Setup project

This code has been tested with Python 3.10. To install all necessary packages, run:

pip install -r requirements.txt

Depending on the CUDA version on your system, execute (e.g. for CUDA 12.1):

pip install pyg_lib torch_scatter torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
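
If you are unsure which wheel index matches your installation, you can query the installed torch build directly (a quick check, not part of the repository):

import torch

# Print the installed torch and CUDA versions to pick the matching wheel index,
# e.g. 2.1.0 and 12.1 correspond to the torch-2.1.0+cu121 index above.
print(torch.__version__)
print(torch.version.cuda)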

To activate the pre-commit hook for the Black formatter, execute:

pre-commit install

Repository Structure

.
├── configs/
│   ├── configs.json
│   ├── datasets/
│   ├── method/
│   ├── download/
│   └── training/
├── datasets/
│   ├── BigEarthNet.py
│   ├── FLAIRDataset.py
│   └── ...
├── downloading_scripts/
│   ├── bigearthnet.sh
│   ├── treesat.sh
│   └── ...
├── training/
│   ├── classification.py
│   └── segmentation.py
├── utilities/
│   ├── augmentations.py
│   ├── utils.py
│   ├── model_utilities.py
│   └── webdataset_writer.py
├── main.py
└── downloader.py

configs/configs.json contains high-level experimental choices, including the dataset of interest and whether to activate wandb logging.
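
For orientation, a minimal sketch of how main.py might read this file (the "dataset" key is used throughout this README; the exact name of the wandb flag is an assumption):

import json

# Load the high-level experiment configuration.
with open("configs/configs.json") as f:
    configs = json.load(f)

print(configs["dataset"])    # the dataset of interest, e.g. "cactus"
print(configs.get("wandb"))  # hypothetical key name for the wandb toggle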

The configs/datasets/ directory contains the dataset-specific configurations, e.g. the task to solve and the metrics to log.

Example configuration for the cactus dataset:

{
    "root_path": "dataset_root_path",
    "task": "classification",          // possible tasks depend on the dataset
    "metrics": ["accuracy", "fscore"], // desired metrics to log
    "num_classes": 2,
    "in_channels": 3,
    "meta_info": ""
}

Similarly, the configs/method/ and configs/training/ directories contain the configuration choices for the desired task (e.g. classification or segmentation) and for training (e.g. batch size and number of epochs), respectively.

Downloading Datasets

To download the datasets used in this benchmark, select the desired dataset and the base directory to store the data in configs/download/download.json, then execute:

python downloader.py

Make sure to give proper permissions to the scripts under downloading_scripts/ by running:

chmod +x download_script.sh

downloader.py handles the download and all restructuring needed for the experiments.

Note for object detection and point cloud datasets: we provide scripts to create tiles or sub-point clouds for the NeonTree, ReforesTree and FORinstance datasets. Please refer to the corresponding scripts in the utilities folder, either for detection or point cloud datasets.

Experiments

All information is aggregated by main.py. Given the aggregated configurations, the appropriate dataloading, training and testing functions are constructed.

For each available task, the data loaders follow similar patterns, enabling the training and testing procedures to remain (mostly) dataset-agnostic.

To run an experiment, select the desired dataset in configs/configs.json. The training options can be defined in configs/training/training.json. If a dataset supports multiple tasks, specify the desired task in configs/datasets/[YOUR_DATASET].json.

All datasets support the webdataset format for more efficient data loading. To enable it, set "webdataset": true in configs/configs.json. If webdataset shards already exist, training begins immediately; otherwise, the shards are created automatically. Depending on the dataset size, this process may take from a few minutes to a few hours.

If data augmentation is needed, set "augment": true in configs/training/training.json. The desired data augmentations can be set in configs/augmentations/augmentations.json, along with their strength (probability of occurring). E.g. to always resize an image to 224x224 pixels, set:

"Resize":{
            "value":224,
            "p":1.0
        },

The current augmentation configuration file contains all supported augmentations.

Webdataset setup

The example in configs/configs.json contains the following options for webdataset:

"webdataset":true,
"webdataset_shuffle_size": 1000,
"webdataset_initial_buffer":1000,
"max_samples_per_shard": 256, //set upper limit 256 samples per shard
"webdataset_root_path": null,

Setting the webdataset_root_path variable changes the saving directory of the webdataset. If left at null, the webdataset is saved in the same directory as the dataset of interest. The max_samples_per_shard argument is only used when creating the webdataset and refers to the maximum number of samples contained in a single shard; this is handled by utilities/webdataset_writer.py. webdataset_shuffle_size determines the size of the buffer from which the data are sampled, while webdataset_initial_buffer sets the number of samples to load before yielding begins.
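
As an illustration, these options roughly correspond to the shuffle parameters of the webdataset library; the shard pattern below is hypothetical, since the actual names depend on how utilities/webdataset_writer.py writes the shards:

import webdataset as wds

# Hypothetical shard pattern; the location is controlled by webdataset_root_path.
shards = "path/to/shards/shard-{000000..000099}.tar"

dataset = (
    wds.WebDataset(shards)
    .shuffle(1000, initial=1000)  # webdataset_shuffle_size / webdataset_initial_buffer
    .decode()
)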

Supported models

For classification tasks we support all encoders available in timm. In this benchmark we mainly focus on:

| Model | Paper |
| --- | --- |
| ResNet | ResNet Paper |
| ViT | ViT Paper |
| ConvNext | ConvNext Paper |

For semantic segmentation problems we focus on:

| Model | Paper |
| --- | --- |
| UNet | UNet Paper |
| UNet++ | UNet++ Paper |
| DeepLabv3plus | DeepLabv3plus Paper |
| UperNet | UperNet Paper |

For object detection we support:

| Model | Paper |
| --- | --- |
| Faster R-CNN | Faster R-CNN paper |
| RetinaNet | RetinaNet paper |
| YOLOS | YOLOS paper |
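
For instance, a Faster R-CNN detector can be built from torchvision; this is a sketch under the assumption that torchvision's detection models are used, not necessarily the repository's exact constructor:

import torchvision

# Build a Faster R-CNN with a ResNet-50 FPN backbone; num_classes would come
# from the dataset config (e.g. configs["num_classes"]).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)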

Adding new models

To add support for a new model, include its construction in utilities/model_utilities.py. Depending on the task, models are constructed in one of the following functions:

create_classifier()  // for classification tasks
create_segmentor()   // for semantic segmentation tasks
create_detector()    // for object detection tasks
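
As an example, a timm-based classifier could be added along these lines (a minimal sketch; the config keys "backbone", "num_classes" and "in_channels" follow this README, while "pretrained" is an assumption):

import timm

def create_classifier(configs):
    # Any timm encoder can be instantiated by name, e.g. "resnet50" or "convnext_tiny".
    model = timm.create_model(
        configs["backbone"],
        pretrained=configs.get("pretrained", False),
        num_classes=configs["num_classes"],
        in_chans=configs["in_channels"],
    )
    return model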

Adding new tasks

To solve a task not included in this repo, update the pipeline with the following steps:

1. Create a training/testing procedure in training/, as done in training/classification.py.
2. Update the create_procedures() function in utilities/utils.py to handle the desired task, e.g. for classification:

   if configs['task'] == 'classification':
       trainer = classification.train
       tester = classification.test

3. Create a config file in configs/method/your_task.json specifying the desired model and any other hyperparameters needed for the training/testing procedures. Examples can be found in configs/method/, e.g. configs/method/classification.json.
4. Update the create_checkpoint_path() function in utilities/utils.py to create a unique checkpoint path for the given task from the provided configs. E.g. for the semantic segmentation task:

   if configs['task'] == 'segmentation':
       checkpoint_path = (
           Path("checkpoints")
           / configs["task"].lower()
           / configs["dataset"].lower()
           / configs["architecture"].lower()
           / configs["backbone"].lower()
       )

5. Run your experiments by modifying your task config file configs/method/your_task.json and running python main.py.
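
Putting steps 1 and 2 together, the dispatch in create_procedures() can equivalently be expressed as a registry; a sketch, shown with the classification entry from the snippet above (other tasks would add their own entries):

from training import classification

# One (trainer, tester) pair per supported task.
PROCEDURES = {
    "classification": (classification.train, classification.test),
}

def create_procedures(configs):
    trainer, tester = PROCEDURES[configs["task"]]
    return trainer, tester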

Train FoMo-Net

To enable FoMo-Net training, set "all" as the dataset in configs.json. configs/datasets/all.json provides an example of the needed configurations. Set "augment": true in configs/training/training.json and the desired augmentations in configs/augmentations/augmentations.json, e.g.:

"RandomResizedCrop": {
            "value": 224,
            "scale":[0.2, 1.0],
            "interpolation":3,
            "p": 1.0
        },

Adding data augmentations

The data augmentation pipeline is based on the Albumentations library. To add an augmentation method, include it in the get_augmentations() function of utilities/augmentations.py. For example, to add the VerticalFlip augmentation we add:

elif k == "VerticalFlip":
    aug = A.augmentations.VerticalFlip(p=v["p"])
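
In context, get_augmentations() presumably iterates over the augmentation config and composes the pipeline; a minimal sketch under that assumption:

import albumentations as A

def get_augmentations(aug_configs):
    # Build one Albumentations transform per configured augmentation.
    augs = []
    for k, v in aug_configs.items():
        if k == "Resize":
            aug = A.Resize(height=v["value"], width=v["value"], p=v["p"])
        elif k == "VerticalFlip":
            aug = A.augmentations.VerticalFlip(p=v["p"])
        else:
            continue  # unsupported keys are skipped in this sketch
        augs.append(aug)
    return A.Compose(augs)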

Adding new datasets

To add a new dataset one has to:

1. Create a configuration file in configs/datasets/ with the name of the dataset in lower case (e.g. configs/datasets/flair.json). The configuration file has to include:
    - the root path of the data,
    - the nature of the task to solve (e.g. classification),
    - the metrics to log, as a list (e.g. ["accuracy", "fscore"]),
    - any other information needed for loading the data (this depends on your implementation of the data loader).
2. Create a data loader in datasets/ (e.g. datasets/FLAIRDataset.py). Each dataset should include a plot() function for visualization.
3. The data loader should return data in the form `sample, label`. If a different scheme is used, the training procedures (e.g. classification.py) and the webdataset processing pipeline should be adapted accordingly.
4. Include the option to load the dataset in the load_dataset() function of utilities/utils.py, e.g. for the FLAIR dataset:

   elif configs['dataset'].lower() == 'flair':
       dataset = datasets.FLAIRDataset.FLAIRDataset(configs, mode)

5. Include the mean and std of the new dataset in configs/stats/stats.json. These stats can be calculated with the calc_stats.py script; set batched=True to process the dataset in batches.
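
A minimal skeleton for step 2, assuming a standard PyTorch Dataset (the class name and internals are hypothetical):

from torch.utils.data import Dataset

class MyForestDataset(Dataset):
    """Hypothetical dataset loader following the pattern described above."""

    def __init__(self, configs, mode="train"):
        self.configs = configs
        self.mode = mode
        self.samples = []  # populate from configs["root_path"]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Must return data as `sample, label` to stay compatible with the
        # training procedures and the webdataset pipeline.
        sample, label = self.samples[idx]
        return sample, label

    def plot(self, idx):
        # Visualization hook expected for every dataset; implementation
        # depends on the modality.
        raise NotImplementedError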

Datasets in the benchmark

The following table presents the datasets supported in this repo, along with basic information about the data sensors, their spatial coverage and the tasks they enable. Each dataset can be referenced in the respective config files by its name in lower case.

| Dataset | Modalities | Possible Tasks | Covered Areas |
| --- | --- | --- | --- |
| Cactus | Aerial RGB | Classification | Mexico |
| FLAIR | Aerial - RGB, NIR, Elevation | Segmentation | France |
| FLAIR2 | Aerial - RGB, NIR, Elevation, Sentinel-2 | Segmentation | France |
| TreeSatAI | Aerial, Sentinel-1, Sentinel-2 | Classification | Germany |
| Woody | Aerial | Segmentation | Chile |
| ReforesTree | Aerial | Detection, Regression | Ecuador |
| ForestNet | Landsat-8 | Classification, Segmentation | Indonesia |
| NeonTree | Satellite RGB, LiDAR, Hyperspectral | Detection | USA |
| Spekboom | Aerial | Segmentation | South Africa |
| Waititu | Aerial | Segmentation | New Zealand |
| BigEarthNet-MM | Sentinel-1, Sentinel-2 | Multi-label Classification | Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland |
| Sen12MS | Sentinel-1, Sentinel-2 | Multi-label Classification | Global |
| RapidAI4EO | Planet, Sentinel-2 | Multi-label Classification | Europe |
| TalloS | Sentinel-1, Sentinel-2, DEM, ERA-5 | Multi-label Classification | Global |
