-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit 7ef2438
Showing
35 changed files
with
4,909 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2021 Ioannis Kakogeorgiou | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,291 @@ | ||
![Marine Debris Archive Logo](./docs/marida_2.jpg) | ||
|
||
MARine Debris Archive (MARIDA) is a marine debris-oriented dataset on Sentinel-2 satellite images. | ||
It also includes various sea features that co-exist. | ||
MARIDA is primarily focused on the weakly supervised pixel-level semantic segmentation task. | ||
This repository hosts the basic tools for the extraction of spectral signatures | ||
as well as the code for the reproduction of the baseline models. | ||
|
||
In order to download MARIDA go to [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5151941.svg)](https://doi.org/10.5281/zenodo.5151941). | ||
|
||
## Contents | ||
|
||
- [Installation](#installation) | ||
- [Installation Requirements](#installation-requirements) | ||
- [Installation Guide](#installation-guide) | ||
- [Getting Started](#getting-started) | ||
- [Dataset Structure](#dataset-structure) | ||
- [Spectral Signatures Extraction](#spectral-signatures-extraction) | ||
- [Weakly Supervised Pixel-Level Semantic Segmentation](#weakly-supervised-pixel-Level-semantic-segmentation) | ||
- [Unet](#unet) | ||
- [Random Forest](#random-forest) | ||
- [Multi-label Classification](#multi-label-classification) | ||
- [ResNet](#resnet) | ||
- [MARIDA - Exploratory Analysis](https://marine-debris.github.io/) | ||
- [Talks and Papers](#talks-and-papers) | ||
|
||
|
||
## Installation | ||
|
||
### Installation Requirements | ||
- python == 3.7.10 | ||
- pytorch == 1.7 | ||
- cudatoolkit == 11.0 (For GPU usage, compute capability >= 3.5) | ||
- gdal == 2.3.3 | ||
- rasterio == 1.0.21 | ||
- scikit-learn == 0.24.2 | ||
- numpy == 1.20.2 | ||
- tensorboard == 1.15 | ||
- torchvision == 0.8.0 | ||
- scikit-image == 0.18.1 | ||
- pandas == 1.2.4 | ||
- pytables == 3.6.1 | ||
- tqdm == 4.59.0 | ||
|
||
|
||
### Installation Guide | ||
|
||
The requirements are easily installed via | ||
[Anaconda](https://www.anaconda.com/distribution/#download-section) (recommended): | ||
```bash | ||
conda env create -f environment.yml | ||
``` | ||
> If the following error occurred: InvalidVersionSpecError: Invalid version spec: =2.7 | ||
> | ||
> Run: conda update conda | ||
After the installation is completed, activate the environment: | ||
```bash | ||
conda activate marida | ||
``` | ||
|
||
## Getting Started | ||
|
||
### Dataset Structure | ||
|
||
In order to train or test the models, download [MARIDA](https://doi.org/10.5281/zenodo.5151941) | ||
and extract it in the `data/` folder. The final structure should be: | ||
|
||
. | ||
├── ... | ||
├── data # Main Dataset folder | ||
│ ├── patches # Folder with patches Structured by Unique Dates and S2 Tiles | ||
│ │ ├── S2_DATE_TILE # Unique Date | ||
│ │ │ ├── S2_DATE_TILE_CROP.tif # Unique 256 x 256 Patch | ||
│ │ │ ├── S2_DATE_TILE_CROP_cl.tif # 256 x 256 Classification Mask for Semantic Segmentation Task | ||
│ │ │ └── S2_DATE_TILE_CROP_conf.tif # 256 x 256 Annotator Confidence Level Mask | ||
│ │ └── ... | ||
│ ├── splits # Train/Val/Test split Folder (train_X.txt, val_X.txt, test_X.txt) | ||
│ └── labels_mapping.txt # Mapping between Unique 256 x 256 Patch and labels for Multi-label Classification Task | ||
|
||
|
||
The mapping in S2_DATA_TILE_CROP_cl between Digital Numbers and Classes is: | ||
|
||
```yaml | ||
1: 'Marine Debris', | ||
2: 'Dense Sargassum', | ||
3: 'Sparse Sargassum', | ||
4: 'Natural Organic Material', | ||
5: 'Ship', | ||
6: 'Clouds', | ||
7: 'Marine Water', | ||
8: 'Sediment-Laden Water', | ||
9: 'Foam', | ||
10: 'Turbid Water', | ||
11: 'Shallow Water', | ||
12: 'Waves', | ||
13: 'Cloud Shadows', | ||
14: 'Wakes', | ||
15: 'Mixed Water' | ||
``` | ||
For the confidence level mask or other usefull mappings go to utils/assets.py | ||
Also, in order to easily visualize the RGB composite of the S2_DATE_TILE_CROP patches via [QGIS](https://qgis.org/en/site/index.html), | ||
you can use the `utils/qgis_color_patch_rgb.qml` file. | ||
|
||
### Spectral Signatures Extraction | ||
|
||
For the extraction of the spectal signature of each annotated pixel and | ||
its storage in a HDF5 Table file (DataFrame-like processing) run the following commands below. | ||
The output `data/dataset.h5` can be used for the spectral analysis of the dataset. | ||
Also, this stage is required for the Random Forest training (press [here](#random-forest)). | ||
Note that this is not required for the Unet training. This procedure lasts approximately ~10 minutes. | ||
|
||
```bash | ||
python utils/spectral_extraction.py | ||
``` | ||
|
||
Finally, in order to load the `dataset.h5` with Pandas, run in a python cell the following: | ||
|
||
```python | ||
import pandas as pd | ||
hdf = pd.HDFStore('./data/dataset.h5', mode = 'r') | ||
df_train = hdf.select('train') | ||
df_val = hdf.select('val') | ||
df_test = hdf.select('test') | ||
hdf.close() | ||
``` | ||
|
||
### Weakly Supervised Pixel-Level Semantic Segmentation | ||
|
||
#### Unet | ||
|
||
**Unet training** | ||
|
||
Spectral Signatures Extraction in not required for this procedure. | ||
For training in the "train" set and evaluation in "val" set with the proposed parameters, run: | ||
|
||
```bash | ||
cd semantic_segmentation/unet | ||
python train.py | ||
``` | ||
|
||
While training, in order to see the loss status and various metrics via tensorboard, run in a different terminal | ||
the following command and then go to `localhost:6006` with your browser: | ||
|
||
```bash | ||
tensorboard --logdir logs/tsboard_segm | ||
``` | ||
|
||
The `train.py` also supports the following argument flags: | ||
|
||
```bash | ||
# Basic parameters | ||
--agg_to_water "Aggregate Mixed Water, Wakes, Cloud Shadows, Waves with Marine Water (True or False)" | ||
--mode "Select between 'train' or 'test'" | ||
--epochs "Number of epochs to run" | ||
--batch "Batch size" | ||
--resume_from_epoch "Load model from previous epoch (To continue the training)" | ||
# Unet | ||
--input_channels "The number of input bands" | ||
--output_channels "The number of output classes" | ||
--hidden_channels "The number of hidden features" | ||
# Optimization | ||
--weight_param "Weighting parameter for Loss Function" | ||
--lr "Learning rate for adam" | ||
--decay "Learning rate decay for adam" | ||
--reduce_lr_on_plateau "Reduce learning rate when val loss no decrease (0 or 1)" | ||
--lr_steps "Specify the steps that the lr will be reduced" | ||
# Evaluation/Checkpointing | ||
--checkpoint_path "The folder to save checkpoints into." | ||
--eval_every "How frequently to run evaluation (epochs)" | ||
# misc | ||
--num_workers "How many cpus for loading data (0 is the main process)" | ||
--pin_memory "Use pinned memory or not" | ||
--prefetch_factor "Number of sample loaded in advance by each worker" | ||
--persistent_workers "This allows to maintain the workers Dataset instances alive" | ||
--tensorboard "Name for tensorboard run" | ||
``` | ||
|
||
**Unet evaluation** | ||
|
||
Run the following commands in order to produce the Confusion Matrix in stdout and `logs/evaluation_unet.log`, | ||
as well as to produce the predicted masks from the test set in `data/predicted_unet/` folder: | ||
|
||
```bash | ||
cd semantic_segmentation/unet | ||
python evaluation.py | ||
``` | ||
|
||
In order to easily visualize the predicted masks via [QGIS](https://qgis.org/en/site/index.html), | ||
you can use the `utils/qgis_color_mask_mapping.qml` file. | ||
|
||
#### Random Forest | ||
|
||
In our baseline setup we trained a random forest classifier on Spectral Signatures, | ||
produced Spectral Indices (SI) and extracted Gray-Level Co-occurrence Matrix (GLCM) texture features. | ||
Thus, this process requires the Spectral Signatures Extraction i.e., the `data/dataset.h5` [file](#spectral-signatures-extraction). Also, it requires the `dataset_si.h5` and `dataset_glcm.h5` for SI and GLCM features, | ||
respectively. | ||
|
||
1) For the extraction of stacked SI patches (in `data/indices/`) run: | ||
|
||
```bash | ||
cd semantic_segmentation/random_forest | ||
python engineering_patches.py | ||
``` | ||
|
||
Then, in order to produce the `dataset_si.h5` run: | ||
|
||
```bash | ||
python utils/spectral_extraction.py --type indices | ||
``` | ||
|
||
2) For the stacked GLCM patches (in `data/texture/`) run (approximately ~ 110 mins): | ||
|
||
```bash | ||
python engineering_patches.py --type texture | ||
``` | ||
|
||
Similarly, in order to produce the `dataset_glcm.h5` run: | ||
|
||
```bash | ||
python utils/spectral_extraction.py --type texture | ||
``` | ||
|
||
|
||
**Random Forest training and evaluation** | ||
|
||
For training in "train" set and final evaluation in "test" set, run the following commands. | ||
Note that the results will appear in stdout and `logs/evaluation_rf.log`, and the predicted | ||
masks in `data/predicted_rf/` folder. | ||
|
||
```bash | ||
cd semantic_segmentation\random_forest | ||
python train_eval.py | ||
``` | ||
|
||
The `train_eval.py` supports the `--agg_to_water` argument for | ||
the aggregation of various classes to form the Water Super Class (The default setup): | ||
|
||
```bash | ||
python train_eval.py --agg_to_water ['"Mixed Water"','"Wakes"','"Cloud Shadows"','"Waves"'] | ||
``` | ||
|
||
### Multi-label Classification | ||
|
||
The weakly-supervised multi-label classification task is an incomplete multi-label | ||
assignment problem. Specifically, the assigned labels are definitely positive (assigned as 1), | ||
while the absent labels (assigned as 0) are not necessarily negative. The assigned labels | ||
per patch can be found in `data/labels_mapping.txt` | ||
|
||
#### ResNet | ||
|
||
**ResNet training** | ||
|
||
For training in "train" set and evaluation in "val" set, run: | ||
|
||
```bash | ||
cd multi-label/resnet | ||
python train.py | ||
``` | ||
|
||
Similarly to U-Net training, you can use tensorboard thought `localhost:6006` | ||
to visualize the training process: | ||
|
||
``` | ||
tensorboard --logdir logs/tsboard_multilabel | ||
``` | ||
|
||
**ResNet evaluation** | ||
|
||
Run the following commands in order to produce the accuracy scores and the Confusion Matrix in stdout | ||
and `logs/evaluation_resnet.log`, as well as to produce the predictions for each patch from the test | ||
set in `data/predicted_labels_mapping.txt`: | ||
|
||
```bash | ||
python evaluation.py | ||
``` | ||
|
||
## Talks and Papers | ||
[Kikaki A, Kakogeorgiou I, Mikeli P, Raitsos DE, Karantzalos K. Detecting and Classifying Marine Plastic Debris from high-resolution multispectral satellite data.](https://doi.org/10.5194/egusphere-egu21-15243) | ||
|
||
## License | ||
This project is licensed under the MIT License. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
#folder structure |
Large diffs are not rendered by default.
Oops, something went wrong.
Oops, something went wrong.