Multi-Layer Modeling of Dense Vegetation from Aerial LiDAR Scans


I AM STILL UPDATING THE DATASET, BUT IT IS ALREADY QUITE GOOD AS IT IS! (BUT NOT PERFECT YET)

This repository contains the WildForest3D dataset and the code for the pipeline presented in the article:

Ekaterina Kalinicheva, Loic Landrieu, Clément Mallet, Nesrine Chehata "Multi-Layer Modeling of Dense Vegetation from Aerial LiDAR Scans", CVPR 2022, Earth Vision Workshop. [arXiv]

THE DESCRIPTION IS NOT FINALIZED YET!

UPDATE (almost done) The whole model has been improved and extended into a bi-temporal, multi-modal model. The new code can be found in the folder multimodal_and_multiseasonal_model. No article will be written for this algorithm; however, a brief overview of the model can be found in the README file located in that folder. The new code is better written than the initial code in this repository and can equally be used for the single-data 3D analysis described in the article. (The user can choose in the config file which type of data they want to process.)

Dataset

The WildForest3D dataset contains 29 plots of dense forest, with 7 million 3D points and 2.1 million individual labels. The study area is located in the heavily forested French region of Aquitaine and was scanned with a LiDAR sensor mounted on an unmanned aerial vehicle, with an average of 60 pulses per m². Each point is associated with its coordinates in the Lambert-93 projection, the intensity of the returned laser signal, and the echo return number. The elevation of the points is normalized using a digital elevation model, so that ground points are always at z=0m.

The dataset is located in the WildForest3D folder. For the code to work properly, your folder structure should be the following, where args.smth denotes the corresponding parameter you have to set in the configuration file config.py:

├─your_main_data_folder (args.path)
 ├─data_point_clouds (args.folder_clouds)
 │ ├─Placette_1
 │ ├─Placette_2
 │ └─Placette_XX
 │  ├─Pl_XX_final_data_xyzinr.ply
 │  ├─Pl_XX_final_data_xyzinr_winter_nolabels.ply
 │  ├─Pl_XX_trees_params.csv
 │  ├─Pl_XX_trees_bb.csv
 │  └─Pl_XX_trees_bb_faces.ply
 ├─gt_rasters (args.folder_gt_rasters)
 │ ├─Placette_1
 │ ├─Placette_2
 │ └─Placette_XX
 │  ├─Pl_XX_Coverage_canopy_5_classes_05.tif
 │  ├─Pl_XX_Coverage_canopy_6_classes_05.tif
 │  ├─Pl_XX_Coverage_sure_05.tif
 │  └─Pl_XX_Coverage_height_05.tif
 ├─water_raster (args.folder_water)
 │ └─water_clipped.tif
 ├─aerial_images (args.folder_aerial_images)
 │ ├─Placette_1
 │ ├─Placette_2
 │ └─Placette_XX
 │  └─PL_XX_Aerial_image_clipped.tif
 └─results (args.folder_results)

  • data_point_clouds/ - contains 29 folders Placette_XX/ organized by plot ID, each of those folders contains:
    • Pl_XX_final_data_xyzinr.ply - the clipped plot with its annotated point cloud. The points are annotated instance-wise, so each point carries the following information: XYZ coordinates, intensity value, number of returns, return number, and the ID of the instance the point belongs to. ID=0 corresponds to a non-annotated point.
    • Pl_XX_trees_params.csv - the parameters of each individual tree/bush instance (its class name, class category, height, crown base height, etc.). The script utils/open_ply_all.py generates the 6-class dataset that was used in the article. If you want to generate a separate file with the GT, use utils/generate_dataset.py.
    • Pl_XX_trees_bb.csv - coordinates of axis-oriented bounding boxes (BB) for each annotated tree (not used in the article, but might be useful to someone in this world). Each BB is described by its XY center, its extent along the X and Y axes, and the BB height. Note that we consider the bottom Z coordinate to always be 0.
    • Pl_XX_trees_bb_faces.ply - visualization of the BB from the .csv above.
  • gt_rasters/ - contains 29 folders Placette_XX/ organized by plot ID; each of these folders contains GeoTIFF ground-truth rasters generated from the 3D point clouds (see the article for details):
    • Pl_XX_Coverage_sure_05.tif (05 stands for the pixel size - 0.5m - though rasters with other pixel sizes can be generated with utils/generate_dataset.py) - GT with binary occupancy maps for 3 vegetation layers: ground vegetation, understory, overstory. 1 - vegetation, 0 - no vegetation, -1 - nodata. Nodata pixels are present because we only have partial 3D annotation, so its projection onto the rasters may create ambiguity.
    • Pl_XX_Coverage_height_05.tif - the vegetation height of the vegetation-filled pixels, by layer: ground vegetation (GV), understory, bottom of overstory, and top of overstory. Note that, by default, the bottoms of GV and understory are at 0.
    • Pl_XX_Coverage_canopy_5_classes_05.tif - GT for aerial images (only used for the bi-temporal multimodal model): 0 - ground, 1 - ground vegetation, 2 - understory, 3 - deciduous, 4 - coniferous, -1 - nodata. Corresponds to the 6-class 3D classification, as the stem class is not present.
    • Pl_XX_Coverage_canopy_6_classes_05.tif - GT for aerial images (only used for the bi-temporal multimodal model): 0 - ground, 1 - ground vegetation, 2 - understory, 3 - oaks, 4 - coniferous, 5 - alder+others, -1 - nodata. Corresponds to the 7-class 3D classification, as the stem class is not present.
  • aerial_images/ - (only used for the bi-temporal multimodal model) contains 29 folders Placette_XX/ organized by plot ID; each of these folders contains a GeoTIFF VHR aerial image raster acquired at the same time as the 3D summer data (see the article for details):
    • PL_XX_Aerial_image_clipped.tif - VHR aerial image with 10 cm pixel resolution, clipped to the same boundaries as the 3D summer data. Contains 6 bands, although only the first 4 bands can be used: Blue, Green, Red, NIR.
  • water_raster/ - folder with the GeoTIFF raster water_clipped.tif that represents the distance to the closest water source (rivers). Pixel size: 1 m. It helps to distinguish the alder class, which likes being close to water. When we generate the dataset, we have to specify in the configuration whether we want to use this feature. If yes, it is added to each 3D point of the dataset. This is not mentioned in the article, as we only distinguished coniferous from deciduous trees in the initial research; the water feature was added after the article submission.
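
The .ply files can be opened with any standard PLY reader; as a dependency-free sketch, the ASCII header alone already tells you how many points a plot file stores and which per-point properties it declares. The property names in the demo below are assumptions, not the files' guaranteed layout:

```python
import io

def read_ply_header(stream):
    """Return (vertex_count, property_names) parsed from a binary PLY stream."""
    count, props = 0, []
    for raw in stream:
        line = raw.decode("ascii", errors="ignore").strip()
        if line.startswith("element vertex"):
            count = int(line.split()[-1])        # number of 3D points
        elif line.startswith("property"):
            props.append(line.split()[-1])       # per-point field name
        elif line == "end_header":
            break
    return count, props

# Demo on a minimal in-memory header (field names are illustrative).
sample = io.BytesIO(
    b"ply\nformat ascii 1.0\nelement vertex 2\n"
    b"property float x\nproperty float y\nproperty float z\n"
    b"property float intensity\nend_header\n"
)
print(read_ply_header(sample))  # (2, ['x', 'y', 'z', 'intensity'])
```

With a real file, pass `open("Pl_XX_final_data_xyzinr.ply", "rb")` instead; the header is ASCII even in binary PLY files.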

Species abbreviations

We provide the Excel file Abbreviations_species.xlsl in the main directory, which contains the French and Latin names for the categories of species (29 species in total). Nevertheless, at the moment, we distinguish only 2 classes of overstory: coniferous (only pines) and deciduous (all other species).

Note that some bushes from the abbreviation table are quite tall and may be higher than 5m.

Model

The full pipeline description can be found in the article. The PointNet++ model can be found in model/model_pointnet2.py and the corresponding model parameters in config.py:

  • subsample_size - the number of points we sample per cylinder (default 16384);
  • smart_sampling - whether we sample those points in a smart way (added after submitting the article, so not described there). With smart sampling, all points with height 0<z<=5m are always chosen; for the other points we assign a probability: for z=0m and z>15m the probability is p=0.5, and for 5<z<=15m the probability decreases linearly from 0.99 to 0.5;
  • r_num_pts - the number of groups of point sets for each MLP set abstraction level (default [12228, 4098, 1024])
  • rr - the ball radius of the neighbourhood search of point sets for each MLP set abstraction level.
  • ratio - the ratio of retained neighbourhood points within the ball neighbourhood.
  • drop - the probability value of the DropOut layer.
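
The smart-sampling rule above can be sketched as a per-point probability function. This is a minimal illustration of the stated rule, not the repository's actual implementation:

```python
def sample_probability(z):
    """Probability of keeping a point at height z (metres), per the rule above."""
    if 0 < z <= 5:
        return 1.0          # understory heights are always kept
    if z <= 0 or z > 15:
        return 0.5          # ground points and high canopy
    # 5 < z <= 15: linearly decreasing from 0.99 to 0.5
    return 0.99 + (z - 5) * (0.5 - 0.99) / 10.0

print(sample_probability(10))  # ≈ 0.745, halfway down the linear ramp
```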

Results

Our code saves the model and the results computed for the test set every 5 epochs (parameter args.n_epoch_test, which can be modified). All the results are saved in the folder results/. Each model is saved in a subfolder YYYY-MM-DD_HHMMSS/ that is created automatically once the code is launched. The following output is produced at the end of every 5th epoch (YY):

  • Pl_XX_predicted_coverage_ep_YY.ply - classified 3D point cloud (hard class assignment) at epoch YY for plot XX.
  • Pl_XX_predicted_coverage_ep_YY.tif - vegetation layers' occupancy prediction (soft assignment, obtained directly from the output logits). Hard classification results are produced in the postprocessing step from the 3D predictions.
  • epoch_YY.pt - the model saved at epoch YY, with the corresponding optimizer, scheduler, and the z_max and dist_max values used for data normalization.

Using a pretrained model

You can use an already pretrained model to continue learning. For this, you should set:

  • args.model_train = False
  • args.trained_ep - the epoch of the pretrained model you want to use
  • args.path_model - the path to the folder with the pretrained model. The model optimizer and scheduler, as well as data normalization parameters such as z_max and dist_max, will be taken from the pretrained model. All the other parameters should be specified in the configuration file. The output will be saved in the results/ folder in the same format as during normal training.
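
Assuming config.py exposes these settings as plain assignments (the actual file may organize them differently), resuming from a pretrained model could look like this; the epoch number and folder name are purely illustrative:

```python
# config.py fragment — resuming training from a pretrained model.
# All values below are illustrative, not taken from the repository.
model_train = False                         # do not train from scratch
trained_ep = 95                             # epoch of the checkpoint to load
path_model = "results/2022-05-10_153000/"   # folder containing epoch_95.pt
```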

Inference

If you want to make predictions on some new unlabeled data, simply set:

  • args.model_inference = True
  • args.trained_ep - the epoch of the trained model you want to use
  • args.path_model - the path to the folder with the trained model
  • args.inference_pl - the IDs of the plots you want to classify
  • args.path_inference - the path to the folder with the inference data. Note that the folder structure, as well as the file names, should be the same as in the data_point_clouds folder. The output will be saved in the results/ folder in the same format as during normal training.
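
Under the same assumption of plain assignments in config.py, an inference setup could look like this; the plot IDs, epoch, and paths are purely illustrative:

```python
# config.py fragment — inference on new, unlabeled plots.
# All values below are illustrative, not taken from the repository.
model_inference = True
trained_ep = 95                              # epoch of the trained checkpoint
path_model = "results/2022-05-10_153000/"    # folder with the trained model
inference_pl = [30, 31]                      # plot IDs to classify
path_inference = "inference_data/"           # mirrors data_point_clouds/ layout
```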

Postprocessing

Once the model is trained, you can either work with the classified test data that is already in the results folder, or produce new results using the inference option. The point classification is in Pl_XX_predicted_coverage_ep_YY.ply, so you can use create_rasters/create_mesh_and_rasters.py to generate binary occupancy maps, height maps, and a mesh. Finally, if you want to produce supplementary statistics on the area of interest, you can use create_rasters/create_different_results.py. It produces a csv file with different statistics at plot level, plus some supplementary visual results.
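
As an illustration of the occupancy-map convention (1 - vegetation, 0 - no vegetation, -1 - nodata), a plot-level coverage statistic could be computed like this; coverage_ratio is a hypothetical helper, not part of the repository:

```python
# Hypothetical helper: plot-level vegetation coverage from one occupancy layer.
# Pixel values follow the raster convention: 1 vegetation, 0 none, -1 nodata.
def coverage_ratio(occupancy):
    """Fraction of valid (non-nodata) pixels occupied by vegetation."""
    valid = [v for row in occupancy for v in row if v != -1]
    return sum(1 for v in valid if v == 1) / len(valid) if valid else float("nan")

layer = [[1, 0, -1],
         [1, 1, 0]]
print(coverage_ratio(layer))  # 3 vegetated / 5 valid pixels = 0.6
```

Nodata pixels are excluded from the denominator, matching the partial-annotation semantics of the GT rasters.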

Citation

If you are using our code or dataset, please cite us:

@misc{kalinicheva2022multilayer,
  doi = {10.48550/ARXIV.2204.11620},
  url = {https://arxiv.org/abs/2204.11620},
  author = {Kalinicheva, Ekaterina and Landrieu, Loic and Mallet, Clément and Chehata, Nesrine},
  title = {Multi-Layer Modeling of Dense Vegetation from Aerial LiDAR Scans},
  publisher = {arXiv},
  year = {2022}
}
