LeafSnap30 dataset generation

The original LeafSnap dataset was created to facilitate the automatic classification of tree species based on images of their leaves. It has been downloaded from kaggle.com, as it was not available at the original location leafsnap.com at the time. There are 30 866 (~31k) color images of different sizes. The dataset covers all 185 tree species from the Northeastern United States. The original images of leaves were taken from two different sources:

"Lab" images, consisting of high-quality images taken of pressed leaves, from the Smithsonian collection.
"Field" images, consisting of "typical" images taken by mobile devices (iPhones mostly) in outdoor environments.

For the purposes of DIANNA, a subset of 30 species has been selected: the LeafSnap30 dataset. The 30 species with the most images were chosen, resulting in 7395 images divided into 5917 training, 739 validation, and 739 test samples.

This folder contains 3 notebooks: Data_exploration, Image_cropping, and Train_test_split.

Data_exploration The purpose of this notebook is to select the 30 most populous species across the lab and field images. A 30-class dataset had already been selected earlier, for which the lab images were cropped semi-manually using IrfanView to remove the rulers and color calibration parts of the image. However, 2/3 of that dataset was selected randomly rather than according to the number of images per class.

This notebook is used to explore the original dataset, find the 30 most populous classes, and see which of them have not yet been included in the previous 20-class dataset.
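
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: it assumes a hypothetical directory layout of the form `<root>/<source>/<species>/<image>` (with `lab` and `field` as the sources), and the function names `count_images_per_species` and `most_populous` are invented for this example.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_species(root):
    """Count images per species, summed over all sources (e.g. lab and field).

    Assumes the hypothetical layout <root>/<source>/<species>/<image>.
    """
    counts = Counter()
    for source_dir in Path(root).iterdir():
        if not source_dir.is_dir():
            continue
        for species_dir in source_dir.iterdir():
            if not species_dir.is_dir():
                continue
            n_images = sum(
                1 for f in species_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
            counts[species_dir.name] += n_images
    return counts


def most_populous(counts, k=30):
    """Return the names of the k species with the most images."""
    return [name for name, _ in counts.most_common(k)]
```

Comparing the result of `most_populous(counts)` with the class names of an earlier selection (for example with a set difference) then shows which species still need to be added.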

Image_cropping The purpose of this notebook is to crop the lab images of some species in the LeafSnap30 dataset.
Train_test_split The purpose of this notebook is to split the data into training, validation, and test sets. This is done by creating a new folder with symbolic links to the original images.
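
A symlink-based split of this kind could look roughly like the sketch below. It is an illustration under stated assumptions rather than the notebook's implementation: the function name `split_with_symlinks`, the 10% validation and test fractions, the fixed seed, and the output layout `<out_dir>/<subset>/<species>/<image>` are all hypothetical, and the split here is a plain random shuffle, not necessarily stratified per species.

```python
import os
import random
from pathlib import Path


def split_with_symlinks(image_paths, out_dir, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly split images into train/validation/test and link them into out_dir.

    Each image is expected to live in a folder named after its species; that
    folder name is reused as the class folder in the output (an assumption).
    Returns the number of images placed in each subset.
    """
    paths = sorted(Path(p).resolve() for p in image_paths)
    random.Random(seed).shuffle(paths)  # reproducible shuffle

    n_val = int(len(paths) * val_frac)
    n_test = int(len(paths) * test_frac)
    subsets = {
        "validation": paths[:n_val],
        "test": paths[n_val:n_val + n_test],
        "train": paths[n_val + n_test:],
    }

    for subset, files in subsets.items():
        for src in files:
            dst_dir = Path(out_dir) / subset / src.parent.name
            dst_dir.mkdir(parents=True, exist_ok=True)
            dst = dst_dir / src.name
            # Skip links that already exist so the function can be re-run.
            if not (dst.is_symlink() or dst.exists()):
                os.symlink(src, dst)

    return {name: len(files) for name, files in subsets.items()}
```

With 7395 images and 10% fractions, this size computation yields 739 validation, 739 test, and 5917 training samples, which agrees with the counts stated above. Linking instead of copying keeps the original images in one place and costs almost no extra disk space.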
