
Dataset Details


DIRSM DATASET

Overview

The DIRSM dataset consists of 5,280 images (dev-set) and 1,320 images (test-set) extracted from the YFCC100M dataset. The dataset contains only one image per user to avoid a bias towards content from similar or identical locations and towards highly active content-sharing users. In addition to the raw images, we supply participants with additional information.

For each image, Flickr metadata and precomputed visual feature descriptors are provided (see the sections below).

In the file author_acknowledgments.txt, we acknowledge all authors/originators of the images in this dataset. For each image_id, the file contains detailed image and author information as well as the specific CC license.

Ground Truth

The ground truth data of the dataset consists of a class label for each image. The ground truth was generated by human annotators.

We define images showing "unexpected high water levels in industrial, residential, commercial and agricultural areas" as images providing evidence of a flooding event. Annotators were asked to classify each photo as showing very strong non-evidence of a flooding (score 0), non-evidence of a flooding (score 1), direct evidence of a flooding (score 4), very strong direct evidence of a flooding (score 5), or as "don't know"/incomplete information (score 3). The definition of relevance (along with examples) was available to the annotators in the interface during the entire process. The annotation process was not time-restricted. Ground truth was collected from two annotators; the final label was flooding if both annotators rated the image 4 or 5, and non-flooding if both annotators rated it 0 or 1. Images with a score of 3 were not included in the dataset.
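To make the aggregation rule explicit, here is a minimal sketch (a hypothetical helper, not part of the released tools) that maps a pair of annotator scores to the final label:

def final_label(score_a, score_b):
    # Returns 1 (flooding) if both annotators rated 4 or 5,
    # 0 (non-flooding) if both rated 0 or 1, and None otherwise
    # (such images were not included in the dataset).
    if score_a in (4, 5) and score_b in (4, 5):
        return 1
    if score_a in (0, 1) and score_b in (0, 1):
        return 0
    return None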

File Format. Ground truth is provided to participants in a comma-separated text file devset_images_gt.json (UTF-8 encoded) covering all images. The first value on each line is the unique image_id, followed by the ground truth label, separated by a comma. Lines are separated by an end-of-line character (carriage return). An example is presented below:

238473289,1
726183944,0
...
193846823,1
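
A minimal way to read this file into a Python dictionary (assuming it sits in the current working directory; the label encoding follows the example above):

ground_truth = {}
with open("devset_images_gt.json", encoding="utf-8") as f:
    for line in f:
        image_id, label = line.strip().split(",")
        ground_truth[image_id] = int(label)  # presumably 1 = flooding, 0 = non-flooding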

Flickr Metadata

Images in the dev- and test-set are accompanied by a JSON file {devset/testset}_images_metadata.json (UTF-8 encoded) that contains Flickr-specific metadata for the images. The information is structured as follows:

{ "images":[{ "image_id": "12328463323",
              "image_url": "http://www.flickr.com/photos/9752474@N07/12328463323/",
              "image_extension_original": "jpg",
              "date_taken": "2014-01-30 10:18:12.0",
              "date_uploaded": "1391631137",
              "user_nsid": "9752474@N07",
              "user_nickname": "SurferJoe88",
              "title": "Emma Wood State Beach - campsites",
              "description": "Emma Wood State Beach - flooded campsites",
              "user_tags": ["flooding"],
              "license_name": "Attribution-NonCommercial-ShareAlike License",
              "license_url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
              "capture_device": "SAMSUNG PL70 / VLUU PL70 / SAMSUNG SL720",
              "latitude": 34.28715400000000102,
              "longitude": -119.32989299999999844},
              ...
            ]
}

Each entry under the images element contains the metadata for one image and has the following key-value pairs (a loading sketch follows this list):

  • image_id is the unique identifier of each photo from Flickr and corresponds to the name of the jpeg file associated with this photo (e.g. the image_id 9067739127 corresponds to the file 9067739127.jpg);
  • image_url is the URL of the photo on Flickr (please note that by the time you use the dataset, some of the photos may no longer be available at the same location);
  • image_extension_original is the extension of the image;
  • date_taken is the date when the image was taken;
  • date_uploaded is the unix timestamp when the image was uploaded on Flickr;
  • user_nsid is the unique user id from Flickr;
  • user_nickname represents the photo owner’s name;
  • title is a short textual description of the photo provided by the author;
  • description contains a detailed textual description of the photo as provided by the author;
  • user_tags is the list of tag keywords given by the user;
  • license_name is the Creative Commons license of this picture;
  • license_url is the URL of the Creative Commons license of this picture;
  • capture_device is the device that was used for taking the picture;
  • latitude contains the latitude of the position where the picture was taken;
  • longitude contains the longitude of the position where the picture was taken.
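
As a sketch, the metadata file can be loaded and indexed by image_id like this (filename as given above; the printed id is taken from the sample entry):

import json

with open("devset_images_metadata.json", encoding="utf-8") as f:
    metadata = json.load(f)

# Index the entries by image_id for convenient lookup.
by_id = {entry["image_id"]: entry for entry in metadata["images"]}
print(by_id["12328463323"]["user_tags"])  # ['flooding']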

Visual Feature Descriptors

General purpose descriptors. For each photo, we provide some conventional visual descriptors extracted using the LIRE library:

  • Auto color correlogram (acc) is a color-based global feature with 256 dimensions [13];
  • Color and Edge Directivity Descriptor (cedd) is a low-level global feature which incorporates color and texture information in a histogram of 144 dimensions [14];
  • Color Layout (cl) is an MPEG-7 color-based descriptor which represents the spatial layout of color images in a very compact feature with 33 dimensions;
  • Edge histogram (eh) is an MPEG-7 texture-based descriptor which represents the spatial distribution of five types of edges (four directional edges and one non-directional) in an 80-dimensional histogram;
  • Fuzzy Color and Texture Histogram (fcth) is a low-level global feature incorporating color and texture information resulting from the combination of 3 fuzzy systems (192 dimensions) [15];
  • Gabor (gabor) is a texture-based global feature with 60 dimensions;
  • Joint Composite Descriptor (jcd) is a joint descriptor combining CEDD and FCTH in one histogram with 168 dimensions [16];
  • Scalable color (sc) is an MPEG-7 color histogram in the HSV color space encoded by a Haar transform (64 dimensions);
  • Tamura (tamura) is a feature based on psychophysical studies of the characterizing elements that humans perceive in texture (e.g. contrast, coarseness, line-likeness, regularity, ...) [17].

File format. The descriptors are not normalized and are stored as comma-separated values (CSV) in text files. Each file is named after the feature ID (e.g. jcd.csv). On each line, the first item is the image_id, followed by the descriptor's values. Lines are separated by an end-of-line character (carriage return). An example is presented below:

238473289,0.5,3.0,3.5,0.0,0.0,0.5,2.5,1.5,4.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...
...
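
A minimal sketch for loading one descriptor file into a feature matrix (the filename jcd.csv follows the naming convention above):

import numpy as np

image_ids, features = [], []
with open("jcd.csv", encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split(",")
        image_ids.append(parts[0])                      # first item: image_id
        features.append([float(v) for v in parts[1:]])  # descriptor values

features = np.asarray(features)  # shape: (num_images, 168) for jcd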

FDSI DATASET

Overview

The FDSI dataset consists of 462 satellite image patches (dev-set) and two test-sets, all derived from Planet's 4-band satellites [7]. For each image patch in the dev-set, we provide a segmentation mask of the spatial region affected by the flooding.

Ground Truth

For each image patch, a segmentation mask of the flooded area in the scene has been created by human annotators. The filenames follow this convention:

seg_mask_ZZZ.png

where ZZZ corresponds to the image_id of the patch (the filename of the patch without its extension). The masks have the same height and width as the image patch and a single channel holding the class label indices: each pixel is an integer, 0 for the background class and 1 for the flooded-area class.
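
A short sketch for loading a mask and measuring the flooded area, assuming PIL/NumPy are available (the image_id "123" is a placeholder):

import numpy as np
from PIL import Image

mask = np.array(Image.open("seg_mask_123.png"))

# Single-channel mask: 0 = background, 1 = flooded area.
flooded_fraction = (mask == 1).mean()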

Satellite Image Format

The satellite image patches for this task were derived from Planet's full-frame analytic scene products of 4-band satellites. The imagery has a ground-sample distance (GSD) of 3.7 m and an orthorectified pixel size of 3 m. The data was collected between 01/06/2016 and 01/05/2017. The image patches have a shape of 320 x 320 x 4 and are provided in GeoTIFF format. All image scenes have been projected in the UTM projection using the WGS84 datum (EPSG:3857).

Each image patch contains four bands of data: red, green, blue, and near-infrared. Each channel is stored in 16-bit digital number format and meets the specification of the Planet four-band Analytic Ortho Scene product. The specific spectral response of the satellites can be found in the Planet documentation.
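
Assuming the rasterio library is available, one patch can be read as follows (the filename "123.tif" is a placeholder; the band order matches the description above):

import rasterio

with rasterio.open("123.tif") as src:
    patch = src.read()             # array of shape (4, 320, 320), dtype uint16
    red, green, blue, nir = patch  # band order: R, G, B, NIR
    print(src.crs)                 # coordinate reference system of the patch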