Add readme about dataset labeling #220

Merged (8 commits) on Jul 3, 2018
50 changes: 50 additions & 0 deletions LABELING.md
@@ -0,0 +1,50 @@

# Labeling Guidelines

Now that the screenshots are available, they need to be labeled. The labeling phase operates on pairs of comparable screenshots.

## Images marked as compatible - y
---
#### Pairs of images that are clearly compatible.
#### They look the same.
#### In the firefox\_chrome\_overlay window, the two screenshots should nearly overlap.
---
<p align="center"><img src="labeling_guide/y1.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/y2.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/y3.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/y4.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/y5.png" width=617 height=357></p>

## Bounding boxes marked as incompatible - n
---
#### Pairs of images that are not compatible.
#### They are different.
#### Mark the parts which are logically different.
> For example, images that fail to load properly, missing text, a different design, different languages, different selections, and missing bullets or checkboxes are all marked incompatible.
---

## Bounding boxes marked as different yet compatible - d
---
#### Pairs of images that are compatible, but with content differences.
#### They look different.
#### Mark the parts which are logically the same.
> Because the screenshots are taken at different times in the two browsers, some differences are not incompatibilities but are simply due to timing. For example, a banner could be showing a different advertisement, a video could be at two different frames, a clock could be showing a different time, a captcha could be showing different characters or images, two news items could be different, and so on.
---
<p align="center"><img src="labeling_guide/n1.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n2.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n3.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n4.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n5.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n6.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n7.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n8.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n9.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n10.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n11.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n12.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n13.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/n14.png" width=617 height=357></p>
<p align="center"><img src="labeling_guide/d1.png" width=617 height=357></p>


In the training phase, the best case is that we are able to distinguish between **Y + D and N**. If we are not able to do that, we should at least aim for the relaxed problem of distinguishing between **Y and D + N**. This is why we have this three-label system.
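
The two training targets described above can be sketched as a mapping from the three labels to a binary class. This is only an illustration under stated assumptions: the `Label` enum and the `to_binary` helper are hypothetical names and are not taken from the project's code.

```python
from enum import Enum


class Label(Enum):
    """The three labels from this guide (hypothetical representation)."""
    Y = "y"  # clearly compatible
    D = "d"  # compatible, but with content differences
    N = "n"  # not compatible


def to_binary(label: Label, relaxed: bool = False) -> int:
    """Collapse a three-way label into a binary training target.

    Best case (relaxed=False): Y + D -> 0, N -> 1.
    Relaxed problem (relaxed=True): Y -> 0, D + N -> 1.
    """
    if relaxed:
        return 0 if label is Label.Y else 1
    return 1 if label is Label.N else 0
```

With this sketch, a pair labeled **D** falls into class 0 in the best-case setup and into class 1 in the relaxed setup, which is the only label whose class changes between the two problems.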
34 changes: 1 addition & 33 deletions README.md
@@ -16,39 +16,7 @@ The crawler repeats the same steps in Firefox and Chrome, generating a set of co
The `data/` directory contains the screenshots generated by the crawler (N.B.: This directory is not present in the repository itself, but it will be created automatically after you setup the project as described in the **Setup** paragraph).

### Labeling

Now that the screenshots are available, they need to be labeled. The labeling phase operates on couples of comparable screenshots.

There are three possible labels:
1. **Y** for couples of images that are clearly compatible;
2. **D** for couples of images that are compatible, but with content differences (e.g. on a news site, two screenshots could be compatible even though they are showing two different news, simply because the news shown depends on the time the screenshot was taken and not on the fact that the browser is different);
3. **N** for couples of images which are not compatible.

Here are some examples of the three labels:

**Y**
<img src="https://user-images.githubusercontent.com/1616846/35619755-4a932132-067f-11e8-8b1c-c2f70a6819f4.png" width=158 /> <img src="https://user-images.githubusercontent.com/1616846/35619749-458ac7b2-067f-11e8-868d-ac6e186dec98.png" width=158 />

**D**
<img src="https://user-images.githubusercontent.com/1616846/35619779-5d39f90a-067f-11e8-9e31-7c793c79f246.png" width=158 /> <img src="https://user-images.githubusercontent.com/1616846/35619800-6f25ff2e-067f-11e8-8792-f1c3d9c875d1.png" width=158 />

**N**
<img src="https://user-images.githubusercontent.com/1616846/35619822-7f65ed22-067f-11e8-9b2b-ea99cfd6f7de.png" width=158 /> <img src="https://user-images.githubusercontent.com/1616846/35619769-5724cafe-067f-11e8-8e6a-00d527ab3581.png" width=158 />

In the training phase, the best case is that we are able to detect between Y+D and N. If we are not able to do that, we should at least aim for the relaxed problem of detecting between Y and D+N. This is why we have this three labeling system.

The labeling technical details are described [in this issue](https://github.com/marco-c/autowebcompat/issues/2).

The bounding-box labeling allows us to store the areas where the incompatibilities lie.

<img src="https://user-images.githubusercontent.com/18056781/39081659-fdd4655e-4562-11e8-86f9-a5fab28634bf.JPG" />

<img src="https://user-images.githubusercontent.com/18056781/41806002-99faae6c-76d1-11e8-9442-aa2c4f5025b5.png" />

- Press 'y' to mark the images as compatible;
- Press 'Enter' to select the regions;
- Click the 'T' button in the top left corner of a boundary box to toggle between classes. Purple corresponds to 'n', yellow corresponds to 'd';
- Press 'Enter' to save changes.
[Labeling Guide](LABELING.md)

### Training

Binary file added labeling_guide/d1.png
Binary file added labeling_guide/n1.png
Binary file added labeling_guide/n10.png
Binary file added labeling_guide/n11.png
Binary file added labeling_guide/n12.png
Binary file added labeling_guide/n13.png
Binary file added labeling_guide/n14.png
Binary file added labeling_guide/n2.png
Binary file added labeling_guide/n3.png
Binary file added labeling_guide/n4.png
Binary file added labeling_guide/n5.png
Binary file added labeling_guide/n6.png
Binary file added labeling_guide/n7.png
Binary file added labeling_guide/n8.png
Binary file added labeling_guide/n9.png
Binary file added labeling_guide/y1.png
Binary file added labeling_guide/y2.png
Binary file added labeling_guide/y3.png
Binary file added labeling_guide/y4.png
Binary file added labeling_guide/y5.png
Binary file added labeling_guide/y6.png
Binary file added labeling_guide/y7.png