From 51b5856da7e9dc38d8f37d5f177149082241c1a3 Mon Sep 17 00:00:00 2001
From: sagarvijaygupta
Date: Sat, 23 Jun 2018 13:49:50 +0530
Subject: [PATCH] Reference LABELING.md in README.md and move all labeling information in LABELING.md

---
 LABELING.md |  7 +++++++
 README.md   | 32 --------------------------------
 2 files changed, 7 insertions(+), 32 deletions(-)

diff --git a/LABELING.md b/LABELING.md
index f2394ffd..c42e4736 100644
--- a/LABELING.md
+++ b/LABELING.md
@@ -1,8 +1,11 @@
 # Labeling Guidelines

+Now that the screenshots are available, they need to be labeled. The labeling phase operates on couples of comparable screenshots.
+
 ## Images marked as compatible - y
 ---

+#### Couples of images that are clearly compatible.
 #### They look the same.
 #### firefox\_chrome\_overlay window should nearly overlap them.
 ---
@@ -14,6 +17,7 @@
 ## Bounding boxes marked as incompatible - n
 ---

+#### Couples of images which are not compatible
 #### They are different.
 #### Mark the parts which are logically different.
 > Improper loading of images, missing text, different design, different languages are marked incompatible.
@@ -21,6 +25,7 @@
 ## Bounding boxes marked as different yet compatible - d
 ---

+#### Couples of images that are compatible, but with content differences.
 #### They look different.
 #### Mark the parts which are logically the same.
 >Different advertisements, different videos loaded, time-in-clock are marked
@@ -42,3 +47,5 @@ as different yet compatible.

+
+In the training phase, the best case is that we are able to detect between **Y + D and N**. If we are not able to do that, we should at least aim for the relaxed problem of detecting between **Y and D + N**. This is why we have this three labeling system.
\ No newline at end of file
diff --git a/README.md b/README.md
index 3238f802..3e407a51 100644
--- a/README.md
+++ b/README.md
@@ -17,38 +17,6 @@ The `data/` directory contains the screenshots generated by the crawler (N.B.: T

 ### Labeling
 [labeling guide](LABELING.md)
-Now that the screenshots are available, they need to be labeled. The labeling phase operates on couples of comparable screenshots.
-
-There are three possible labels:
-1. **Y** for couples of images that are clearly compatible;
-2. **D** for couples of images that are compatible, but with content differences (e.g. on a news site, two screenshots could be compatible even though they are showing two different news, simply because the news shown depends on the time the screenshot was taken and not on the fact that the browser is different);
-3. **N** for couples of images which are not compatible.
-
-Here are some examples of the three labels:
-
-**Y**
-
-
-**D**
-
-
-**N**
-
-
-In the training phase, the best case is that we are able to detect between Y+D and N. If we are not able to do that, we should at least aim for the relaxed problem of detecting between Y and D+N. This is why we have this three labeling system.
-
-The labeling technical details are described [in this issue](https://github.com/marco-c/autowebcompat/issues/2).
-
-The bounding-box labeling allows us to store the areas where the incompatibilities lie.
-
-
-
-
-
-- Press 'y' to mark the images as compatible;
-- Press 'Enter' to select the regions;
-- Click the 'T' button in the top left corner of a boundary box to toggle between classes. Green corresponds to 'n', yellow corresponds to 'd';
-- Press 'Enter' to save changes.

 ### Training
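
For reference, the two training targets mentioned in the patch (**Y + D vs N**, and the relaxed **Y vs D + N**) could be derived from the three labels roughly as follows. This is only a minimal sketch: `to_binary` and its `relaxed` flag are hypothetical names chosen for illustration and are not taken from the project's code.

```python
# Hypothetical helper, not part of the project's code base.
# Labels follow the guide: 'y' compatible, 'd' different yet compatible,
# 'n' incompatible.
LABELS = ("y", "d", "n")


def to_binary(label, relaxed=False):
    """Collapse a three-way label into a binary target (1 or 0).

    relaxed=False: Y + D vs N  ('d' is grouped with 'y')
    relaxed=True:  Y vs D + N  ('d' is grouped with 'n')
    """
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if relaxed:
        return 1 if label == "y" else 0
    return 0 if label == "n" else 1


labels = ["y", "d", "n", "d"]
strict_targets = [to_binary(label) for label in labels]
relaxed_targets = [to_binary(label, relaxed=True) for label in labels]
```

Keeping all three labels in the stored data means either grouping can be produced later without relabeling, which seems to be the motivation the patch text gives for the three-label scheme.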