<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>You Are What You Annotate: Towards Better Models through Annotator Representations</title>
<meta name="description"
content="Many NLP benchmarks exhibit inherent disagreements. Rather than aggregating labels, we train models directly on datasets with these disagreements. We introduce embedding-based techniques to enhance model performance on such data.">
<meta name="keywords"
content="Machine Learning, dataset, classification, NLI, natural language inference, humor, sentiment analysis, emotion classification, hate speech detection, Natural Language Processing, annotation disagreement, research, EMNLP 2023 Findings, EMNLP, Deep Learning, NLP, PyTorch">
<meta name="author"
content="Naihao Deng, Siyang Liu, Xinliang Frederick Zhang, Winston Wu, Lu Wang, Rada Mihalcea">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:type" content="website" />
<meta property="og:site_name" content="You Are What You Annotate: Towards Better Models through Annotator Representations" />
<meta property="og:image" content="https://lit.eecs.umich.edu/annotation-embeddings-website/img/example.png" />
<meta property="og:image:height" content="630" />
<meta property="og:image:width" content="1200" />
<meta property="og:title" content="You Are What You Annotate: Towards Better Models through Annotator Representations" />
<meta property="og:description" content="Many NLP benchmarks exhibit inherent disagreements. Rather than aggregating labels, we train models directly on datasets with these disagreements. We introduce embedding-based techniques to enhance model performance on such data." />
<meta property="og:url" content="https://lit.eecs.umich.edu/annotation-embeddings-website/" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@michigan_AI" />
<meta name="twitter:creator" content="@michigan_AI" />
<script async src="https://www.googletagmanager.com/gtag/js?id=G-42MFV87X10"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-42MFV87X10');
</script>
<link rel="stylesheet" type="text/css" href="main.css"/>
</head>
<body>
<div class="container">
<header>
<a href="https://lit.eecs.umich.edu/"><img id="arc" src="img/lit-logo.png" alt="LIT lab logo"></a>
<a href="https://launch.eecs.umich.edu/"><img id="arc" src="img/launch.jpeg" alt="LAUNCH lab logo"></a>
<a href="https://umich.edu/"><img id="um" src="img/um.png" alt="University of Michigan logo"></a>
<h1>You Are What You Annotate:<br>Towards Better Models through Annotator Representations</h1>
<ul id="quick-links">
<li><a href="https://arxiv.org/pdf/2305.14663.pdf">Paper</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings">Code</a></li>
<li><a href="https://huggingface.co/datasets/dnaihao/TID-8">TID-8 dataset</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings#citation">ACL Anthology page</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings#citation">BibTeX Citation</a></li>
</ul>
</header>
<section class="section-alt">
<div class="content">
<h2>Abstract</h2>
<p id="abstract">
          Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. Such disagreements arise for many reasons, including the subjectivity of the task, difficult cases, and unclear guidelines. Rather than simply aggregating labels to obtain data annotations, we directly model the diverse perspectives of the annotators, and explicitly account for annotators' idiosyncrasies in the modeling process by creating representations for each annotator (<i>annotator embeddings</i>) and for their annotations (<i>annotation embeddings</i>).
In addition, we propose <b>TID-8</b>, <u><b>T</b></u>he <u><b>I</b></u>nherent <u><b>D</b></u>isagreement - <u><b>8</b></u> dataset, a benchmark that consists of eight existing language understanding datasets that have inherent annotator disagreement.
          We test our approach on TID-8 and show that it helps models learn significantly better from disagreements on six of the eight datasets, while increasing model size by less than 1% in parameters.
By capturing the unique tendencies and subjectivity of individual annotators through embeddings, our representations prime AI models to be inclusive of diverse viewpoints.
</p>
</div>
</section>
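The abstract above describes adding learned representations for each annotator on top of a standard classifier. The following is a minimal PyTorch sketch of that idea under stated assumptions: the class name, dimensions, and the simple additive combination are illustrative, not the paper's exact architecture (see the linked code repository for the real implementation).

```python
import torch
import torch.nn as nn

class AnnotatorAwareClassifier(nn.Module):
    """Sketch: add a learned per-annotator embedding to the pooled text
    representation before classification. Names and dimensions are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, num_annotators, hidden_size=768, num_labels=2):
        super().__init__()
        # One small embedding vector per annotator: the only extra
        # parameters, so model size grows by well under 1%.
        self.annotator_emb = nn.Embedding(num_annotators, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, text_repr, annotator_ids):
        # text_repr: (batch, hidden) pooled encoder output
        # annotator_ids: (batch,) integer id of the annotator per example
        combined = text_repr + self.annotator_emb(annotator_ids)
        return self.classifier(combined)

model = AnnotatorAwareClassifier(num_annotators=50)
logits = model(torch.randn(4, 768), torch.tensor([0, 1, 2, 3]))
print(logits.shape)  # torch.Size([4, 2])
```

Because the model conditions on the annotator id, it can predict different labels for the same input depending on who is annotating, rather than forcing a single aggregated label.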
<section>
<div class="content">
<a href="https://arxiv.org/pdf/2305.14663.pdf">
<ol id="thumbnails">
<li><img src="img/thumbs/0.png" alt="thumbnail, page 0" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/1.png" alt="thumbnail, page 1" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/2.png" alt="thumbnail, page 2" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/3.png" alt="thumbnail, page 3" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/4.png" alt="thumbnail, page 4" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/5.png" alt="thumbnail, page 5" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/6.png" alt="thumbnail, page 6" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/7.png" alt="thumbnail, page 7" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/8.png" alt="thumbnail, page 8" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/9.png" alt="thumbnail, page 9" style="width: 75px; height: 100px;"></li>
</ol>
</a>
</div>
</section>
<section>
<div class="content">
<ol id="authors">
<li>
<a href="https://dnaihao.github.io/">
<div class="author-img-container">
<img src="img/authors/naihao_deng.jpg" alt="Naihao Deng profile picture">
</div>
Naihao Deng
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~xlfzhang/">
<div class="author-img-container">
<img src="img/authors/xinliang_frederick_zhang.jpg" alt="Xinliang Frederick Zhang profile picture">
</div>
                        Xinliang Frederick Zhang
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=2OjUAPUAAAAJ&hl=zh-CN">
<div class="author-img-container">
<img src="img/authors/siyang_liu.png" alt="Siyang Liu profile picture">
</div>
Siyang Liu
</a>
</li>
<li>
<a href="https://wswu.github.io/">
<div class="author-img-container">
<img src="img/authors/winston_wu.png" alt="Winston Wu profile picture">
</div>
Winston Wu
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~wangluxy/">
<div class="author-img-container">
<img src="img/authors/lu_wang.png" alt="Lu Wang profile picture">
</div>
Lu Wang
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~mihalcea/">
<div class="author-img-container">
<img src="img/authors/rada_mihalcea.png" alt="Rada Mihalcea profile picture">
</div>
Rada Mihalcea
</a>
</li>
</ol>
</div>
</section>
<section>
<div class="content">
<h2>Downloads</h2>
            <ul id="downloads">
                <li><a href="https://arxiv.org/pdf/2305.14663.pdf"><b>PDF Paper</b></a></li>
                <li><a href="https://github.com/MichiganNLP/Annotator-Embeddings"><b>Code</b></a></li>
                <li><a href="https://huggingface.co/datasets/dnaihao/TID-8"><b>TID-8 dataset</b></a></li>
            </ul>
</div>
</section>
<section class="section-alt">
<p id="affiliation">
<a href="https://umich.edu/">
<img id="um-vertical" alt="University of Michigan" src="img/um-vertical.png">
</a>
</p>
</section>
<footer>
        <div class="content section-alt">
<h2>Acknowledgments</h2>
<p id="acknowledgments-text">
We thank the anonymous reviewers for their feedback.
We thank <a href="https://www.linkedin.com/in/zhenjie-sun-945879273/">Zhenjie Sun</a>, <a href="https://ying-hui-he.github.io/">Yinghui He</a>, and <a href="https://www.linkedin.com/in/yufan-wu-a27b6b24b/overlay/contact-info/">Yufan Wu</a> for their help on the data processing part of this project.
We also thank members of <a href="https://lit.eecs.umich.edu/people.html">the Language and Information Technologies (LIT) Lab</a> at the University of Michigan for their constructive feedback.
This project was partially funded by an award from the Templeton Foundation (#62256).
</p>
<p>
Web page inspired by the
<a href="https://lit.eecs.umich.edu/lifeqa/">LifeQA web page</a>.
</p>
</div>
</footer>
</div>
</body>
</html>