<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>You Are What You Annotate: Towards Better Models through Annotator Representations</title>
<meta name="description"
content="Many NLP benchmarks exhibit inherent disagreements. Rather than aggregating labels, we train models directly on datasets with these disagreements. We introduce embedding-based techniques to enhance model performance on such data.">
<meta name="keywords"
content="Machine Learning, dataset, classification, NLI, natural language inference, humor, sentiment analysis, emotion classification, hate speech detection, Natural Language Processing, annotation disagreement, research, EMNLP 2023 Findings, EMNLP, Deep Learning, NLP, PyTorch">
<meta name="author"
content="Naihao Deng, Siyang Liu, Xinliang Frederick Zhang, Winston Wu, Lu Wang, Rada Mihalcea">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:type" content="website" />
<meta property="og:site_name" content="You Are What You Annotate: Towards Better Models through Annotator Representations" />
<meta property="og:image" content="https://lit.eecs.umich.edu/annotation-embeddings-website/img/example.png" />
<meta property="og:image:height" content="630" />
<meta property="og:image:width" content="1200" />
<meta property="og:title" content="You Are What You Annotate: Towards Better Models through Annotator Representations" />
<meta property="og:description" content="Many NLP benchmarks exhibit inherent disagreements. Rather than aggregating labels, we train models directly on datasets with these disagreements. We introduce embedding-based techniques to enhance model performance on such data." />
<meta property="og:url" content="https://lit.eecs.umich.edu/annotation-embeddings-website/" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@michigan_AI" />
<meta name="twitter:creator" content="@michigan_AI" />
<script async src="https://www.googletagmanager.com/gtag/js?id=G-42MFV87X10"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-42MFV87X10');
</script>
<link rel="stylesheet" type="text/css" href="main.css"/>
</head>
<body>
<div class="container">
<header>
<a href="https://lit.eecs.umich.edu/"><img id="arc" src="img/lit-logo.png" alt="LIT lab logo"></a>
<a href="https://launch.eecs.umich.edu/"><img id="arc" src="img/launch.jpeg" alt="LAUNCH lab logo"></a>
<a href="https://umich.edu/"><img id="um" src="img/um.png" alt="University of Michigan logo"></a>
<h1>You Are What You Annotate:<br>Towards Better Models through Annotator Representations</h1>
<ul id="quick-links">
<li><a href="https://arxiv.org/pdf/2305.14663.pdf">Paper</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings">Code</a></li>
<li><a href="https://huggingface.co/datasets/dnaihao/TID-8">TID-8 dataset</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings#citation">ACL Anthology page</a></li>
<li><a href="https://github.com/MichiganNLP/Annotator-Embeddings#citation">BibTeX Citation</a></li>
</ul>
</header>
<section class="section-alt">
<div class="content">
<h2>Abstract</h2>
<p id="abstract">
          Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. Such disagreements arise for many reasons, including the subjectivity of the task, difficult cases, and unclear guidelines. Rather than simply aggregating labels to obtain data annotations, we directly model the diverse perspectives of the annotators, and explicitly account for annotators' idiosyncrasies in the modeling process by creating representations for each annotator (<i>annotator embeddings</i>) and for their annotations (<i>annotation embeddings</i>).
In addition, we propose <b>TID-8</b>, <u><b>T</b></u>he <u><b>I</b></u>nherent <u><b>D</b></u>isagreement - <u><b>8</b></u> dataset, a benchmark that consists of eight existing language understanding datasets that have inherent annotator disagreement.
          We test our approach on TID-8 and show that it helps models learn significantly better from disagreements on six of the eight datasets, while increasing model size by less than 1% in parameters.
By capturing the unique tendencies and subjectivity of individual annotators through embeddings, our representations prime AI models to be inclusive of diverse viewpoints.
</p>
</div>
</section>
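The abstract above describes adding learned representations for each annotator on top of a standard classifier. The following is a minimal PyTorch sketch of that idea under stated assumptions: the class name, dimensions, and the simple additive combination are illustrative, not the paper's exact architecture (see the linked code repository for the real implementation).

```python
import torch
import torch.nn as nn

class AnnotatorAwareClassifier(nn.Module):
    """Sketch: add a learned per-annotator embedding to the pooled text
    representation before classification. Names and dimensions are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, num_annotators, hidden_size=768, num_labels=2):
        super().__init__()
        # One small embedding vector per annotator: the only extra
        # parameters, so model size grows by well under 1%.
        self.annotator_emb = nn.Embedding(num_annotators, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, text_repr, annotator_ids):
        # text_repr: (batch, hidden) pooled encoder output
        # annotator_ids: (batch,) integer id of the annotator per example
        combined = text_repr + self.annotator_emb(annotator_ids)
        return self.classifier(combined)

model = AnnotatorAwareClassifier(num_annotators=50)
logits = model(torch.randn(4, 768), torch.tensor([0, 1, 2, 3]))
print(logits.shape)  # torch.Size([4, 2])
```

Because the model conditions on the annotator id, it can predict different labels for the same input depending on who is annotating, rather than forcing a single aggregated label.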
<section>
<div class="content">
<a href="https://arxiv.org/pdf/2305.14663.pdf">
<ol id="thumbnails">
<li><img src="img/thumbs/0.png" alt="thumbnail, page 0" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/1.png" alt="thumbnail, page 1" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/2.png" alt="thumbnail, page 2" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/3.png" alt="thumbnail, page 3" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/4.png" alt="thumbnail, page 4" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/5.png" alt="thumbnail, page 5" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/6.png" alt="thumbnail, page 6" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/7.png" alt="thumbnail, page 7" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/8.png" alt="thumbnail, page 8" style="width: 75px; height: 100px;"></li>
<li><img src="img/thumbs/9.png" alt="thumbnail, page 9" style="width: 75px; height: 100px;"></li>
</ol>
</a>
</div>
</section>
<section>
<div class="content">
<ol id="authors">
<li>
<a href="https://dnaihao.github.io/">
<div class="author-img-container">
<img src="img/authors/naihao_deng.jpg" alt="Naihao Deng profile picture">
</div>
Naihao Deng
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~xlfzhang/">
<div class="author-img-container">
<img src="img/authors/xinliang_frederick_zhang.jpg" alt="Xinliang Frederick Zhang profile picture">
</div>
                        Xinliang Frederick Zhang
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=2OjUAPUAAAAJ&hl=zh-CN">
<div class="author-img-container">
<img src="img/authors/siyang_liu.png" alt="Siyang Liu profile picture">
</div>
Siyang Liu
</a>
</li>
<li>
<a href="https://wswu.github.io/">
<div class="author-img-container">
<img src="img/authors/winston_wu.png" alt="Winston Wu profile picture">
</div>
Winston Wu
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~wangluxy/">
<div class="author-img-container">
<img src="img/authors/lu_wang.png" alt="Lu Wang profile picture">
</div>
Lu Wang
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~mihalcea/">
<div class="author-img-container">
<img src="img/authors/rada_mihalcea.png" alt="Rada Mihalcea profile picture">
</div>
Rada Mihalcea
</a>
</li>
</ol>
</div>
</section>
<section>
<div class="content">
<h2>Downloads</h2>
            <ul id="downloads">
                <li><a href="https://arxiv.org/pdf/2305.14663.pdf"><b>PDF Paper</b></a></li>
                <li><a href="https://github.com/MichiganNLP/Annotator-Embeddings"><b>Code</b></a></li>
                <li><a href="https://huggingface.co/datasets/dnaihao/TID-8"><b>TID-8 dataset</b></a></li>
            </ul>
</div>
</section>
<section class="section-alt">
<p id="affiliation">
<a href="https://umich.edu/">
<img id="um-vertical" alt="University of Michigan" src="img/um-vertical.png">
</a>
</p>
</section>
<footer>
        <div class="content section-alt">
<h2>Acknowledgments</h2>
<p id="acknowledgments-text">
We thank the anonymous reviewers for their feedback.
We thank <a href="https://www.linkedin.com/in/zhenjie-sun-945879273/">Zhenjie Sun</a>, <a href="https://ying-hui-he.github.io/">Yinghui He</a>, and <a href="https://www.linkedin.com/in/yufan-wu-a27b6b24b/overlay/contact-info/">Yufan Wu</a> for their help on the data processing part of this project.
We also thank members of <a href="https://lit.eecs.umich.edu/people.html">the Language and Information Technologies (LIT) Lab</a> at the University of Michigan for their constructive feedback.
This project was partially funded by an award from the Templeton Foundation (#62256).
</p>
<p>
Web page inspired by the
<a href="https://lit.eecs.umich.edu/lifeqa/">LifeQA web page</a>.
</p>
</div>
</footer>
</div>
</body>
</html>