@google-research-datasets

Google Research Datasets

Datasets released by Google Research

Pinned

  1. natural-questions Public

    Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans…

    Python 911 151

  2. conceptual-captions Public

    Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

    Shell 509 25

  3. Objectron Public

    Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the came…

    Jupyter Notebook 2.2k 264

  4. wit Public

    WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

    982 40

  5. paws Public

    This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase ident…

    Python 544 52

  6. dstc8-schema-guided-dialogue Public

    The Schema-Guided Dialogue Dataset

    Python 534 124

Repositories

Showing 10 of 159 repositories
  • sanpo_dataset Public
    Python 39 Apache-2.0 1 3 1 Updated Aug 2, 2024
  • SPICE Public

    SPICE is an English-language dataset of stereotypes collected in India through community engagement. It spans identity groups and stereotypes unique to India, as well as other stereotypes about gender and nationality.

    2 CC-BY-4.0 0 0 0 Updated Jul 26, 2024
  • cube Public

    CUBE is a benchmark for evaluating the cultural competence of text-to-image (T2I) models.

    4 CC-BY-4.0 0 0 0 Updated Jul 18, 2024
  • screen_qa Public

    The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models capable of screen content understanding via question answering.

    78 CC-BY-4.0 8 1 0 Updated Jul 18, 2024
  • uicrit Public

    UICrit is a dataset containing human-generated natural language design critiques, corresponding bounding boxes for each critique, and design quality ratings for 1,000 mobile UIs from RICO. This dataset was collected for our UIST '24 paper: https://arxiv.org/abs/2407.08850.

    1 0 0 0 Updated Jul 18, 2024
  • visage Public

    Visage is a dataset of images with human annotations on whether certain attributes are present or depicted in each image. An attribute may be either stereotypical or non-stereotypical with respect to the identity group in the image. It also contains a list of attributes in English along with annotations about whether they are visual.

    6 Apache-2.0 1 0 0 Updated Jul 16, 2024
  • dices-dataset Public

    This repository contains two datasets with multi-turn adversarial conversations generated by human agents interacting with a dialog model and rated for safety by two corresponding diverse rater pools.

    23 2 1 0 Updated Jul 16, 2024
  • wit Public

    WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

    982 40 0 0 Updated Jul 12, 2024
  • rico_semantics Public

    Consists of ~500k human annotations on the RICO dataset that identify icons by their shapes and semantics, and associate selected general UI elements with their text labels. The annotations also include human-annotated bounding boxes, which are more accurate and have greater coverage of UI elements.

    19 CC-BY-SA-4.0 2 1 0 Updated Jun 27, 2024
  • tpu_graphs Public
    C++ 119 Apache-2.0 43 2 1 Updated Jun 25, 2024
