How Far are We from Robust Long Abstractive Summarization? (EMNLP 2022)

[Paper]

Huan Yee Koh^, Jiaxin Ju^, He Zhang, Ming Liu, Shirui Pan

(^ denotes equal contribution)

Human Annotation of Model-Generated Summaries

For now, we release the human annotation dataset as robust_long_abstractive_human_annotation_dataset.jsonl (also available as .csv). This dataset is used for the metric comparison in Section 5 of the paper.

Data Field	Definition
dataset	Source dataset of the summarized document: arXiv or GovReport.
dataset_id	"ID_" followed by the document ID from the original arXiv or GovReport dataset. To match records against the original datasets, remove the "ID_" prefix.
model_type	Model variant that generated the summary. 1K, 4K and 8K denote input token limits of 1,024, 4,096 and 8,192 tokens, respectively. For more information, please refer to the paper.
model_summary	Model-generated summary
relevance	Percentage of the reference summary’s main ideas contained in the generated summary. Higher = Better.
factual consistency	Percentage of factually consistent sentences. Higher = Better.
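
As a minimal sketch of how the fields above might be used, the following Python reads the JSONL records, strips the "ID_" prefix from dataset_id, and averages the two human scores per dataset and model variant. It assumes that each line of the .jsonl file is one JSON object keyed by the field names in the table, that the consistency field is literally named "factual consistency" (with a space, as written above), and that both scores are stored as plain numbers. The helper names original_id, read_jsonl and mean_scores are hypothetical and are not part of this repository.

```python
import json
from collections import defaultdict
from statistics import mean

ID_PREFIX = "ID_"


def original_id(dataset_id):
    """Drop the leading "ID_" so the value matches the source dataset's document ID."""
    if dataset_id.startswith(ID_PREFIX):
        return dataset_id[len(ID_PREFIX):]
    return dataset_id


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def mean_scores(records, score_fields=("relevance", "factual consistency")):
    """Average each score field for every (dataset, model_type) pair.

    Assumption: the score values are numeric (or numeric strings).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["dataset"], rec["model_type"])].append(rec)
    return {
        key: {field: mean(float(r[field]) for r in recs) for field in score_fields}
        for key, recs in groups.items()
    }


# Example usage (requires the released file in the working directory):
#
# path = "robust_long_abstractive_human_annotation_dataset.jsonl"
# with open(path, encoding="utf-8") as fh:
#     records = read_jsonl(fh)
# for (dataset, model_type), scores in sorted(mean_scores(records).items()):
#     print(dataset, model_type, scores)
```

If the scores turn out to be stored in another form (for example as strings with a percent sign), the float conversion in mean_scores would need to be adapted accordingly.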

Human Annotation of Model-Generated Factual Error Types

We are standardizing the data for detailed factual error types. Stay tuned!

Citation

For more information, please refer to our work: How Far are We from Robust Long Abstractive Summarization?

@inproceedings{koh-etal-2022-far,
    title = "How Far are We from Robust Long Abstractive Summarization?",
    author = "Koh, Huan Yee  and Ju, Jiaxin  and Zhang, He  and Liu, Ming  and Pan, Shirui",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.172",
    pages = "2682--2698"
}
