HL Dataset Baselines

Introduction

We provide a collection of baselines for two tasks. The first is high-level captioning, which consists of generating scene, action, and rationale descriptions for an image. The second is narrative caption generation, which uses the HL-Narratives Dataset, an extension of the HL Dataset with narrative captions based on the three axes. All the models are released on the Hugging Face Hub 🤗.

HL-scenes

| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 103.00 | 24.67 | 33.90 |
| BLIP-base | 116.00 | 26.46 | 35.30 |
| ClipCap (LM+Mapping) | 145.00 | 36.73 | 42.83 |

HL-actions

| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 110.63 | 15.21 | 30.43 |
| BLIP-base | 123.07 | 17.16 | 32.16 |
| ClipCap (LM+Mapping) | 176.54 | 27.37 | 39.15 |

HL-rationales

| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 42.58 | 5.90 | 18.57 |
| BLIP-base | 46.11 | 6.21 | 19.74 |
| ClipCap (LM+Mapping) | 78.04 | 11.71 | 25.76 |

HL-Narratives

| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 75.78 | 11.11 | 27.61 |
| BLIP-base | 79.39 | 11.70 | 26.17 |
| ClipCap (LM+Mapping) | 63.91 | 8.15 | 24.53 |
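
For readers unfamiliar with the ROUGE-L column reported in the tables, the following is a minimal sketch of how such a score can be computed for a single caption pair. It is not the evaluation code used to produce the numbers above. It assumes lowercased whitespace tokenization, a single reference, no stemming, and the F1 form of the metric scaled to 0-100; the function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on a 0-100 scale (assumption: whitespace tokens, one reference)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative captions, not taken from the dataset.
    score = rouge_l_f1("a man rides a wave", "a man is surfing a wave")
    print(f"ROUGE-L F1: {score:.2f}")
```

Scores from an established metric implementation may differ from this sketch because of tokenization, stemming, and how multiple references are aggregated.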