We provide a collection of baselines for two tasks: the high-level captioning task, which consists of generating scene, action, and rationale descriptions for an image, and the narrative caption generation task on the HL-Narrative Dataset, an extension of the HL Dataset in which narrative captions are generated from the three axes. All models are released on the Hugging Face Hub 🤗.
| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 103.00 | 24.67 | 33.90 |
| BLIP-base | 116.00 | 26.46 | 35.30 |
| ClipCap (LM+Mapping) | 145.00 | 36.73 | 42.83 |
| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 110.63 | 15.21 | 30.43 |
| BLIP-base | 123.07 | 17.16 | 32.16 |
| ClipCap (LM+Mapping) | 176.54 | 27.37 | 39.15 |
| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 42.58 | 5.90 | 18.57 |
| BLIP-base | 46.11 | 6.21 | 19.74 |
| ClipCap (LM+Mapping) | 78.04 | 11.71 | 25.76 |
| Model | CIDEr | SacreBLEU | ROUGE-L |
|---|---|---|---|
| GIT-base | 75.78 | 11.11 | 27.61 |
| BLIP-base | 79.39 | 11.70 | 26.17 |
| ClipCap (LM+Mapping) | 63.91 | 8.15 | 24.53 |
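The ROUGE-L column above is the LCS-based F-measure between a generated caption and a reference. As an illustration only (the reported scores come from standard evaluation toolkits, not this snippet), here is a minimal sketch of how that metric is computed over whitespace tokens:

```python
def rouge_l(reference: str, hypothesis: str) -> float:
    """LCS-based ROUGE-L F-measure over whitespace-tokenized captions."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the longest common subsequence length.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == h else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Example: LCS is "a man a bike" (4 tokens), so P = 4/5, R = 4/6.
print(round(rouge_l("a man is riding a bike", "a man rides a bike"), 4))  # → 0.7273
```

Note that the tables report ROUGE-L scaled to 0-100, and that SacreBLEU and CIDEr are typically computed with the `sacrebleu` and `pycocoevalcap` packages, respectively.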