
Listener Model for PhotoBook Game

This repo houses the official PyTorch implementation of the following paper:

  • Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
    Shih-Lun Wu, Yi-Hui Chou, and Liangze Li
    Annual Meeting of the Association for Computational Linguistics (ACL) 2023
    [ArXiv]

Installation

conda create -n photobook python=3.8.10
conda activate photobook
pip install -r requirements.txt

python
>>> import nltk
>>> nltk.download('punkt')
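
Equivalently, the tokenizer data can be fetched non-interactively:

    python -c "import nltk; nltk.download('punkt')"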

Data Preprocessing

  1. Read ../data/data_splits.json and save the processed log data to ../data/{split}_sections.pickle

    cd preprocess
    python dialogue_segmentation.py
  2. Generate CLIP scores

  • Read ../data/{split}_sections.pickle and save the data to ../data/{split}_clean_sections.pickle

    python process_section.py
  3. Extract image features with SegFormer

  • Save the features to ../data/image_feats.pickle; the saved data is a dictionary (key: image path, value: hidden features), as shown in the inspection sketch after this list

    python process_image.py
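
To sanity-check the preprocessing outputs, the pickles can be inspected directly. Here is a minimal sketch, assuming the files are standard pickle dumps, that train is one of the split names, and that image_feats.pickle holds the dictionary described in step 3:

    import pickle

    # Processed dialogue sections for one split ("train" is an assumed split name)
    with open("../data/train_clean_sections.pickle", "rb") as f:
        sections = pickle.load(f)
    print(type(sections), len(sections))

    # Image features: dict mapping image path -> hidden features (per step 3)
    with open("../data/image_feats.pickle", "rb") as f:
        image_feats = pickle.load(f)
    path, feats = next(iter(image_feats.items()))
    print(path, getattr(feats, "shape", type(feats)))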

Training and Inference

  • Edit hyperparameters in model/variables.py

  • Training (with the best-performing configuration)

    python3 train.py config_paper/EXPERIMENT_JSON exp/EXPERIMENT_NAME
  • To reproduce the logged ablations, use a different EXPERIMENT_JSON for each run:

    • Ours - vlscore_all.json
    • VisAttn - vlscore_visattn.json
    • CLIPScore - base_deberta.json
    • CLIPScore + VisAttn - visattn.json
    • Dense learning signals - vlscore_all.json (change DLS to False in model/variables.py, as sketched after this section)
  • Random seeds may need to be tweaked for optimal performance.

  • Inference

    python3 inference.py exp/EXPERIMENT_NAME
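
The contents of model/variables.py are not shown in this README; as a hypothetical illustration, the dense-learning-signals ablation amounts to an edit like the following (DLS is the only name taken from this README):

    # model/variables.py (hypothetical excerpt)
    DLS = False  # False disables dense learning signals; keep True for the full "Ours" model

followed by the usual training command:

    python3 train.py config_paper/vlscore_all.json exp/EXPERIMENT_NAME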

Baseline Model Adapted from Takmaz et al., 2020

  • The model implementation is based on the official PhotoBook repo

  • To run the Takmaz baseline

    cd takmaz_baseline/
    python3 train.py
  • To use reference chains extracted with CLIPScore, set REF_CHAIN_PATH = ref_chain_img_clipscore.pickle in takmaz_baseline/variables.py (see the sketch after this list)
  • Note that different experiments may require different random seeds (set in takmaz_baseline/variables.py) for optimal results.
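
A hypothetical one-line excerpt of takmaz_baseline/variables.py (only REF_CHAIN_PATH is named in this README; the string form is an assumption):

    # takmaz_baseline/variables.py (hypothetical excerpt)
    REF_CHAIN_PATH = "ref_chain_img_clipscore.pickle"  # reference chains extracted with CLIPScore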

Utterance-based Reference Chain Extraction

  • This part is largely inherited from the official PhotoBook repo, except that we add the option to use CLIPScore as the scoring metric.

  • To reproduce the whole extraction and evaluation procedure described in Takmaz et al. (2020), run these commands in the chain-extraction directory:

    python src/extract_segments.py out/all_segments.dict --stopwords --meteor --from_first_common --utterances_as_captions
    python src/make_chains.py out/all_segments.dict out/all_chains.json --score f1
    python src/make_gold_chains.py out/gold_chains.json --from_first_common --first_reference_only
    python src/make_dataset.py out/all_chains.json out/gold_chains.json out/dataset
    
    python src/extract_segments.py out/eval_segments.dict --path_game_logs data/logs/test_logs.dict --stopwords --meteor --from_first_common --utterances_as_captions
    python src/eval_chains.py out/eval_segments.dict
  • To instead use CLIPScore as part of the scoring in extraction, add the --clipscore option when running extract_segments.py above; a sketch of the underlying CLIPScore computation follows.
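
For reference, CLIPScore (Hessel et al., 2021) rates text-image compatibility as a scaled, clipped cosine similarity between CLIP embeddings. Below is a minimal illustrative sketch using the Hugging Face transformers CLIP API; the checkpoint name and example inputs are assumptions, and this repo's actual implementation may differ:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Assumed checkpoint; any CLIP variant works the same way
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")          # hypothetical image path
    text = "a man in a red hat next to a dog"  # hypothetical utterance

    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)

    # Cosine similarity between L2-normalized image and text embeddings
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    cos = (img * txt).sum(dim=-1)

    # CLIPScore = w * max(cos, 0), with w = 2.5 as in Hessel et al. (2021)
    clipscore = 2.5 * torch.clamp(cos, min=0.0)
    print(clipscore.item())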