The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems (TACL 2023)

cambridgeltl/multi3woz

Multi3WOZ

Code repository for the paper:

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems by Songbo Hu*, Han Zhou*, Mete Hergul, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Ivan Vulić**, and Anna Korhonen**.

Multi3WOZ is a novel multilingual, multi-domain, multi-parallel task-oriented dialogue (ToD) dataset. It is large-scale and offers culturally adapted dialogues in four languages to enable the training and evaluation of multilingual and cross-lingual ToD systems. The dataset was collected via a complex bottom-up data collection process, as shown in the following figure.

Highlights

  • [2024-01-15] We have released an improved end-to-end baseline. Check out our DIALIGHT paper and the codebase.

  • [2023-12-15] The dataset has been updated to correct some errors previously present in the data. We recommend that future projects use this updated version of the dataset.

This Repository

  • data.zip contains the Multi3WOZ dataset in four languages: Arabic (Afro-Asiatic), English (Indo-European), French (Indo-European), and Turkish (Turkic). Each language includes 9,160 multi-parallel dialogues.

  • The code directory contains the baseline code to reproduce the experimental results reported in our paper. We provide baseline code for all the popular ToD tasks: natural language understanding (NLU), dialogue state tracking (DST), natural language generation (NLG), and end-to-end modelling (E2E).
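
Because the dialogues are multi-parallel, every dialogue should have a counterpart in each of the four languages. The snippet below is a minimal sketch of how that property could be checked once the data is loaded. It assumes the dialogues have already been read into one dictionary per language, keyed by a dialogue ID; the internal layout of data.zip is not described here, so the function name `find_missing_dialogues`, the dictionary structure, and the toy IDs are all hypothetical.

```python
def find_missing_dialogues(dialogues_by_language):
    """Map each dialogue ID to the languages in which it is missing.

    `dialogues_by_language` is a hypothetical structure:
    {language name: {dialogue ID: dialogue}}. It is an assumption for
    illustration, not the documented format of data.zip.
    """
    # Collect every dialogue ID seen in any language.
    all_ids = set()
    for dialogues in dialogues_by_language.values():
        all_ids.update(dialogues)

    # Record, for each ID, the languages that lack it.
    missing = {}
    for language, dialogues in dialogues_by_language.items():
        for dialogue_id in all_ids - set(dialogues):
            missing.setdefault(dialogue_id, []).append(language)

    return {dialogue_id: sorted(langs) for dialogue_id, langs in missing.items()}


# Toy data for illustration only; these are not real Multi3WOZ entries.
toy = {
    "Arabic": {"d1": [], "d2": []},
    "English": {"d1": [], "d2": []},
    "French": {"d1": []},
    "Turkish": {"d1": [], "d2": []},
}
missing_in_toy = find_missing_dialogues(toy)
```

An empty result means every dialogue ID appears in all languages, which is what a fully multi-parallel dataset would produce.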

Baselines

Before running the experiments, uncompress the data with the following command:

>> unzip data.zip

Then follow each baseline directory's instructions to reproduce our reported results. For example, please follow ./code/nlu/README.md to reproduce our reported NLU results.
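
On systems without the `unzip` command, the same extraction step can be done with Python's standard library. This is a minimal sketch, not part of the repository's code: the helper name `extract_archive` is hypothetical, and it assumes data.zip sits in the current working directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract `zip_path` into `dest_dir` and return the top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only the first path component of each archive member.
    return sorted({name.split("/", 1)[0] for name in names})


if __name__ == "__main__":
    # Equivalent of `unzip data.zip`, assuming the archive is in the
    # current directory (for example, the repository root).
    if Path("data.zip").exists():
        print(extract_archive("data.zip"))
```

The returned list shows what the archive unpacked at the top level, which helps confirm the data landed where the baseline instructions expect it.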

Annotation Protocol

Please visit the following website for our annotation instructions: https://cambridgeltl.github.io/multi3woz/.

Issue Report

If you find any issues in this repository, please contact: sh2091@cam.ac.uk.
