Machine Translation

Ngo et al. KSE'18 dataset

The Japanese-Vietnamese parallel data is collected from TED talks in the WIT3 corpus. After blank and duplicate lines are removed, 106,758 sentence pairs remain. All experiments use dev2010 as the validation set and tst2010 as the test set.
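The cleaning step described above (dropping blank and duplicate lines) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual preprocessing script: the function name `clean_parallel`, the rule that a pair is dropped when either side is empty, and the use of exact whole-pair matching for duplicates are all assumptions.

```python
from typing import Iterable, List, Tuple


def clean_parallel(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop blank and exact-duplicate sentence pairs, keeping first-seen order."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # assumption: a pair counts as blank if either side is empty
        if (src, tgt) in seen:
            continue  # assumption: duplicates are exact matches of the whole pair
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


# Toy Japanese-Vietnamese pairs, invented for illustration only.
raw_pairs = [
    ("こんにちは", "Xin chào"),
    ("", "Cảm ơn"),              # blank source side
    ("こんにちは", "Xin chào"),  # exact duplicate of the first pair
    ("ありがとう", "Cảm ơn"),
]
cleaned_pairs = clean_parallel(raw_pairs)
print(len(cleaned_pairs))
```

Filtering on the pair rather than on each side separately keeps the two files aligned line by line, which is what a parallel corpus needs.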

Vietnamese - Japanese

| Model | BLEU | Method | Reference | Code |
| --- | --- | --- | --- | --- |
| NMT + JPBPE + VNBPE | 11.13 | | Ngo et al. KSE'18 | |
| NMT Baseline | 9.39 | | Ngo et al. KSE'18 | |
| SMT Baseline | 8.73 | | Ngo et al. KSE'18 | |

Japanese - Vietnamese

| Model | BLEU | Method | Reference | Code |
| --- | --- | --- | --- | --- |
| NMT + JPBPE + VNBPE + Back Translation + Mix-Source | 9.64 | | Ngo et al. KSE'18 | |
| NMT Baseline | 8.18 | | Ngo et al. KSE'18 | |
| SMT Baseline | 7.73 | | Ngo et al. KSE'18 | |

IWSLT 2015 Evaluation Campaign

The IWSLT 2015 Evaluation Campaign featured three tracks: automatic speech recognition (ASR), spoken language translation (SLT), and machine translation (MT). The ASR track offered two tasks, on English and German, while the SLT and MT tracks offered a number of tasks involving English, German, French, Chinese, Czech, Thai, and Vietnamese.

TED Data En-Vi: 131k sentences (train), 1080 sentences (tst2015)

Leaderboard

TED: MT English-Vietnamese

| Method | External Training Data | BLEU | NIST | TER | Paper/Source | Code |
| --- | --- | --- | --- | --- | --- | --- |
| Tall Transformer with Style-Augmented Training | | 43.37 | | | Chinh et al. '21 | vietai/SAT |
| PJAIT | | 28.39 | 6.6650 | 56.01 | Wolk et al. IWSLT'15 | |
| JAIST | | 28.17 | 6.7092 | 55.84 | Trieu et al. IWSLT'15 | |
| KIT | | 26.60 | 6.4014 | 58.26 | Ha et al. IWSLT'15 | |
| SU | | 26.41 | 6.5986 | 55.60 | Luong et al. IWSLT'15 | |
| UNETI | | 22.93 | 6.0218 | 60.33 | Tran et al. IWSLT'15 | |
| BASELINE | | 27.01 | 6.4716 | 58.42 | Cettolo et al. IWSLT'15 | |

More Information

TED: MT Vietnamese-English

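| Method | BLEU | NIST | TER | Year |
| --- | --- | --- | --- | --- |
| PJAIT | 23.46 | 5.7314 | 62.20 | 2015 |
| UMD | 21.57 | 5.7831 | 59.19 | 2015 |
| JAIST | 21.53 | 5.6413 | 62.35 | 2015 |
| UNETI | 20.18 | 5.1443 | 66.33 | 2015 |
| TUT | 19.78 | 5.4559 | 62.69 | 2015 |
| BASELINE | 24.61 | 5.9259 | 59.32 | 2015 |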

References

  • Task Description: The IWSLT 2015 Evaluation Campaign (2015), M. Cettolo et al. [pdf]
  • UNETI '15: The English-Vietnamese Machine Translation System for IWSLT 2015 (2015), H. Tran et al. [link]
  • PJAIT '15: PJAIT Systems for the IWSLT 2015 Evaluation Campaign Enhanced by Comparable Corpora (2015), K. Wolk et al. [pdf]
  • TUT '15: Improvement of Word Alignment Models for Vietnamese-to-English Translation (2015), T. Nomura et al. [pdf]
  • UMD '15: The UMD Machine Translation Systems at IWSLT 2015 (2015), A. Axelrod et al. [pdf]
  • KIT '15: The KIT Translation Systems for IWSLT 2015 (2015), T. Ha et al. [pdf]
  • JAIST '15, UET '15: The JAIST-UET-MITI Machine Translation Systems for IWSLT 2015 (2015), H. Trieu et al. [pdf]
  • SU '15: Stanford Neural Machine Translation Systems for Spoken Language Domains (2015), M. Luong et al. [pdf]

Miscellaneous

📁 Open sources