common-parallel-corpora/common-parallel-corpora

Common Parallel Corpora

A high-quality, community-driven extension of multitext-nllb-seed, flores-200, and ntrex-128 to more languages.

| release | description |
| --- | --- |
| common-parallel-corpora-2023-06-19.zip | (multitext-nllb-seed, flores-200) + nqo_Nkoo |
| in progress | ntrex-128 + nqo_Nkoo |
| planning | (multitext-nllb-seed, flores-200, ntrex-128) + ful_Adlm |

Description of Corpora

| dataset | description | entries | languages |
| --- | --- | --- | --- |
| cpc/multitext-nllb-seed | extended multitext-nllb-seed | 6193 | 41 |
| cpc/multitext-nllb-seed-edits | translator edits multitext-nllb-seed | 6193x4 | 1 |
| -- | -- | -- | -- |
| cpc/flores-200-dev | extended flores-200-dev | 997 | 205 |
| cpc/flores-200-dev-edits | translator edits flores-200-dev | 997x4 | 1 |
| -- | -- | -- | -- |
| cpc/flores-200-devtest | extended flores-200-devtest | 1012 | 205 |
| cpc/flores-200-devtest-edits | translator edits flores-200-devtest | 1012x4 | 1 |
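
Every language in a dataset has the same number of entries, which suggests (as an assumption, not something this README states) that the files are aligned by line: line i of one language file is the translation of line i of another. Under that assumption, a minimal Python sketch for pairing two files might look like the following. `read_segments` and `load_parallel` are hypothetical helper names, and the caller supplies the file paths.

```python
def read_segments(path):
    """Read a corpus file as a list of segments, one per line (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_parallel(source_path, target_path):
    """Pair line i of the source file with line i of the target file.

    Assumes the two files are line-aligned; raises ValueError if their
    line counts differ, since alignment would then be unreliable.
    """
    source = read_segments(source_path)
    target = read_segments(target_path)
    if len(source) != len(target):
        raise ValueError(
            f"line count mismatch: {len(source)} vs {len(target)}"
        )
    return list(zip(source, target))
```

The line-count check is there so that a truncated or misaligned file raises an error early instead of silently producing wrong sentence pairs.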

2023-06-19: WMT 2023 Nko NMT Task details

2023-06-19: Data Release

Baba Mamadi Diané, Solo Farabado Cissé, and Djibrila Diané (all Nko experts and native speakers) used novel parallel text curation software to translate nllb-seed, flores-dev, and flores-devtest into nqo_Nkoo, the ߒߞߏ (Nko) language written in the ߒߞߏ (Nko) script.

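Each entry was translated once (v1) and then verified and edited two or three times (v2, v3, v4). In the table below, the word count of each released corpus file matches that of its v4 revision.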

| lines | words | path |
| --- | --- | --- |
| 6193 | 184138 | data/common-parallel-corpora/multitext-nllb-seed/nqo_Nkoo |
| 6193 | 170555 | data/common-parallel-corpora/multitext-nllb-seed-edits/nqo_Nkoo.v1 |
| 6193 | 177703 | data/common-parallel-corpora/multitext-nllb-seed-edits/nqo_Nkoo.v2 |
| 6193 | 182843 | data/common-parallel-corpora/multitext-nllb-seed-edits/nqo_Nkoo.v3 |
| 6193 | 184138 | data/common-parallel-corpora/multitext-nllb-seed-edits/nqo_Nkoo.v4 |
| -- | -- | -- |
| 997 | 27361 | data/common-parallel-corpora/flores-200-dev/nqo_Nkoo.dev |
| 997 | 24455 | data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v1 |
| 997 | 25656 | data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v2 |
| 997 | 26541 | data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v3 |
| 997 | 27361 | data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v4 |
| -- | -- | -- |
| 1012 | 29503 | data/common-parallel-corpora/flores-200-devtest/nqo_Nkoo.devtest |
| 1012 | 25924 | data/common-parallel-corpora/flores-200-devtest-edits/nqo_Nkoo.devtest.v1 |
| 1012 | 27771 | data/common-parallel-corpora/flores-200-devtest-edits/nqo_Nkoo.devtest.v2 |
| 1012 | 29521 | data/common-parallel-corpora/flores-200-devtest-edits/nqo_Nkoo.devtest.v3 |
| 1012 | 29503 | data/common-parallel-corpora/flores-200-devtest-edits/nqo_Nkoo.devtest.v4 |
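
After unpacking the release, the line and word counts above can be checked locally, and the edit revisions compared. The Python sketch below is one possible way to do so. It assumes the files are UTF-8 text with one entry per line and counts words by splitting on whitespace, which may not be exactly how the counts in the table were produced. `count_lines_and_words` and `count_changed_entries` are hypothetical helper names, not part of this repository.

```python
def _read_lines(path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def count_lines_and_words(path):
    """Return (lines, words), counting words as whitespace-separated tokens."""
    lines = _read_lines(path)
    words = sum(len(line.split()) for line in lines)
    return len(lines), words


def count_changed_entries(earlier_path, later_path):
    """Count the line positions whose text differs between two revisions.

    Assumes both revisions hold the same entries in the same order.
    """
    earlier = _read_lines(earlier_path)
    later = _read_lines(later_path)
    if len(earlier) != len(later):
        raise ValueError("revisions must have the same number of lines")
    return sum(1 for old, new in zip(earlier, later) if old != new)
```

For example, calling `count_changed_entries` on `data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v1` and `data/common-parallel-corpora/flores-200-dev-edits/nqo_Nkoo.dev.v4` would report how many of the 997 entries the reviewers modified between the first translation and the final revision.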

Contributors

  • Moussa Koulako Bala Doumbouya (Stanford University, FriaSoft)
  • Baba Mamadi Diané (Nko USA Inc.)
  • Solo Farabado Cissé (Nko USA Inc.)
  • Djibrila Diané (Nko USA Inc.)
  • Abdoulaye Sow (FriaSoft)
  • Séré Moussa Doumbouya (FriaSoft)
  • Daouda Bangoura (FriaSoft)
  • Fodé Moriba Bayo (FriaSoft)
  • Christopher D Manning (Stanford University)

Acknowledgement

The authors would like to acknowledge the following sources of support:

  • Unrestricted Research Gift from Meta Platforms, Inc. to Nko USA Inc.
  • Nko USA Inc.
  • FriaSoft
  • Stanford Graduate Fellowship (SGF, P. Michael Farmwald)
  • Stanford NLP Group

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.