
zhang-jinyi/Web-Crawled-Corpus-for-Japanese-Chinese-NMT


WCC-JC: A Web Crawled Corpus for Japanese-Chinese NMT (An Over 3 Million Parallel Web-Crawled Corpus for Japanese-Chinese Neural Machine Translation)

We have additionally released WCC-JC version 2.0, which was manually aligned. After merging it with the previous release (version 1.0), the corpus contains about 2.15 million sentence pairs.

If you use the corpus, please cite the following papers:

Zhang, J.; Tian, Y.; Mao, J.; Han, M.; Wen, F.; Guo, C.; Gao, Z.; Matsumoto, T. WCC-JC 2.0: A Web-Crawled and Manually Aligned Parallel Corpus for Japanese-Chinese Neural Machine Translation. Electronics 2023, 12, 1140. https://doi.org/10.3390/electronics12051140

Zhang, J.; Tian, Y.; Mao, J.; Han, M.; Matsumoto, T. WCC-JC: A Web-Crawled Corpus for Japanese-Chinese Neural Machine Translation. Appl. Sci. 2022, 12, 6002. https://doi.org/10.3390/app12126002

To obtain the full dataset, please contact the email address below. The data is provided for your own use and for research purposes only.

E-mail: wccjc.contact at gmail.com (replace "at" with "@")


WCC-JC is distributed under the following license.

Terms of Use for WCC-JC

We provide the WCC-JC data (hereinafter "this data") subject to your acceptance of these Terms of Use. By starting to use this data (including downloading it), you are deemed to have agreed to these Terms of Use.

Article 1 (Conditions of Use) This data may be used (including, but not limited to, reproduction and distribution; the same applies hereinafter in this Article) only for research purposes involving information analysis. The same applies to derived data created from this data. However, this data may not be used for commercial purposes, including the sale of machine translation systems trained on this data.

Article 2 (Disclaimer) We do not warrant the quality, performance, or any other aspect of this data. We shall not be liable for any direct or indirect damages caused by the use of this data, nor for any damage to your system caused by the installation of this data.

Article 3 (Other) We may change this data in whole or in part, or interrupt or discontinue its provision, at our discretion and without prior notice.

==========

TAKE DOWN

If our data includes your copyrighted work and you would like us to remove it, please contact us with the following information:

  1. Your name, affiliation, and e-mail address.
  2. Details of your copyrighted work.
  3. How we can locate your work in our data (e.g., your domain name).

CONTACT

For any inquiries about WCC-JC, please contact us by email.

E-mail: wccjc.contact at gmail.com (replace "at" with "@")
