Overview

This repository shares the dependency-based Japanese word embeddings that we trained for the experiments in the article 係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings).

We applied the method proposed in the paper Dependency-based Word Embeddings to Japanese.

Training Details

To prepare the training data, we first extracted sentences from Japanese Wikipedia dumps.
Then, we parsed them using GiNZA, an NLP framework.
Finally, we trained the embeddings with the script provided on the page of the paper's first author.
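
As a rough illustration of what "dependency-based" means here, the sketch below turns a parsed sentence into (word, context) pairs in the style described by Levy & Goldberg (2014): a head sees each modifier together with the relation label, and a modifier sees its head with the inverse relation. This is not the preprocessing code used for these embeddings. The function name `dependency_contexts`, the `/` and `-1` notation, and the hand-written toy parse are assumptions for illustration, and the preposition-collapsing step from the paper is omitted.

```python
from typing import List, Tuple

# (surface form, index of the head token or -1 for the root, dependency label)
Token = Tuple[str, int, str]


def dependency_contexts(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (word, context) pairs derived from dependency arcs."""
    pairs = []
    for word, head_idx, label in sentence:
        if head_idx < 0:
            # The root has no head, so it contributes no arc of its own.
            continue
        head = sentence[head_idx][0]
        # The head sees its modifier together with the relation label.
        pairs.append((head, f"{word}/{label}"))
        # The modifier sees its head with the inverse relation, marked "-1".
        pairs.append((word, f"{head}/{label}-1"))
    return pairs


# Hand-written toy parse of "猫 が 魚 を 食べる" (not actual GiNZA output).
toy_sentence = [
    ("猫", 4, "nsubj"),
    ("が", 0, "case"),
    ("魚", 4, "obj"),
    ("を", 2, "case"),
    ("食べる", -1, "root"),
]

for word, context in dependency_contexts(toy_sentence):
    print(word, context)
```

In a real pipeline, the token tuples would come from a dependency parser such as GiNZA rather than being written by hand.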

The parameter settings for the experiments are as follows, where DIM is the number of dimensions given in each file name.

-size DIM -negative 15 -threads 20

Download URL

You can download the data from the links below.
The download begins as soon as you click a link.

dep-ja-100dim (85.4 MB)
- 100-dimensional word vectors
dep-ja-200dim (169.9 MB)
- 200-dimensional word vectors
dep-ja-300dim (254.5 MB)
- 300-dimensional word vectors

How to Use the Embeddings

You can use the embeddings in the same way as embeddings trained with the original implementation of Word2Vec.

Here is example code that loads them from a Python script.

from gensim.models import KeyedVectors

# Replace the path with the location of a downloaded embedding file.
vectors = KeyedVectors.load_word2vec_format("path/to/embeddings")
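
The loading call above does not pass `binary=True`, which suggests the files are in the word2vec text format: a header line with the vocabulary size and dimension, then one word and its vector per line. Assuming that is the case, the following dependency-free sketch shows how such a file is laid out and how a cosine similarity between two words can be computed. The helper names `read_word2vec_text` and `cosine` and the tiny in-memory sample are made up for illustration; the real files are far larger and are more conveniently handled through gensim.

```python
import io
import math
from typing import Dict, List, TextIO


def read_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... v_dim' lines."""
    _count, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Tiny made-up vectors standing in for a real embedding file.
sample = io.StringIO("3 2\n猫 1.0 0.0\n犬 0.8 0.6\n車 0.0 1.0\n")
toy_vectors = read_word2vec_text(sample)

print(cosine(toy_vectors["猫"], toy_vectors["犬"]))
print(cosine(toy_vectors["猫"], toy_vectors["車"]))
```

With the gensim object loaded above, the corresponding queries are `vectors.similarity(word1, word2)` for a single pair and `vectors.most_similar(word)` for nearest neighbours.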

When Using Them for Your Research

If you use them in a paper, please cite this repository with the following BibTeX entry:

@misc{matsuno2019dependencybasedjapanesewordembeddings,  
    title  = {Dependency-based Japanese Word Embeddings},  
    author = {Tomoki, Matsuno},  
    affiliation = {LAPRAS inc.},
    url    = {https://github.com/lapras-inc/dependency-based-japanese-word-embeddings},  
    year   = {2019}  
}

References

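Matsuda, H., Omura, M. & Asahara, M. (2019). 短単位品詞の用法曖昧性解決と依存関係ラベリングの同時学習 [Joint learning of usage disambiguation for short-unit-word parts of speech and dependency labeling]. Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing.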
Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Levy, O. & Goldberg, Y. (2014). Dependency-Based Word Embeddings. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 302-308. Baltimore, Maryland: Association for Computational Linguistics.
