This repository has been archived by the owner on Aug 17, 2020. It is now read-only.

VnCoreNLP now has a Python wrapper in its official repository.

VnCoreNLP: https://github.com/vncorenlp/VnCoreNLP

Setup

$ pip install py4j

Copy VnCoreNLP.jar, vncorenlp.py and the models directory into your project, all in the same directory.
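A quick way to confirm the layout described above before loading the model is to check that all three items sit side by side. This is a minimal sketch: the helper name `missing_files` is hypothetical and not part of this repository, and the only assumption is the three names listed in the setup step.

```python
from pathlib import Path

# The three items the setup step asks you to copy into one directory.
REQUIRED = ("VnCoreNLP.jar", "vncorenlp.py", "models")


def missing_files(project_dir):
    """Return the required names that are absent from project_dir (hypothetical helper)."""
    root = Path(project_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing from the project directory:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```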

Example

See example.py

from vncorenlp import VnCoreNLP

txt = 'học sinh học sinh học'

# Init & load model
vncore_nlp = VnCoreNLP(annotators="wseg pos ner parse")

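# Word segmentation only, returned as a single string
print(vncore_nlp.tokenize(txt, str=True))
print()

# Word segmentation only, returned as a list of tokens
print(vncore_nlp.tokenize(txt, str=False))
print()

# Full annotation with all loaded annotators: one row per token
print(vncore_nlp.extract(txt))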

Output:

học_sinh học_sinh học

['học_sinh', 'học_sinh', 'học']

[
    ['học_sinh', 'N', 'O', '3', 'sub'], 
    ['học_sinh', 'N', 'O', '1', 'nmod'], 
    ['học', 'V', 'O', '0', 'root']
]
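The rows returned by `extract` are easier to work with once each column has a name. The sketch below assumes the columns follow the annotator order (word form, POS tag, NER label, head index, dependency label) and that head indices are 1-based with 0 marking the root, as in CoNLL-style output; neither is stated in this README, and the field names and the `rows_to_tokens` helper are illustrative only.

```python
# Assumed column order, inferred from annotators="wseg pos ner parse".
FIELDS = ("form", "pos", "ner", "head", "deprel")


def rows_to_tokens(rows):
    """Convert extract()-style rows into dicts with a 1-based token index."""
    tokens = []
    for index, row in enumerate(rows, start=1):
        token = dict(zip(FIELDS, row))
        token["index"] = index
        token["head"] = int(token["head"])  # head arrives as a string
        tokens.append(token)
    return tokens


# The sample output shown above, copied verbatim.
rows = [
    ['học_sinh', 'N', 'O', '3', 'sub'],
    ['học_sinh', 'N', 'O', '1', 'nmod'],
    ['học', 'V', 'O', '0', 'root'],
]

tokens = rows_to_tokens(rows)
root = next(t for t in tokens if t["head"] == 0)
print(root["form"])  # học
```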

Updating to a new VnCoreNLP version

  1. Clone or download VnCoreNLP
$ git clone https://github.com/vncorenlp/VnCoreNLP
  2. Build VnCoreNLP.jar from the VnCoreNLP project
  • Copy Tokenizer.java into the VnCoreNLP project
$ cp Tokenizer.java /path/VnCoreNLP/src/main/java/vn/
  • Build the jar with Tokenizer.java as the main class
  3. Copy the ./models directory and the new .jar file into this repository
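The final copy step can be scripted. This is a rough sketch rather than part of the repository: the function name `install_build` and its arguments are hypothetical, the location of the built jar depends on how you build VnCoreNLP, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def install_build(jar_path, models_dir, repo_dir):
    """Copy a freshly built jar and the models directory into repo_dir.

    All three paths are supplied by the caller; nothing here is specific
    to VnCoreNLP beyond the target file name VnCoreNLP.jar.
    """
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    # Overwrite the old jar with the new build.
    shutil.copy2(jar_path, repo / "VnCoreNLP.jar")
    # Merge the new models into any existing models directory.
    shutil.copytree(models_dir, repo / "models", dirs_exist_ok=True)
    return repo
```

Calling `install_build("/path/to/built.jar", "/path/VnCoreNLP/models", ".")` from this repository's root would then refresh both artifacts in place; the two source paths are placeholders.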