Skip to content

Undergraduate graduation project ---- Chinese Word Segmentation for Weibo text

License

Notifications You must be signed in to change notification settings

guopeiming/NNSegmentor

Repository files navigation

NNTranSegmentor

Undergraduate graduation project ---- Chinese Word Segmentation for Weibo text
2020本科毕业设计 ---- 面向微博文本的中文分词

Installation

This software has been developed on Windows using python 3.6 and pytorch1.3 Since it uses some recent features of pytorch it can be incompatible with older versions.

The following methods are provided to install dependencies:
If you use pip, we strongly recommend you to create an virtual python3.6 environment by conda or virtualenv, where NNTranSegmentor is installed after.

  • conda
    conda env create -f environment.yml
  • pip
    pip install -r requirements.txt

Network Structure

Performance

method F time
character-only 95.0 0.75h
add-word_compose 95.3 9.3h
add-batch_training 95.3 2.1h

Usage

  • Preprocess
    Build vocab and insts from corpus, and save them to fileo. Details see preprocess.py.

    python ./preprocess.py --train ./data/pku/train.pku.hwc.seg --dev ./data/pku/dev.pku.hwc.seg --test ./data/pku/test.pku.hwc.seg
    • o (str) ------ path of vocab and insts built by preprocess.py.
    • train (str) ------ path of train text.
    • dev (str) ------ path of dev text.
    • test (str) ------ path of test text.
  • Train

    python train.py

Author

student: Peiming Guo
supervisor: Meishan Zhang

About

Undergraduate graduation project ---- Chinese Word Segmentation for Weibo text

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published