
LEAD

PyTorch implementation for the EMNLP 2022 Findings (long paper)

"Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking".

Requirements

  • python >= 3.9

  • torch == 1.11.0

  • transformers == 4.14.1

  • hanlp == 2.1.0b27

  • pypinyin == 0.46.0

  • einops == 0.4.1
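
If you prefer pip, one way to install the pinned dependencies (a sketch, assuming a Python 3.9+ environment; the exact torch wheel may depend on your CUDA version):

pip install torch==1.11.0 transformers==4.14.1 hanlp==2.1.0b27 pypinyin==0.46.0 einops==0.4.1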

Prepare Data and Pretrained Model

  1. The raw data consists of the SIGHAN 2013/2014/2015 datasets and the Wang271K corpus.

    You can also directly download the processed data from ReaLiSe.

    Put the processed data files in the resources/data directory.

  2. Download the pretrained Chinese RoBERTa model, chinese-roberta-wwm-ext, from Hugging Face, and put all the files in the resources/pretrained_onlybert directory.

  3. Download the glyph-enhanced pretrained model from GCC and put the model files in the resources/glyph directory.
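
After these steps, the resources directory should look roughly like this (a sketch based on the three directories named above; the exact file names inside each folder depend on the downloads):

resources/
├── data/                  # processed data files from ReaLiSe
├── pretrained_onlybert/   # chinese-roberta-wwm-ext files
└── glyph/                 # glyph-enhanced pretrained model from GCC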

Run the Code

Run run.sh or directly execute the following command:

python main.py --config config/bert_phonics_dictionary_strokes.yaml

The --config argument can point to any of the other files in the config directory, and the configuration files themselves can also be modified.

After training, run test.sh to evaluate the model; make sure the checkpoint parameter is set correctly first.
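
For example (the checkpoint location below is only an illustration; point the parameter at wherever your training run saved the model):

bash test.sh   # first set the checkpoint parameter in test.sh / the config to your trained checkpoint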

Trained checkpoints for SIGHAN datasets can be found here.

Citation

@inproceedings{li2022learning,
  title={Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking},
  author={Li, Yinghui and Ma, Shirong and Zhou, Qingyu and Li, Zhongli and Li, Yangning and Huang, Shulin and Liu, Ruiyang and Li, Chao and Cao, Yunbo and Zheng, Haitao},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2022},
  year={2022}
}
