Neural Grammatical Error Correction for Romanian using Transformer

teodor-cotet/RoGEC

Grammatical Error Correction for Romanian

This repository contains the code and data for Romanian grammatical error correction (GEC) on the RONACC corpus.

Download Data

Download the RONACC corpus: RONACC

Tokenized RONACC corpus: RONACC extra

Download pre-trained models

Download the language model: 30mil_wiki_lm
Download the synthetic corpus: 10m_synthetic
Download the trained, fine-tuned Transformer-based model: transformer-base-fine-tune

Run Experiment

Install the Python dependencies:
pip3 install -r requirements.txt
To use language model (LM) predictions, install the kenlm library: kenlm
To run decoding with an existing model:
python3 transformer.py --checkpoint=path_to_model_checkpoint --lm_path=path_to_lm --d_model=size_of_model --decode_mode=True
(For the fine-tuned model, the model size is 768.)
To train a model:
python3 transformer.py --checkpoint=path_to_model_checkpoint --separate=False --d_model=size_of_model --use_txt=True --dataset_file=path_to_txt_file_wrong_gold --train_mode=True

To run on a TPU, pass the --use_tpu=True argument; you first need to generate a TFRecords file.
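
When scripting several runs, the decoding and training invocations above can be assembled programmatically. The following is a minimal sketch and not part of the repository: the helper names build_decode_command and build_train_command and the placeholder paths are hypothetical, and only the flags are taken from the commands above.

```python
import shlex


def build_decode_command(checkpoint, lm_path, d_model=768):
    """Return the argument list for decoding with an existing checkpoint.

    Hypothetical helper. The flags mirror the decoding command above;
    d_model defaults to 768, the size stated for the fine-tuned model.
    """
    return [
        "python3", "transformer.py",
        f"--checkpoint={checkpoint}",
        f"--lm_path={lm_path}",
        f"--d_model={d_model}",
        "--decode_mode=True",
    ]


def build_train_command(checkpoint, dataset_file, d_model):
    """Return the argument list for training from a text dataset file.

    Hypothetical helper. The flags mirror the training command above.
    """
    return [
        "python3", "transformer.py",
        f"--checkpoint={checkpoint}",
        "--separate=False",
        f"--d_model={d_model}",
        "--use_txt=True",
        f"--dataset_file={dataset_file}",
        "--train_mode=True",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own checkpoint and LM files.
    decode = build_decode_command("path/to/checkpoint", "path/to/lm")
    # Print the command as a single shell-quoted string for inspection.
    print(shlex.join(decode))
```

Either list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.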

ERRANT

Install ERRANT

You can use ERRANT as usual; pass the argument -lang ro to use it for Romanian. More details are in the ERRANT readme.

Citing

@inproceedings{cotet2020neural,
  title={Neural grammatical error correction for romanian},
  author={Cotet, Teodor-Mihai and Ruseti, Stefan and Dascalu, Mihai},
  booktitle={2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)},
  pages={625--631},
  year={2020},
  organization={IEEE}
}
