Tool for pre-processing corpora before NLP Training Tasks

hipster-philology/protogenie

Protogenie

How to cite

@software{thibault_clerice_2020_3883586,
  author       = {Thibault Clérice},
  title        = {Protogenie, post-processing for NLP dataset},
  month        = jun,
  year         = 2020,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.3883585},
  url          = {https://doi.org/10.5281/zenodo.3883585}
}

Install from release

pip install protogenie

Install unstable

pip install --upgrade https://github.com/hipster-philology/protogenie/archive/master.zip

Install from source

Start by cloning the repository and moving into the created folder:

git clone https://github.com/hipster-philology/protogenie.git
cd protogenie/

Create a virtual environment, activate it, and run:

pip install -r requirements.txt
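
For readers unfamiliar with virtual environments, the step above can be sketched as follows. This is one possible setup, not the project's prescribed one: it assumes `python3` with the standard `venv` module is available, and the directory name `env` is an arbitrary choice.

```shell
# Create an isolated environment in ./env (the name "env" is an assumption)
python3 -m venv env

# Activate it for the current shell session
# ("source env/bin/activate" is equivalent in bash and zsh)
. env/bin/activate

# Install the dependencies listed in the repository
pip install -r requirements.txt
```

Run `deactivate` to leave the environment when you are done.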

Configuration file

To write a configuration file, have a look at the examples in ./tests/test_config. More generally, you can and should validate your configuration against the schema: ./ppa_splitter/schema.rng
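
As a rough illustration of schema validation, the sketch below uses `xmllint` (shipped with libxml2) and its `--relaxng` option. It assumes the configuration is an XML file, as the RelaxNG schema suggests. The schema and configuration written here are toy placeholders invented for the demonstration, not the project's real files.

```shell
# Toy RelaxNG schema: a single <config> element containing text (placeholder only)
cat > toy-schema.rng <<'EOF'
<element name="config" xmlns="http://relaxng.org/ns/structure/1.0"><text/></element>
EOF

# Toy configuration file matching the toy schema (placeholder only)
printf '<config>demo</config>\n' > toy-config.xml

# Validate the file against the schema; --noout suppresses the document dump.
# The exit status is 0 when the file is valid.
xmllint --noout --relaxng toy-schema.rng toy-config.xml
```

For a real configuration, replace `toy-schema.rng` with the schema path given above and `toy-config.xml` with your own file.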

Workflow

What's the workflow?
