Bitextor generates translation memories from multilingual websites
Updated Jun 18, 2024 - Python
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
A parser for annotated MuseScore 3 files.
A full-text article retrieval pipeline for biomedical literature.
Augmentation scripts for the bAbI Dialog Tasks dataset
A clean Fusha Arabic tagged corpus.
A golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Generate pseudo-English sentences for research in semantic composition
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
Create a corpus for fine-tuning an OpenAI model