A pipeline for creating a language model for Serbian in spaCy

BCDH/spacy-serbian-pipeline

Serbian Language Pipeline for spaCy

Work in progress. Far from production ready.

How to use with spaCy?

...

Data files

For test training runs, we're using the Universal Dependencies (UD) dataset, automatically converted to Cyrillic. This is temporary; we will eventually use our own training data.
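The README does not say how the UD data was converted to Cyrillic. As an illustration only, a rule-based Latin-to-Cyrillic transliterator for Serbian could look like the minimal sketch below. The function name `latin_to_cyrillic` and the mapping tables are hypothetical and not part of this repository. The sketch NFC-normalizes its input and matches the digraphs lj, nj and dž before single letters.

```python
import unicodedata

# Hypothetical mapping tables: Serbian Latin -> Serbian Cyrillic.
# Digraphs are checked first so "lj" becomes "љ" rather than "лј".
DIGRAPHS = {"dž": "џ", "lj": "љ", "nj": "њ"}
LETTERS = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    # Compose "z" + combining caron into "ž" etc. so lookups match.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # "Lj" and "LJ" both become "Љ": case follows the first letter.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = LETTERS.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, punctuation, foreign letters pass through
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


print(latin_to_cyrillic("Ljubav i njega"))  # Љубав и њега
```

Because the rules are purely character-based, the sketch mis-converts the small set of words where n+j or d+ž cross a morpheme boundary: `injekcija` becomes `ињекција` instead of `инјекција`. A real converter would need an exception list for such words.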

Lemmatizer data

  • data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012)
  • currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function
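To make the Ekavian/Jekavian choice concrete, the minimal sketch below contrasts two layouts for a lookup lemmatizer: a single form-to-lemma table holding both variants (the current state described above), and an Ekavian-only table preceded by a normalization step that rewrites Jekavian forms. The tables, `normalize` and `lemmatize` are hypothetical illustrations with a few hand-picked entries. They are not data from Morpho-SLaWS or Transpoetika, and they do not use spaCy's own lemmatizer API.

```python
# Option A (hypothetical): one table listing both Ekavian and Jekavian forms.
LEMMAS_BOTH = {
    "млеко": "млеко", "млека": "млеко",
    "млијеко": "млијеко", "млијека": "млијеко",
    "река": "река", "реке": "река",
    "ријека": "ријека", "ријеке": "ријека",
}

# Option B (hypothetical): Ekavian-only table plus a normalization step.
LEMMAS_EKAVIAN = {
    "млеко": "млеко", "млека": "млеко",
    "река": "река", "реке": "река",
}
JEKAVIAN_TO_EKAVIAN = {
    "млијеко": "млеко", "млијека": "млека",
    "ријека": "река", "ријеке": "реке",
}


def normalize(form: str) -> str:
    # Rewrite a Jekavian form to its Ekavian counterpart; leave others as-is.
    return JEKAVIAN_TO_EKAVIAN.get(form, form)


def lemmatize(form: str, table: dict[str, str], use_normalization: bool = False) -> str:
    key = form.lower()
    if use_normalization:
        key = normalize(key)
    # Unknown forms are returned unchanged.
    return table.get(key, form)


print(lemmatize("млијека", LEMMAS_BOTH))                              # млијеко
print(lemmatize("млијека", LEMMAS_EKAVIAN, use_normalization=True))   # млеко
```

With the combined table, `млијека` keeps its Jekavian lemma `млијеко`. With normalization it maps to the Ekavian lemma `млеко`, so both dialect forms share one lemma, at the cost of losing the variant in the lemma itself.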
