Multi-label text genre classification using Swedish corpora with pre-trained word embeddings.

eliyetres/lt2314-project

Multi-label text genre classification using Swedish corpora with pre-trained word vectors

Corpus

This project uses the COCTAILL corpus, which contains approximately 800,000 tokens of Swedish text from coursebooks aimed at second/foreign language (L2) learning. [1]

Word vectors

Model
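
As a minimal sketch of the task setup (not the project's actual model), the example below shows one common way to represent a text and its genres for multi-label classification: the text becomes the average of its word vectors, and the genres become a multi-hot target. The vectors, the genre names, and the helper functions `embed_document` and `encode_labels` are all hypothetical placeholders, not taken from this repository.

```python
# Toy stand-ins for pre-trained Swedish word vectors (hypothetical values).
WORD_VECTORS = {
    "hej": [1.0, 0.0],
    "världen": [0.0, 1.0],
    "bok": [1.0, 1.0],
}

# Hypothetical genre inventory; a real label set would come from the corpus annotation.
GENRES = ["dialogue", "fiction", "facts"]


def embed_document(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known tokens: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def encode_labels(labels, genres):
    """Multi-hot encoding: 1 for each genre assigned to the text, else 0."""
    return [1 if g in labels else 0 for g in genres]


# One text with two genres at once, which is what makes the task multi-label.
features = embed_document(["hej", "bok", "okänt"], WORD_VECTORS, dim=2)
target = encode_labels({"dialogue", "facts"}, GENRES)
```

A classifier for this setup would typically emit one independent score per genre rather than a single softmax over genres, since a text can belong to several genres simultaneously.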

References

[1] Volodina, E., Pilán, I., Eide, S., & Heidarsson, H. (2014). You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language.
