datasets: add revision support for all Universal Dependencies classes #3420

Merged
merged 3 commits into from
Jun 14, 2024

stefan-it
Member

Hi,

this PR adds a revision parameter to all Universal Dependencies classes.

This makes it possible to pin a dataset to, e.g., a specific commit for reproducibility.
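
To illustrate why a revision helps with reproducibility, here is a minimal sketch of how a revision could select which snapshot of a UD treebank repository a file is fetched from. The helper `ud_file_url`, the constant `UD_RAW_BASE`, the default value `"master"`, and the placeholder revision string are assumptions made for this example; they are not taken from the PR diff and are not Flair's API.

```python
# Hypothetical helper, NOT part of Flair: shows how a revision could pick
# the snapshot of a UD treebank repository that a data file is read from.
UD_RAW_BASE = "https://raw.githubusercontent.com/UniversalDependencies"


def ud_file_url(repo: str, filename: str, revision: str = "master") -> str:
    """Build a raw-file URL for `filename` in `repo` at `revision`.

    `revision` may be a branch name, a tag, or a commit hash.
    The default of "master" is an assumption for this sketch.
    """
    return f"{UD_RAW_BASE}/{repo}/{revision}/{filename}"


# Default: follows the moving branch head, so the data can change over time.
latest = ud_file_url("UD_English-EWT", "en_ewt-ud-train.conllu")

# Pinned: a fixed commit or tag always resolves to the same files.
# "<commit-or-tag>" is a placeholder, not a real revision.
pinned = ud_file_url(
    "UD_English-EWT", "en_ewt-ud-train.conllu", revision="<commit-or-tag>"
)
```

With the classes touched by this PR, the corresponding call would presumably look like `flair.datasets.UD_ENGLISH(in_memory=False, revision=...)`; the exact signature and default should be checked against the merged code.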

@stefan-it
Member Author

I also tested all UD_* classes with the following script:

import flair

ud_treebanks = [
    flair.datasets.UD_ENGLISH(in_memory = False),
    flair.datasets.UD_GALICIAN(in_memory = False),
    flair.datasets.UD_ANCIENT_GREEK(in_memory = False),
    flair.datasets.UD_KAZAKH(in_memory = False),
    flair.datasets.UD_OLD_CHURCH_SLAVONIC(in_memory = False),
    flair.datasets.UD_ARMENIAN(in_memory = False),
    flair.datasets.UD_ESTONIAN(in_memory = False),
    flair.datasets.UD_GERMAN(in_memory = False),
    flair.datasets.UD_GERMAN_HDT(in_memory = False),
    flair.datasets.UD_DUTCH(in_memory = False),
    flair.datasets.UD_FAROESE(in_memory = False),
    flair.datasets.UD_FRENCH(in_memory = False),
    flair.datasets.UD_ITALIAN(in_memory = False),
    flair.datasets.UD_LATIN(in_memory = False),
    flair.datasets.UD_SPANISH(in_memory = False),
    flair.datasets.UD_PORTUGUESE(in_memory = False),
    flair.datasets.UD_ROMANIAN(in_memory = False),
    flair.datasets.UD_CATALAN(in_memory = False),
    flair.datasets.UD_POLISH(in_memory = False),
    flair.datasets.UD_CZECH(in_memory = False),
    flair.datasets.UD_SLOVAK(in_memory = False),
    flair.datasets.UD_SWEDISH(in_memory = False),
    flair.datasets.UD_DANISH(in_memory = False),
    flair.datasets.UD_NORWEGIAN(in_memory = False),
    flair.datasets.UD_FINNISH(in_memory = False),
    flair.datasets.UD_SLOVENIAN(in_memory = False),
    flair.datasets.UD_CROATIAN(in_memory = False),
    flair.datasets.UD_SERBIAN(in_memory = False),
    flair.datasets.UD_BULGARIAN(in_memory = False),
    flair.datasets.UD_ARABIC(in_memory = False),
    flair.datasets.UD_HEBREW(in_memory = False),
    flair.datasets.UD_TURKISH(in_memory = False),
    flair.datasets.UD_UKRAINIAN(in_memory = False),
    flair.datasets.UD_PERSIAN(in_memory = False),
    flair.datasets.UD_RUSSIAN(in_memory = False),
    flair.datasets.UD_HINDI(in_memory = False),
    flair.datasets.UD_INDONESIAN(in_memory = False),
    flair.datasets.UD_JAPANESE(in_memory = False),
    flair.datasets.UD_CHINESE(in_memory = False),
    flair.datasets.UD_KOREAN(in_memory = False),
    flair.datasets.UD_BASQUE(in_memory = False),
    flair.datasets.UD_CHINESE_KYOTO(in_memory = False),
    flair.datasets.UD_GREEK(in_memory = False),
    flair.datasets.UD_NAIJA(in_memory = False),
    flair.datasets.UD_LIVVI(in_memory = False),
    flair.datasets.UD_BURYAT(in_memory = False),
    flair.datasets.UD_NORTH_SAMI(in_memory = False),
    flair.datasets.UD_MARATHI(in_memory = False),
    flair.datasets.UD_MALTESE(in_memory = False),
    flair.datasets.UD_AFRIKAANS(in_memory = False),
    flair.datasets.UD_GOTHIC(in_memory = False),
    flair.datasets.UD_OLD_FRENCH(in_memory = False),
    flair.datasets.UD_WOLOF(in_memory = False),
    flair.datasets.UD_BELARUSIAN(in_memory = False),
    flair.datasets.UD_COPTIC(in_memory = False),
    flair.datasets.UD_IRISH(in_memory = False),
    flair.datasets.UD_LATVIAN(in_memory = False),
    flair.datasets.UD_LITHUANIAN(in_memory = False),
]
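
Because the list above is built eagerly, the first class that fails to load aborts the whole script. A possible variant, sketched here under the assumption that each corpus is wrapped in a zero-argument callable, collects failures instead of stopping at the first one. The names `smoke_test`, `_broken`, and the demo keys are hypothetical and only serve the illustration.

```python
from typing import Callable, Dict


def smoke_test(factories: Dict[str, Callable[[], object]]) -> Dict[str, Exception]:
    """Call every factory once and return the exceptions raised, keyed by name."""
    failures: Dict[str, Exception] = {}
    for name, factory in factories.items():
        try:
            factory()
        except Exception as exc:  # keep going so every entry gets exercised
            failures[name] = exc
    return failures


def _broken() -> object:
    # Stand-in for a corpus class whose training file can no longer be found.
    raise FileNotFoundError("train file missing")


# Demo with dummy factories; no Flair import or download is needed here.
demo_failures = smoke_test({"OK_CORPUS": lambda: object(), "BROKEN_CORPUS": _broken})
for name, exc in demo_failures.items():
    print(f"{name}: {type(exc).__name__}: {exc}")
```

For the real check, each entry would be something like `"UD_ENGLISH": lambda: flair.datasets.UD_ENGLISH(in_memory=False)`, so that a problem such as a renamed training file shows up per treebank rather than as a single traceback.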

For UD_CZECH and UD_RUSSIAN, the training files have changed; I fixed that.

Additionally, UD_BURYAT, UD_CHINESE_KYOTO and UD_NAIJA were not correctly registered. I fixed that as well, so they can now be used in Flair.

@stefan-it stefan-it force-pushed the introduce-revision-parameter-uds branch 2 times, most recently from 128316f to d4e5863 Compare April 7, 2024 18:27
@stefan-it stefan-it force-pushed the introduce-revision-parameter-uds branch from d4e5863 to d0b99fe Compare May 3, 2024 22:45
@helpmefindaname helpmefindaname force-pushed the introduce-revision-parameter-uds branch from d0b99fe to 64a05d5 Compare May 29, 2024 19:41
@helpmefindaname helpmefindaname force-pushed the introduce-revision-parameter-uds branch from 64a05d5 to db4a1b1 Compare June 14, 2024 12:27
@helpmefindaname helpmefindaname merged commit 7168fd6 into master Jun 14, 2024
1 check passed