Add support for NERMuD 2023 Dataset #3087

stefan-it · 2023-02-07T23:01:31Z

Hi,

this PR adds support for the NERMuD 2023 dataset. NERMuD is a shared task presented at EVALITA 2023 that consists of extracting and classifying named entities in a document, such as persons, organizations, and locations.

From the Shared Task page:

NERMuD 2023 will include two different sub-tasks:

Domain-agnostic classification (DAC). Participants will be asked to select and classify entities among three categories (person, organization, location) in different types of texts (news, fiction, political speeches) using one single general model.
Domain-specific classification (DSC). Participants will be asked to deploy a different model for each of the above types, trying to increase the accuracy for each considered type.

To support both sub-tasks, the new NER_NERMUD dataset is implemented as a MultiCorpus. This means that the different corpora (here, the domains) can be used individually or combined.
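The multi-corpus idea can be illustrated in plain Python: each domain contributes its own train/dev/test splits, and combining domains merges them split by split. This is a minimal sketch, not Flair's MultiCorpus code; `DomainCorpus` and `combine_corpora` are hypothetical names, and the sentences are toy data rather than NERMuD content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DomainCorpus:
    """Hypothetical stand-in for one domain's data: tokenized sentences per split."""

    train: List[List[str]] = field(default_factory=list)
    dev: List[List[str]] = field(default_factory=list)
    test: List[List[str]] = field(default_factory=list)


def combine_corpora(corpora: Dict[str, DomainCorpus]) -> DomainCorpus:
    """Merge domain corpora split by split (train with train, dev with dev, test with test)."""
    combined = DomainCorpus()
    for corpus in corpora.values():
        combined.train.extend(corpus.train)
        combined.dev.extend(corpus.dev)
        combined.test.extend(corpus.test)
    return combined


# Toy data, not taken from NERMuD.
wn = DomainCorpus(train=[["Roma", "ospita", "il", "vertice"]], test=[["Mario", "parla"]])
fic = DomainCorpus(train=[["Il", "conte", "arrivò"]], dev=[["Addio", "Milano"]])
merged = combine_corpora({"WN": wn, "FIC": fic})
print(len(merged.train), len(merged.dev), len(merged.test))  # 2 1 1
```

Keeping the splits separate while merging matters: training data from one domain never leaks into another domain's dev or test set.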

Usage

To load all domains at once:

from flair.datasets import NER_NERMUD

all_nermud_domains = NER_NERMUD(domains="all")

It is also possible to combine selected domains, for example:

from flair.datasets import NER_NERMUD

combined_domains = NER_NERMUD(domains=["WN", "ADG"])

Possible domains are:

WN - Wikinews
FIC - Fiction
ADG - De Gasperi
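The commit list mentions a domain check for NER_NERMUD. The sketch below shows one way such a check could normalize the `domains` argument (the string "all" or a list of codes) and reject unknown codes. It is a hypothetical illustration, not the code merged in this PR; `resolve_domains` and `SUPPORTED_DOMAINS` are assumed names.

```python
from typing import List, Union

# Domain codes listed above; the constant name is hypothetical.
SUPPORTED_DOMAINS = ("WN", "FIC", "ADG")


def resolve_domains(domains: Union[str, List[str]]) -> List[str]:
    """Expand "all" to every domain, wrap a single code in a list, and reject unknown codes."""
    if isinstance(domains, str):
        domains = list(SUPPORTED_DOMAINS) if domains == "all" else [domains]
    unknown = [d for d in domains if d not in SUPPORTED_DOMAINS]
    if unknown:
        raise ValueError(
            f"Unsupported NERMuD domain(s) {unknown}; "
            f"choose from {list(SUPPORTED_DOMAINS)} or 'all'"
        )
    return list(domains)


print(resolve_domains("all"))          # ['WN', 'FIC', 'ADG']
print(resolve_domains(["WN", "ADG"]))  # ['WN', 'ADG']
```

Failing early on an unknown code gives a clearer error than a failed download or an empty corpus later on.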

More information can be found here.
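Choosing domains maps naturally onto the two sub-tasks: DAC trains one model over all domains, while DSC trains one model per domain. The sketch below only computes which domain selections each sub-task would use; `domain_groups` is a hypothetical helper, and passing a single-element list such as `["WN"]` to `NER_NERMUD(domains=...)` is assumed to work the same way as the two-element list shown above.

```python
from typing import List

# Domain codes listed in this PR.
DOMAINS = ["WN", "FIC", "ADG"]


def domain_groups(subtask: str) -> List[List[str]]:
    """Return the domain selections to train on for a NERMuD sub-task.

    DAC (domain-agnostic): one model over all domains, so a single group.
    DSC (domain-specific): one model per domain, so one group per domain.
    """
    if subtask == "DAC":
        return [list(DOMAINS)]
    if subtask == "DSC":
        return [[domain] for domain in DOMAINS]
    raise ValueError(f"unknown sub-task: {subtask!r}")


for group in domain_groups("DSC"):
    # Each group could then be passed as NER_NERMUD(domains=group).
    print(group)
```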

alanakbik · 2023-02-14T10:00:00Z

@stefan-it thanks for adding this!

stefan-it added 4 commits February 7, 2023 23:49

datasets: add support for NERMuD 2023 corpus (9aacaf0)
init: add NERMuD corpus to global config (57a64dc)
tests: add some testcases for NERMuD corpus (0d7b44b)
datasets: add domain check for NER_NERMUD (3d82c55)

alanakbik merged commit 4de91b2 into master Feb 14, 2023

alanakbik deleted the add-nermud-support branch February 14, 2023 10:00
