Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length, and the approach also works for languages other than English.
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
Data for the DiMSUM shared task at SEMEVAL 2016
Code for NAACL 2019 paper: "Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions"
A set of tools for extracting multiword expressions from parallel corpora for the Moses statistical machine translation system
Rigor-Mortis is an online game with a purpose (GWAP) in which players find multiword expressions in French sentences
Comparison between various noun compound embeddings
A Python package for Exploratory Data Analysis (EDA) for text-based data.
Foma-based multi-word tagger and morphological analyzer
Data and code for the paper "ID10M: Idiom Identification in 10 Languages" (NAACL 2022).
Repo for the paper "MWE as WSD: Solving Multi-Word Expression Identification with Word Sense Disambiguation"
Java implementation of substitution driven measures of association that can be used to identify MWEs.
Accompanying code for the paper prepared for the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD 2024), 25th May, 2024.
Python implementation of Substitution-driven Measures of Association
Learning English expressions has never been so easy
Data and code for the paper "NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection".