lwmlyy/Knowledge-based-WSD: Knowledge-based Word Sense Disambiguation

This code implements the Knowledge-based Word Sense Disambiguation (KWSD) framework, which exploits knowledge from different perspectives. Specifically, it uses knowledge from Wikipedia and WordNet in a similarity-based WSD method and combines that method with a graph-based WSD method under several customized settings: top-3 sense filtering, customized sense-graph construction, and sense-importance inheritance. State-of-the-art performance on several standard WSD datasets, reported below, demonstrates the effectiveness of this knowledge-exploitation framework.
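The combination described above can be illustrated with a minimal sketch. It assumes (i) that the similarity-based step scores each candidate sense by cosine similarity between a context vector and a sense vector and keeps the top 3, and (ii) that the graph-based step is personalized PageRank (the algorithm used by UKB [2]) with the similarity scores as the teleport distribution, which is only one possible reading of "sense importance inheritance". The function names, vectors, sense identifiers and toy graph are all hypothetical; this is not the code in main_algorithm.py.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_senses(context_vec, sense_vecs, k=3):
    """Similarity step (assumed): keep the k senses closest to the context.

    Negative similarities are clipped to 0 so the scores can serve as a prior.
    """
    ranked = sorted(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]), reverse=True)
    return {s: max(cosine(context_vec, sense_vecs[s]), 0.0) for s in ranked[:k]}


def personalized_pagerank(graph, prior, damping=0.85, iters=50):
    """Graph step (assumed): PageRank whose teleport distribution is proportional to `prior`.

    `graph` maps every node to the list of nodes it links to.
    """
    nodes = list(graph)
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total > 0:
        teleport = {n: prior.get(n, 0.0) / total for n in nodes}
    else:  # no usable prior: fall back to ordinary (uniform) PageRank
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = graph[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for m in targets:
                    new[m] += share
            else:  # dangling node: send its mass back along the teleport vector
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


# Hypothetical toy input: one target word with four candidate senses.
context = [1.0, 0.0]
senses = {
    "bank.n.01": [0.9, 0.1],
    "bank.n.02": [0.1, 0.9],
    "bank.n.03": [0.5, 0.5],
    "bank.n.04": [-1.0, 0.0],
}
prior = top_k_senses(context, senses)  # the top-3 filter drops bank.n.04
graph = {
    "bank.n.01": ["money.n.01"], "money.n.01": ["bank.n.01"],
    "bank.n.02": ["river.n.01"], "river.n.01": ["bank.n.02"],
    "bank.n.03": [],
}
rank = personalized_pagerank(graph, prior)
best = max(prior, key=rank.get)
```

With this toy input, bank.n.01 ends up with the highest rank among the retained senses, mainly because it receives the largest share of the prior.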

Quick Evaluation

You can use the evaluation code provided in [1]. First, download the evaluation framework from the EACL17 website. With it, you can run Scorer.java (compile it first with javac Scorer.java) to evaluate our system on each dataset, one at a time, using key files such as 'senseval2.raw.KWSD.key':

java Scorer senseval2/senseval2.gold.key.txt senseval2.raw.KWSD.key
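To score all five datasets in one go instead of one at a time, the command above can be wrapped in a small script. This is a minimal sketch under several assumptions: Scorer has already been compiled in the evaluation folder, each dataset folder sits next to it (as the senseval2/ path in the command suggests), and the system key files follow the '<dataset>.raw.KWSD.key' naming of the example. The function names and the location of the key files are hypothetical.

```python
import subprocess
from pathlib import Path

# The five datasets of the unified evaluation framework [1].
DATASETS = ["senseval2", "senseval3", "semeval2007", "semeval2013", "semeval2015"]


def scorer_command(dataset, results_dir="results"):
    """Build the `java Scorer <gold> <system>` command for one dataset.

    Assumes the system key files are named '<dataset>.raw.KWSD.key'.
    """
    gold = Path(dataset) / f"{dataset}.gold.key.txt"
    system = Path(results_dir) / f"{dataset}.raw.KWSD.key"
    return ["java", "Scorer", gold.as_posix(), system.as_posix()]


def score_all(eval_dir, results_dir, datasets=DATASETS):
    """Run the compiled Scorer once per dataset from `eval_dir`; return its stdout per dataset."""
    results_dir = Path(results_dir).resolve()  # cwd changes below, so use an absolute path
    outputs = {}
    for dataset in datasets:
        completed = subprocess.run(
            scorer_command(dataset, results_dir),
            cwd=eval_dir, capture_output=True, text=True, check=True,
        )
        outputs[dataset] = completed.stdout
    return outputs
```

For example, score_all("path/to/evaluation_framework", "results") (both paths hypothetical) returns a dictionary mapping each dataset name to the text printed by Scorer; check=True makes the script stop with an error if a key file is missing.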

Alternatively, you can use the evaluation code from the UKB website [2]. Move our result file 'raw.KWSD.key' into ukb-3.2/wsdeval/Keys/raw and run evaluate.sh. The output should be as follows:

./evaluate.sh

Evaluating in Keys
/* raw.KWSD

ALL P= 68.0% R= 68.0% F1= 68.0%
semeval2007 P= 56.9% R= 56.9% F1= 56.9%
semeval2013 P= 68.4% R= 68.4% F1= 68.4%
semeval2015 P= 72.3% R= 72.3% F1= 72.3%
senseval2 P= 69.6% R= 69.6% F1= 69.6%
senseval3 P= 66.1% R= 66.1% F1= 66.1%
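For reference, the three figures can be computed from a gold key file and a system key file as sketched below. This is a simplified re-implementation for illustration, not the official Scorer.java or the UKB script: it assumes each key-file line is an instance id followed by one or more sense keys, and it counts an answered instance as correct when any predicted key is a gold key (the official scorers' handling of multiple answers may differ). The toy instance ids and sense keys are made up.

```python
from pathlib import Path


def read_keys(path):
    """Parse a key file: one 'instance_id sense_key [sense_key ...]' entry per line."""
    keys = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            keys[fields[0]] = set(fields[1:])
    return keys


def score(gold, system):
    """Return (P, R, F1) in percent.

    Simplifying assumption: an answered instance is correct when any of its
    predicted sense keys is among the gold keys (no partial credit).
    """
    answered = [i for i in system if i in gold]
    correct = sum(1 for i in answered if system[i] & gold[i])
    p = 100.0 * correct / len(answered) if answered else 0.0
    r = 100.0 * correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy data: three gold instances, all answered, two answered correctly.
gold = {"d000.s000.t000": {"a%1"}, "d000.s000.t001": {"b%1", "c%1"}, "d000.s000.t002": {"d%1"}}
system = {"d000.s000.t000": {"a%1"}, "d000.s000.t001": {"c%1"}, "d000.s000.t002": {"x%1"}}
p, r, f1 = score(gold, system)
```

Because this toy system answers every instance, the number of answered instances equals the number of gold instances and P = R = F1, which is consistent with the identical P, R and F1 values in the output above.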

Reproduction of the system's result

Quick: Using the precomputed embeddings in the repository (e.g., "eLSA01" for senseval2), run disambiguation.py to reproduce the exact reported results. The script evaluates the results itself and also writes a file named 'raw.KWSD.key', which can be evaluated with either of the methods above.
Slow: Reproducing the results from the domain-knowledge document retrieval step onward may take a few hours. You also need to download additional resources, including the British National Corpus (BNC) [3] and the Wikipedia dump used by the document retriever of [4]. Details are given in the following section.
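The repository ships one set of files per evaluation dataset, distinguished by a two-digit suffix. We infer (this is not documented) that the suffix is the dataset year: 01 for Senseval-2, consistent with "eLSA01" for senseval2 above, 04 for Senseval-3, and 07, 13 and 15 for SemEval-2007, 2013 and 2015. The hypothetical helper below resolves these file names for a dataset and reports which ones are missing before running disambiguation.py.

```python
from pathlib import Path

# Assumed mapping, inferred from the file names in the repository.
DATASET_SUFFIX = {
    "senseval2": "01",
    "senseval3": "04",
    "semeval2007": "07",
    "semeval2013": "13",
    "semeval2015": "15",
}

# File-name patterns of the suffixed files in the repository root, keyed by prefix.
PATTERNS = {
    "eLSA": "eLSA{}",
    "eLSA_vocab": "eLSA_vocab{}",
    "gloss": "gloss{}.txt",
    "allcontext": "allcontext{}.txt",
    "semeval": "semeval{}.txt",
}


def resource_files(dataset, repo_root="."):
    """Map each file-name prefix to the path of the suffixed file for `dataset`."""
    suffix = DATASET_SUFFIX[dataset]
    root = Path(repo_root)
    return {prefix: root / pattern.format(suffix) for prefix, pattern in PATTERNS.items()}


def missing_files(dataset, repo_root="."):
    """List the expected suffixed files that are not present under `repo_root`."""
    return [p.name for p in resource_files(dataset, repo_root).values() if not p.exists()]
```

For example, missing_files("senseval2") run from the repository root should return an empty list if all five suffixed files for that dataset are present.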

Reproduce results from scratch

Set up the document retriever (DrQA) from [4], and download the TF-IDF model and the Wikipedia database from that project's website for a quick setup. docname_retrieval.py uses the TF-IDF model to retrieve document names, doc_retrieve.py uses the database to retrieve the corresponding documents, and query_access.py provides the queries used for document retrieval.
The retrieved documents are combined with BNC documents pre-processed by bnc_process.py. The combined document set is then used to learn word representations via LSA in gensim_lsa.py.
Run disambiguation.py on each dataset with the following settings:

python disambiguation.py -l True -d domain_doc_name
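The two preparation steps above (TF-IDF document retrieval, then LSA over the combined retrieved and BNC documents) can be sketched in a self-contained way. This is a toy stand-in, not the repository's docname_retrieval.py, doc_retrieve.py or gensim_lsa.py: DrQA [4] uses hashed unigram and bigram TF-IDF over a full Wikipedia index, and gensim_lsa.py presumably relies on gensim (whose LsiModel class implements LSA), whereas this sketch uses unigram TF-IDF and a plain numpy SVD. The helper names, the toy documents and the TF-IDF weighting of the LSA input are assumptions.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weigh(tf, idf):
    """L2-normalised TF-IDF dict; terms without an IDF entry get weight 0."""
    vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def build_index(token_lists):
    """Smoothed IDF table plus one TF-IDF vector per document."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log((n + 1) / (c + 1)) + 1 for t, c in df.items()}
    return idf, [weigh(Counter(toks), idf) for toks in token_lists]


def closest_docs(query, titles, idf, doc_vecs, k=1):
    """Retrieval stand-in: titles of the k documents with the highest cosine score."""
    q = weigh(Counter(tokenize(query)), idf)
    scores = [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in doc_vecs]
    best = sorted(range(len(titles)), key=scores.__getitem__, reverse=True)[:k]
    return [titles[i] for i in best]


def lsa_word_vectors(documents, dim=2):
    """LSA stand-in: word vectors from a truncated SVD of a TF-IDF term-document matrix."""
    token_lists = [tokenize(d) for d in documents]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(documents)))
    for j, toks in enumerate(token_lists):
        for t, c in Counter(toks).items():
            counts[index[t], j] = c
    idf = np.log(len(documents) / np.count_nonzero(counts, axis=1))
    u, s, _ = np.linalg.svd(counts * idf[:, None], full_matrices=False)
    k = min(dim, len(s))
    vectors = u[:, :k] * s[:k]  # word coordinates in the k-dimensional latent space
    return {t: vectors[index[t]] for t in vocab}


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0


# Toy "Wikipedia": the dict plays the role of the document database (title -> text).
wiki = {
    "Bank (geography)": "The bank of a river is the land alongside the river.",
    "Bank": "A bank is a financial institution that accepts deposits and makes loans.",
    "Bass (fish)": "Bass is a name shared by many species of fish.",
}
titles = list(wiki)
idf, doc_vecs = build_index([tokenize(wiki[t]) for t in titles])
names = closest_docs("bank deposits and loans", titles, idf, doc_vecs)
retrieved_texts = [wiki[name] for name in names]  # analogous to fetching texts from the database

# Toy corpus for LSA; in the real pipeline this would be retrieved + BNC documents.
retrieved = ["river bank water", "river water flow"]
bnc = ["bank money loan", "money loan interest"]
vectors = lsa_word_vectors(retrieved + bnc, dim=2)
```

In the toy LSA corpus, river and water occur in exactly the same documents, so they get (numerically) identical vectors, while river and money, which never co-occur, end up much less similar.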

[1] Raganato A.; Camacho-Collados J.; and Navigli R. 2017. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 99-110, Valencia, Spain: Association for Computational Linguistics.
[2] Agirre E.; de Lacalle O. L.; and Soroa A. 2018. The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD. In Proceedings of the Workshop for NLP Open Source Software, 29-33, Melbourne, Australia: Association for Computational Linguistics.
[3] The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Bodleian Libraries, University of Oxford, on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
[4] Chen D.; Fisch A.; Weston J.; and Bordes A. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 1870-1879, Vancouver, Canada: Association for Computational Linguistics.

Repository contents

Folders: BNC, WSD_Unified_Evaluation_Datasets, XWN-2.1, results
Scripts: disambiguation.py, doc_retrieve.py, docname_retrieval.py, main_algorithm.py, query_access.py
Files with suffixes 01, 04, 07, 13 and 15: allcontextXX.txt, eLSAXX, eLSA_vocabXX, glossXX.txt, semevalXX.txt
Other files: README.md, requirement.txt, stopwords.txt
