Skip to content

Urban Dict spelling variant dataset. Source code of How to Evaluate Word Representations of Informal Domain?

Notifications You must be signed in to change notification settings

cyk1337/UrbanDict

Repository files navigation

Discovering spelling variants on Urban Dictionary

Source code of the paper How to Evaluate Word Representations of Informal Domain?

Scraping data from Urban Dictionary 🎍

  • Scraping data from webpage:
+ scrapy crawl UD
  • Scrapying data via API:
+ scrapy crawl UD_API

Bootstrapping algorithms

UD_Extractor/

self-training based CRF tagging

SeqLabeling/

Embedding pretraining with Tweets

train Word2Vec, FastText, GloVe with tweets data. `trainEmbedding/'

Twitter hashtag prediction task using pretrained embedding

Employ Twitter hashtag prediction downstream task using above pretrained informal word vectors as the extrinsic evaluation. HashtagPrediction/

Analysis

Use Mean Average Precision (MAP) as the intrinsic evaluation rate on word analogy task. Compare the correlations beween the intrinsic and extrinsic tasks. calcSim

Web interface

informal word pair search tool, written in Flask: demo/