Intro NLP Links.md

File metadata and controls

75 lines (60 loc) · 3.24 KB

Some Overview Notes About Libraries and Books

Python libraries for NLP:

The basics (tokenization, tf-idf):

  • NLTK
  • NLTK is used in conjunction with sklearn for low-level token and parsing operations, including part-of-speech (POS) tagging, tokenization, and spelling correction. (Where it breaks down is tf-idf, which is weirdly laborious; switch over to pattern or sklearn for that.)
  • http://www.nltk.org/
  • Cheatsheet: https://blogs.princeton.edu/etc/files/2014/03/Text-Analysis-with-NLTK-Cheatsheet.pdf
  • Pattern.py
  • A library that evolved rather independently in NL. It has a nicer API for some low-level and mid-level operations (like document similarity and clustering), but may not be fast enough (built entirely in Python, AFAIK).
  • http://www.clips.ua.ac.be/pattern - for nlp, especially pattern.en and pattern.vector
  • Also TextBlob:
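
To make concrete what tf-idf and document similarity compute, here is a minimal sketch in plain Python, with no NLTK, pattern, or sklearn involved. The helper names (`tokenize`, `tf_idf`, `cosine_similarity`), the toy corpus, and the unsmoothed `log(N / df)` weighting are all assumptions made for illustration; the libraries above use their own tokenizers and differ in smoothing and normalization, so their numbers will not match these.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and pull out runs of letters/apostrophes.

    A crude stand-in for a real tokenizer such as NLTK's.
    """
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for this example.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Stocks fell sharply on Wall Street.",
]
vectors = tf_idf(corpus)
sim_cat_dog = cosine_similarity(vectors[0], vectors[1])
sim_cat_stocks = cosine_similarity(vectors[0], vectors[2])
```

A word that occurs in every document ("on" here) gets weight zero, so the first and third documents, which share only that word, end up with zero similarity, while the first two still overlap on "the" and "sat". Writing this by hand is fine for understanding, but for real corpora the vectorizers in sklearn or pattern.vector are the practical choice.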

Advanced Python Toolkits (for machine learning, etc):

  • Sklearn (scikit-learn)
  • The standard choice for classification, though it can be hard to inspect features and the like. Good for volume and speed.
  • http://scikit-learn.org/stable/ (excellent docs and tutorials/examples)
  • Gensim - 'topic modeling for humans'
  • Some cutting-edge stuff in here like word2vec (you can see a toy vis of mine using it here). Good examples of dealing with large offline text, and some mechanisms for exchanging data with sklearn.
  • https://radimrehurek.com/gensim/
  • Radim's blog has a bunch of tutorial/examples: http://radimrehurek.com/blog/

Javascript Options

R

There are tons, and digital humanities folks use R more than Python, currently; this is not an exhaustive list!

Java

Lots of libraries; I admit I only play with Stanford's right now.

Some Python NLP-related Books:

  • Python Text Processing with NLTK 2.0 Cookbook (Perkins)
  • Python 3 Text Processing With NLTK Cookbook (Perkins) - maybe slightly updated over NLTK 2.0 book? I have both but don't know detailed diffs.
  • Building Machine Learning Systems with Python (Willi Richert & Luis Coelho, lots of text examples)
  • Natural Language Processing with Python (Bird, Klein, etc. - the first NLTK book, a bit out of date) http://www.nltk.org/book/