The Google Ngram Data Downloader, Filterer, and Normalizer

Alex John Quijano

Description. The Google Ngram Viewer (https://books.google.com/ngrams) is a freely accessible database that represents years of human discourse in multiple languages by counting ngrams (word sequences) in the Google Books corpus.

Purpose. The scripts in this repository provide an easy way to download, filter, and normalize these large datasets.
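
To make the filter and normalize steps concrete, here is a minimal Python sketch. It is not the code from this repository: the function names, the `min_year` cutoff, the alphabetic-only filter, and the choice to normalize by the per-year total of the kept ngrams are all assumptions made for illustration. It also assumes the tab-separated record layout of the 2012 Google Books Ngram export (ngram, year, match_count, volume_count).

```python
from collections import defaultdict


def parse_line(line):
    """Split one tab-separated record into (ngram, year, match_count).

    Assumed layout: ngram, year, match_count, volume_count.
    """
    ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
    return ngram, int(year), int(match_count)


def filter_and_normalize(lines, min_year=1900):
    """Keep alphabetic ngrams from min_year onward and return relative
    frequencies: count divided by the yearly total of the kept ngrams.

    Both the filter and the normalization are hypothetical choices.
    """
    counts = defaultdict(dict)  # ngram -> {year: count}
    totals = defaultdict(int)   # year -> total count over kept ngrams
    for line in lines:
        ngram, year, count = parse_line(line)
        if year < min_year or not ngram.isalpha():
            continue  # filtered out: too early, or contains non-letters
        counts[ngram][year] = counts[ngram].get(year, 0) + count
        totals[year] += count
    return {
        ngram: {year: c / totals[year] for year, c in by_year.items()}
        for ngram, by_year in counts.items()
    }


# Made-up sample records, for illustration only.
sample = [
    "apple\t1899\t5\t2",
    "apple\t1900\t30\t10",
    "pear\t1900\t10\t4",
    "x1\t1900\t99\t1",
]
freqs = filter_and_normalize(sample)
print(freqs)
```

In this sketch the 1899 record and the non-alphabetic `x1` record are dropped before normalization, so the remaining frequencies for each year sum to 1. The repository's own scripts may filter and normalize differently; INSTRUCTIONS.ipynb describes the actual workflow.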

Getting Started and Dependencies.

git clone https://github.com/stressosaurus/raw-data-google-ngram.git
cd raw-data-google-ngram
pip3 install --user -r requirements.txt

Jupyter Notebook.

jupyter notebook INSTRUCTIONS.ipynb