Teal Deer

TLDR_LDA_and_Text_Summarization.ipynb is the primary current notebook.

This is currently just an exploratory (hacking) notebook. It extracts text from a directory of academic research PDFs and then runs LDA (Latent Dirichlet Allocation) topic modeling on that text to help prioritize reading. The dataset for this run was just a handful of papers on chatbots from arXiv. The PDF text extraction (OCR) portion relies on pdfminer's pdf2txt.py: https://github.com/euske/pdfminer/blob/master/tools/pdf2txt.py
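To make the LDA step concrete, here is a minimal sketch of topic modeling followed by a reading-order ranking. It is not the notebook's code: it assumes the PDF text has already been extracted to plain strings, it uses a small pure-Python collapsed Gibbs sampler instead of whatever LDA library the notebook uses, and the names `tokenize`, `lda_gibbs`, and `rank_documents`, the hyperparameters, and the sample texts are all hypothetical. Ranking by the weight of a chosen topic is just one possible way to prioritize.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    topics = list(range(num_topics))

    doc_topic = [[0] * num_topics for _ in docs]      # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in topics]   # tokens per (topic, word)
    topic_total = [0] * num_topics                    # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Five most frequent words per topic.
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:5]]
        for k in topics
    ]
    return theta, top_words


def rank_documents(theta, topic):
    """Return document indices ordered by their weight on the given topic."""
    return sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)


# Hypothetical stand-ins for text extracted from PDFs.
texts = [
    "Chatbots use dialogue models to answer user questions.",
    "Neural dialogue models generate chatbot responses.",
    "Topic models such as LDA group documents by word usage.",
]
docs = [tokenize(t) for t in texts]
theta, top_words = lda_gibbs(docs, num_topics=2, iterations=100)
print(top_words)
print(rank_documents(theta, topic=0))
```

With only a handful of short documents the topics will be noisy; the point is the shape of the pipeline (tokenize, fit, inspect top words, rank), not the quality of the result.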

In progress:
Adding a text summarization feature that generates abstracts or short summaries for large blocks of text (e.g., an abstract for the rest of a paper). Papers could then be summarized as well as prioritized.
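As a rough illustration of what a summarization step could look like, here is a simple frequency-based extractive baseline. This is a swapped-in technique, not the approach from the notebook or the video credited below: it scores each sentence by the average corpus frequency of its non-stopword tokens and keeps the top few in their original order. The `summarize` and `split_sentences` names, the tiny `STOPWORDS` set, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


sample = (
    "Chatbots answer questions. "
    "Chatbots use dialogue models to answer user questions. "
    "The weather was nice."
)
print(summarize(sample, max_sentences=1))
```

An extractive baseline like this can only reuse sentences already in the paper; generating a genuinely new abstract would need an abstractive model, which is outside this sketch.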

Planned updates (see the project tab as well):

  • Finish the OCR-from-PDF portion
  • Complete the text summarization portion. Thanks to Siraj Raval for making the video: https://www.youtube.com/watch?v=ogrJaOIuBx4
  • Clean up the notebook into Python scripts with test suites
  • Experiment with other front-end use cases, e.g., a Slack bot is currently underway (notebook to be added later)
  • Add a CI framework to this repo
  • Cartoon for a fun logo :-)