itissandeep98/IR2021_A2_34

IR2021_A2_34

Information Retrieval Assignment 2

Preprocessing Steps

  • Text conversion to lowercase.
  • Tokenization using nltk.
  • Removal of stop words using nltk.
  • Special (non-alphanumeric) characters are removed.
  • Single-character tokens are removed.
  • Finally, a set of all the unique words is created.
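
The steps above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code: the function names `preprocess` and `nltk_helpers` are hypothetical, the tokenizer and stop-word list are passed in as parameters so the sketch also runs without NLTK data, and "special characters" is assumed to mean anything outside ASCII letters and digits.

```python
import re


def preprocess(text, tokenize, stop_words):
    """Hypothetical helper: return the set of cleaned words in `text`."""
    # 1. Convert the text to lowercase.
    tokens = tokenize(text.lower())
    # 2. Drop stop words.
    tokens = [t for t in tokens if t not in stop_words]
    # 3. Strip every character that is not an ASCII letter or digit (assumption).
    tokens = [re.sub(r"[^a-z0-9]", "", t) for t in tokens]
    # 4. Drop single-character tokens (this also drops tokens emptied by step 3).
    tokens = [t for t in tokens if len(t) > 1]
    # 5. Keep only the unique words.
    return set(tokens)


def nltk_helpers():
    """Hypothetical helper: return NLTK's tokenizer and English stop words.

    Requires nltk to be installed and its tokenizer and stop-word data
    to have been fetched with nltk.download beforehand.
    """
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    return word_tokenize, set(stopwords.words("english"))
```

With NLTK available, a call might look like `preprocess(document_text, *nltk_helpers())`. Any other tokenizer, such as `str.split`, can be substituted for quick experiments.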

Assumptions

  • The input query is case-insensitive.
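
One way to read this assumption, shown as a small sketch: the query is lowercased before matching, so differently cased inputs yield the same terms. The name `normalize_query` and the whitespace tokenization are illustrative assumptions, not the assignment's code.

```python
def normalize_query(query):
    """Hypothetical helper: lowercase the query and split it into terms."""
    return query.lower().split()


# Differently cased queries produce identical term lists.
same = normalize_query("Information RETRIEVAL") == normalize_query("information retrieval")
```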

More information is provided in the assignment report.