Python implementation of a web crawler that, starting from a set of seed URLs, retrieves the most similar pages.
python
crawler
spider
web-crawler
nltk
scrapy
stemming
text-preprocessing
index-construction
page-rank
similarity-criteria
Updated Aug 12, 2021 - Python