Web-Crawler 🕷️

This script crawls websites recursively and downloads files of the types you specify (pdf, docx, ppt, etc.)


Jack searching for some files ...

Usage -

 usage: webCrawler.py [-h] [--level LEVEL] [--file_types FILE_TYPES] website_url path
positional arguments:
  website_url               URL from which you want to download files
  path                      Path to which you want to dump downloaded files

optional arguments:
  -h, --help                show this help message and exit
  --level LEVEL             Depth of recursion when following links from a page,
                            default = 3
  --file_types FILE_TYPES   Comma-separated file types to download, such as
                            pdf,txt; default is pdf
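For reference, a parser producing the help text above could be wired up with argparse roughly as follows. This is a minimal sketch reconstructed only from the options shown; the actual script may differ:

```python
import argparse

# Hypothetical reconstruction of the CLI described above -- not the
# script's actual source, which is still in progress.
parser = argparse.ArgumentParser(
    prog="webCrawler.py",
    description="Crawl a website recursively and download matching files",
)
parser.add_argument("website_url", help="URL from which you want to download files")
parser.add_argument("path", help="Path to which you want to dump downloaded files")
parser.add_argument("--level", type=int, default=3,
                    help="Depth of recursion when following links from a page")
parser.add_argument("--file_types", default="pdf",
                    help="Comma-separated file types to download, e.g. pdf,txt")
args = parser.parse_args()

# Normalize the comma-separated list into a set of lowercase extensions.
file_types = {t.strip().lower() for t in args.file_types.split(",")}
```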

Example -

 python webCrawler.py --level 4 --file_types pdf,txt,xls https://www.mywebsite.com E:/dump

In this example, the script downloads all pdf, txt, and xls files from www.mywebsite.com, recursively following links up to a depth of 4, and dumps the downloaded files to E:/dump.
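Under the hood, a crawler like this typically fetches each page, extracts its links, downloads those whose extension matches, and recurses into the rest until the depth limit is reached. Below is a minimal sketch of that loop, assuming requests and BeautifulSoup; the function names crawl and download are illustrative, not the script's actual API:

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(url, dump_dir, file_types, level, visited=None):
    """Visit `url`, download matching files, and recurse up to `level` deep."""
    visited = visited if visited is not None else set()
    if level < 0 or url in visited:
        return
    visited.add(url)
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        return  # unreachable page; skip it
    soup = BeautifulSoup(resp.text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        link = urljoin(url, anchor["href"])  # resolve relative links
        ext = os.path.splitext(urlparse(link).path)[1].lstrip(".").lower()
        if ext in file_types:
            download(link, dump_dir)
        elif ext in ("", "html", "htm"):  # looks like another page
            crawl(link, dump_dir, file_types, level - 1, visited)

def download(link, dump_dir):
    """Save the file at `link` into `dump_dir` under its original name."""
    name = os.path.basename(urlparse(link).path) or "index"
    try:
        resp = requests.get(link, timeout=30)
        with open(os.path.join(dump_dir, name), "wb") as f:
            f.write(resp.content)
    except requests.RequestException:
        pass  # skip files that fail to download
```

The visited set keeps the recursion from looping on pages that link back to each other; without it, two mutually linked pages would recurse all the way to the depth limit even when nothing new is found.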

Development status -

This project is under development and will be completed soon.
