Python script using Scrapy to determine if any broken links and images exist on a website

megancoyle/image-link-web-crawler

Image/Link Web Crawler

Overview

The Image/Link Web Crawler is a Python script that checks a given list of sites for broken images and links. The urls.py file exists because I had a list of several subdomains of a single domain that I needed to crawl with this project.
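The contents of urls.py are not shown here, so the following is only a minimal sketch of how a list of subdomain URLs might be built. The domain, the subdomain names, and the names `build_start_urls` and `START_URLS` are all hypothetical placeholders, not taken from the project.

```python
# Hypothetical sketch of a urls.py-style module; the real file may look different.
from urllib.parse import urlunsplit

DOMAIN = "example.com"                # placeholder domain (assumption)
SUBDOMAINS = ["www", "blog", "docs"]  # placeholder subdomains (assumption)


def build_start_urls(domain, subdomains, scheme="https"):
    """Return one root URL per subdomain, e.g. https://blog.example.com/."""
    return [
        urlunsplit((scheme, f"{sub}.{domain}", "/", "", ""))
        for sub in subdomains
    ]


# List of root URLs that a crawler could start from.
START_URLS = build_start_urls(DOMAIN, SUBDOMAINS)
```

A Scrapy spider could import a list like this and assign it to its `start_urls` attribute, though whether script.py does so is an assumption.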

Setup

  1. Set up your environment with:
       python3 -m venv venv
       . venv/bin/activate
  2. Install the dependencies:
       pip install scrapy
  3. To run the script, make sure you cd into the image-link-web-crawler directory. Then run the following command:
       scrapy runspider script.py -o report.csv

The script produces a CSV report (report.csv) showing which pages contain external links or images that returned a 404.
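Once the report exists, it can be summarized with the standard library. The sketch below groups broken URLs by the page that referenced them. The column names `page` and `url` and the function name `broken_by_page` are assumptions, since the report's actual columns are not documented here; adjust them to match the header row of your report.csv.

```python
import csv
from collections import defaultdict


def broken_by_page(report_file, page_col="page", url_col="url"):
    """Group broken URLs by the page that referenced them.

    `report_file` is an open text file (or any iterable of CSV lines).
    The default column names are assumptions, not the script's documented output.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(report_file):
        grouped[row[page_col]].append(row[url_col])
    return dict(grouped)
```

For example, opening the file with `open("report.csv", newline="")` and passing the handle to `broken_by_page` returns a dictionary that maps each page to its list of broken links and images.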
