Crawler for Craigslist vehicle ads in all U.S. states

faezetta/VMM-Scraper

Python crawler for Craigslist vehicle ads in all U.S. states using Scrapy

Prerequisites

  • Python 2.x or 3.x
  • pip
  • Scrapy and all its dependencies

Running the Project

To modify the code for your own purpose:

  • items.py defines the fields to parse from each page. The current version includes the ad URL, posting title, date, location, images, and price of the vehicle.
  • runVMMR.py defines the initial URL, how to navigate pages or follow links, and how to extract and parse the fields defined in items.py. In the current setting, only the first 5 pages of search results are visited.
  • settings.py defines the directory where the images and parsed results are saved. You can adjust the download delay to suit your project. It also uses several middleware components available online.
  • urls.txt lists the Craigslist URLs for the U.S. cities that the scraper goes through.
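
As a rough illustration of how the city list and the 5-page limit fit together, the sketch below expands base city URLs into one search URL per result page. It is not taken from runVMMR.py: the function name build_start_urls, the /search/cta path, the s offset parameter, and the page size of 120 are all assumptions made for the example.

```python
from urllib.parse import urlencode, urljoin


def build_start_urls(lines, pages=5, page_size=120, search_path="/search/cta"):
    """Expand base city URLs into one search URL per result page.

    The search path, the "s" offset parameter and the page size are
    assumptions for illustration, not values read from the project.
    """
    urls = []
    for line in lines:
        base = line.strip()
        if not base or base.startswith("#"):
            continue  # skip blank lines and comment lines
        for page in range(pages):
            url = urljoin(base, search_path)
            if page:
                # Later pages are addressed by a result offset.
                url += "?" + urlencode({"s": page * page_size})
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = ["https://newyork.craigslist.org/\n", "\n"]
    for url in build_start_urls(sample, pages=3):
        print(url)
```

In a real run, the lines would come from open("urls.txt") and the resulting list could feed the spider's start URLs.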

To run the scraper, you just need to run the spider:

$ scrapy crawl craigstlistDemo

The output is a CSV file containing the posts found, with one column per target field.
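
Once the crawl finishes, the CSV can be loaded with the standard library. The following is a minimal sketch; the column names (url, title, date, location, price) and the helper load_posts are assumptions, since the actual header depends on the fields declared in items.py.

```python
import csv
import io


def load_posts(handle):
    """Return CSV rows as dicts, skipping rows with an empty price.

    The "price" column name is an assumption about the output header.
    """
    return [row for row in csv.DictReader(handle) if row.get("price")]


if __name__ == "__main__":
    # In-memory stand-in for the scraper's output file.
    sample = io.StringIO(
        "url,title,date,location,price\n"
        "https://example.org/1,2009 Honda Civic,2017-01-02,Boston,$4500\n"
        "https://example.org/2,Parts only,2017-01-03,Boston,\n"
    )
    for post in load_posts(sample):
        print(post["title"], post["price"])
```

To read the real output, pass an open file handle instead of the in-memory sample, opened with newline="" as the csv module recommends.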

Others

Beware of IP bans from Craigslist: crawling too quickly can get your address blocked.
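
One way to lower the risk is to slow the crawl down in settings.py. The fragment below uses standard Scrapy setting names, but the values are illustrative assumptions rather than the project's actual configuration, and no setting can guarantee that a ban is avoided.

```python
# settings.py (fragment): illustrative values, not the project's own
DOWNLOAD_DELAY = 5                   # base delay in seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True      # vary the wait between 0.5x and 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # one request at a time per domain
AUTOTHROTTLE_ENABLED = True          # let Scrapy adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 5         # initial AutoThrottle delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60          # upper bound on the AutoThrottle delay
```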
