hemant-code625/Data-Scraping

Data Scraping using Scrapy and PyMongo

Welcome to the Data Scraping and Storage GitHub repository! This project uses the Scrapy web scraping framework to extract data about Travel and Mystery books from https://books.toscrape.com/index.html and stores it in a MongoDB Atlas database using PyMongo.

Overview


This repository contains a Scrapy project that demonstrates how to scrape book data from books.toscrape.com. The scraped data is processed and stored in a MongoDB Atlas database for further analysis and use. The spider provides a flexible and efficient way to gather book-related information from an online source.

Prerequisites

To run the script and replicate the project, you'll need the following:

  • Python 3.x
  • Scrapy
  • PyMongo
  • MongoDB Atlas

Make sure to install the necessary dependencies using pip or any other package manager.
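The two third-party dependencies can be installed with `pip install scrapy pymongo`. For a reproducible setup they can also be pinned in a requirements.txt file; the version floors below are illustrative assumptions, not taken from this repository:

```
scrapy>=2.8
pymongo>=4.3
```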

Usage

  1. Clone this repository to your local machine:

    git clone https://github.com/hemant-code625/Data-Scraping.git
  2. Navigate to the project directory:

    cd Data-Scraping
  3. Configure MongoDB Atlas:
    • Create a MongoDB Atlas account and set up a new cluster.
    • Obtain the connection string for your MongoDB Atlas cluster.
    • Update the MONGO_URI variable in the script with your connection string.
  4. Customize the scraping process:
    • Open the book_scraper/spiders/books_spider.py file.
    • Modify the spider code to specify the websites to scrape, the data fields to extract, and the desired scraping logic.
    • Feel free to add more spiders or customize existing ones based on your requirements.
  5. Run the spider:

    scrapy crawl books

The spider will crawl the configured pages and store the scraped book data in your MongoDB Atlas database.

Contributing

Contributions to this project are welcome! If you have any ideas for improvements or new features, feel free to open an issue or submit a pull request. Let's make this project even better together.

License

This project is licensed under the MIT License. You are free to use, modify, and distribute the code as per the terms and conditions of the license.

Acknowledgments

Special thanks to the developers of Scrapy and PyMongo for providing powerful tools that make web scraping and database integration seamless. Their contributions are invaluable to this project.

If you have any questions or feedback, please don't hesitate to reach out. Happy scraping and data storage!
