hemant-code625/Data-Scraping

Data Scraping using Scrapy and PyMongo

Welcome to the Data Scraping and Storage GitHub repository! This project uses the Scrapy web scraping framework to extract data about Travel and Mystery books from https://books.toscrape.com/index.html and stores it in a MongoDB Atlas database using PyMongo.

Overview


This repository contains a Scrapy project that demonstrates how to scrape book data from books.toscrape.com. The scraped data is processed and stored in a MongoDB Atlas database for further analysis and use. The spider provides a flexible and efficient way to gather book-related information from an online source.

Prerequisites

To run the script and replicate the project, you'll need the following:

  • Python 3.x
  • Scrapy
  • PyMongo
  • MongoDB Atlas

Make sure to install the necessary dependencies using pip or any other package manager.
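The two third-party dependencies can be installed with `pip install scrapy pymongo`. For a reproducible setup they can also be pinned in a requirements.txt file; the version floors below are illustrative assumptions, not taken from this repository:

```
scrapy>=2.8
pymongo>=4.3
```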

Usage

  1. Clone this repository to your local machine:

    git clone https://github.com/hemant-code625/Data-Scraping.git
  2. Navigate to the project directory:

    cd Data-Scraping
  3. Configure MongoDB Atlas:
    • Create a MongoDB Atlas account and set up a new cluster.
    • Obtain the connection string for your MongoDB Atlas cluster.
    • Update the MONGO_URI variable in the script with your connection string.
  4. Customize the scraping process:
    • Open the book_scraper/spiders/books_spider.py file.
    • Modify the spider code to specify the websites to scrape, the data fields to extract, and the desired scraping logic.
    • Feel free to add more spiders or customize existing ones based on your requirements.
  5. Run the spider:

    scrapy crawl books

The spider will crawl the configured pages and store the scraped book data in your MongoDB Atlas database.

Contributing

Contributions to this project are welcome! If you have any ideas for improvements or new features, feel free to open an issue or submit a pull request. Let's make this project even better together.

License

This project is licensed under the MIT License. You are free to use, modify, and distribute the code as per the terms and conditions of the license.

Acknowledgments

Special thanks to the developers of Scrapy and PyMongo for providing powerful tools that make web scraping and database integration seamless. Their contributions are invaluable to this project.

If you have any questions or feedback, please don't hesitate to reach out. Happy scraping and data storage!
