Scrape & Sense

A set of Python scripts that scrape articles from the web and run NLP analysis on the extracted text, producing sentiment and readability metrics.

Description

Scrape & Sense is a web scraping and NLP project that analyzes sentiment and readability metrics of articles extracted from websites. Using Natural Language Processing techniques, it computes Positive Score, Negative Score, Polarity Score, Subjectivity Score, and readability metrics like Average Sentence Length, Percentage of Complex Words, and Fog Index.
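
To make the metrics above concrete, here is a minimal sketch of how they could be computed. It is not the project's implementation: the function names, the naive sentence splitter, the vowel-group syllable heuristic, the "three or more syllables" rule for complex words, and the standard Gunning Fog formula are all assumptions made for illustration. The formulas the project actually uses are documented in its analysis details file.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break on ., ! or ? followed by whitespace or end of text.
    return [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]

def tokenize(text):
    # Lowercase alphabetic tokens; apostrophes are kept inside words.
    return re.findall(r"[a-z']+", text.lower())

def count_syllables(word):
    # Rough heuristic (assumption): one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def analyze(text, positive_words, negative_words):
    words = tokenize(text)
    sentences = split_sentences(text)
    positive = sum(1 for w in words if w in positive_words)
    negative = sum(1 for w in words if w in negative_words)
    eps = 1e-6  # avoids division by zero when no sentiment words are found
    polarity = (positive - negative) / (positive + negative + eps)
    subjectivity = (positive + negative) / (len(words) + eps)
    avg_sentence_length = len(words) / max(1, len(sentences))
    # Assumption: a "complex" word has three or more syllables.
    complex_count = sum(1 for w in words if count_syllables(w) >= 3)
    pct_complex = 100 * complex_count / max(1, len(words))
    # Standard Gunning Fog formula; the project may scale this differently.
    fog_index = 0.4 * (avg_sentence_length + pct_complex)
    return {
        "positive_score": positive,
        "negative_score": negative,
        "polarity_score": polarity,
        "subjectivity_score": subjectivity,
        "avg_sentence_length": avg_sentence_length,
        "pct_complex_words": pct_complex,
        "fog_index": fog_index,
    }
```

Calling analyze(article_text, positive_words, negative_words) with two sets of lowercase words returns one dictionary per article, which maps naturally onto one row of a CSV file.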

The workflow includes scripts for data extraction, stop word removal, and analysis, with detailed setup instructions so that results can be reproduced. Results are written to a CSV file for easy interpretation and further analysis.

Getting Started

Installation

Navigate to the folder where you want to keep the project, then run the following command in your terminal/cmd:

git clone https://github.com/Onaga08/scrape-and-sense.git

This repository uses several Python libraries and dependencies. From inside the cloned scrape-and-sense folder, install all requirements with the command below.

pip install -r requirements.txt

Usage

This project has two broad functions:

Web Scraping Using BeautifulSoup
NLP Analysis

The runnable scripts, along with the required input and expected output of each Python file, are explained in instructions.md
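
As a rough illustration of the scraping half, the sketch below pulls a title and body text out of an HTML page with BeautifulSoup. The function name extract_article and the choice of the h1 and p tags are assumptions; the selectors main.py actually uses depend on the layout of the target pages. Fetching the page itself (for example with an HTTP client) is left out so the example runs offline.

```python
from bs4 import BeautifulSoup

def extract_article(html):
    # Parse with the parser bundled with Python so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: the article title is the first <h1> on the page.
    title_tag = soup.find("h1")
    title = title_tag.get_text(strip=True) if title_tag else ""

    # Assumption: the article body is the text of all <p> elements.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    body = "\n".join(p for p in paragraphs if p)
    return title, body
```

Writing each title and body to its own .txt file would produce a directory like the text_files folder that the later steps read from.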

Pre-Requisite Directories/Files

Input.xlsx - Contains links to the articles hosted on the BlackCoffer website.
Dict/ - Contains the txt word lists used for positive-words and negative-words analysis.
Stop Words/ - Contains txt files for three different types of stop words.
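
The word-list folders above could be consumed along the following lines. This is a sketch under assumptions: it treats every .txt file as one word per line, and the helper names load_word_set and remove_stop_words are hypothetical, not taken from rmv_StopWords.py.

```python
from pathlib import Path

def load_word_set(folder):
    # Collect every non-empty line from every .txt file in the folder.
    # Assumption: each file lists one word per line.
    words = set()
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="ignore" tolerates files that are not clean UTF-8.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            word = line.strip().lower()
            if word:
                words.add(word)
    return words

def remove_stop_words(text, stop_words):
    # Compare each token without surrounding punctuation, but keep the
    # original token in the output when it is not a stop word.
    kept = [
        token for token in text.split()
        if token.lower().strip(".,!?;:\"'()") not in stop_words
    ]
    return " ".join(kept)
```

The same loader works for both folders, for example load_word_set("Stop Words") for the stop words and load_word_set("Dict") for the sentiment lists, although the positive and negative files would need to be loaded separately to keep the two sets apart.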

Project Workflow

graph TD
    A[Input.csv] -->|main.py| B[text_files directory]
    B -->|check.py| C{Articles with < 3 lines?}
    C -->|Yes| D[Error in data extraction]
    D -->A
    C -->|No| E[text_files directory]
    E -->|rmv_StopWords.py| F[updated_text_files directory]
    F -->|Analysis.py + output.py| G[output.csv]
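
The validation step in the graph (articles with fewer than 3 lines are treated as extraction errors) can be sketched as follows. The function name find_short_articles and the decision to count only non-empty lines are assumptions; check.py may count lines differently.

```python
from pathlib import Path

def find_short_articles(folder, min_lines=3):
    # Return the names of .txt files that have fewer than min_lines
    # non-empty lines, which suggests the extraction failed.
    short = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        non_empty = [line for line in text.splitlines() if line.strip()]
        if len(non_empty) < min_lines:
            short.append(path.name)
    return short
```

Any file names returned by find_short_articles("text_files") correspond to the "Yes" branch of the graph, meaning those links should be scraped again before stop word removal.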

Detailed Analysis

For detailed information on the formulas and logic used in the NLP analysis, please refer to the Analysis_Details.md file.

License is attached
