UsmanCanCode/Journalist: find authors, articles, and author profile links for various news sites

FIND A JOURNALIST

A simple web app, built with Scrapy and Flask, that collects authors and their articles from well-known news sites.

The Scrapy library is used to build spiders that crawl news sites, collect data, and store it in a MySQL database. The Scrapy project lives in the SiteSpiders folder, and each spider has its own setup; for example, cnn_spider.py crawls the CNN website. Scrapy item pipelines populate the MySQL database, which stores the items in two tables, authors and articles. There is a one-to-many relationship between authors.id and articles.author_id: one author can have many articles.
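To illustrate how an item pipeline and the one-to-many relationship fit together, here is a minimal sketch. It is not the repository's code: apart from authors.id and articles.author_id, the column names (name, site, profile_url, title, url), the item keys, and the class name AuthorArticlePipeline are hypothetical, and Python's built-in sqlite3 stands in for MySQL so the example runs without a database server. The class follows Scrapy's item pipeline interface (open_spider, process_item, close_spider), which is duck-typed, so it does not need to import Scrapy.

```python
import sqlite3

# Hypothetical schema: only authors.id and articles.author_id come from the
# README; every other column name is an assumption for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    site TEXT NOT NULL,
    profile_url TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT NOT NULL,
    url TEXT UNIQUE
);
"""


class AuthorArticlePipeline:
    """Hypothetical Scrapy-style item pipeline; sqlite3 stands in for MySQL."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: connect and create the tables.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.executescript(SCHEMA)

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and release the connection.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        cur = self.conn.cursor()
        # Insert the author only once; the UNIQUE profile_url makes repeats a no-op.
        cur.execute(
            "INSERT OR IGNORE INTO authors (name, site, profile_url) VALUES (?, ?, ?)",
            (item["author"], item["site"], item["profile_url"]),
        )
        cur.execute(
            "SELECT id FROM authors WHERE profile_url = ?", (item["profile_url"],)
        )
        author_id = cur.fetchone()[0]
        # Each article row points back at its author: one author, many articles.
        cur.execute(
            "INSERT OR IGNORE INTO articles (author_id, title, url) VALUES (?, ?, ?)",
            (author_id, item["title"], item["url"]),
        )
        # Returning the item passes it on to any later pipeline stage.
        return item
```

With a real MySQL driver the SQL would change slightly (MySQL uses INSERT IGNORE, and drivers such as PyMySQL use %s placeholders). A pipeline like this is enabled by listing its import path in the ITEM_PIPELINES setting of the Scrapy project's settings.py.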

Here is the schema for the tables:

A simple web app, stored in the flaskweb folder, is built with Flask. It has two routes: /authors, which lists all the authors from a particular news site, and <id><author_name>, which lists all the articles by a particular author. The routes are built using Flask blueprints: authors.py and articles.py contain the setup for the /authors and <id><author_name> views, respectively. A base HTML template is stored in the templates folder. The data is retrieved from the MySQL database, and that logic lives in db.py.
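The query logic that db.py provides for the two views might look roughly like the following sketch. The function names get_authors and get_articles and the column names are hypothetical, and sqlite3 again stands in for MySQL; each function returns the rows that one of the two routes would render.

```python
import sqlite3


# Hypothetical db.py-style helpers; names and columns are assumptions,
# and sqlite3 stands in for the project's MySQL database.
def get_authors(conn, site):
    """Return (id, name) rows for every author from one news site (for /authors)."""
    return conn.execute(
        "SELECT id, name FROM authors WHERE site = ? ORDER BY name", (site,)
    ).fetchall()


def get_articles(conn, author_id):
    """Return (title, url) rows for one author's articles (for <id><author_name>)."""
    # Filtering on articles.author_id walks the one-to-many relation from
    # a single author to all of that author's articles.
    return conn.execute(
        "SELECT title, url FROM articles WHERE author_id = ? ORDER BY id",
        (author_id,),
    ).fetchall()
```

In a blueprint, a view in authors.py could call get_authors and pass the rows to Flask's render_template, with the page template extending the base template; the actual view, template, and function names in the repository may differ.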

Here are the two routes:

Repository contents (6 commits):
SiteSpiders/
flaskweb/
.gitignore
README.md
pip
requirements.txt
