Journalist Repo: find authors, articles, and author profile links for various news sites

UsmanCanCode/Journalist

FIND A JOURNALIST

A simple web app, built with Scrapy and Flask, that collects authors and their articles from well-known news sites.

The Scrapy library is used to build spiders that crawl news sites, collect data, and store it in a MySQL database. The Scrapy project lives in the SiteSpiders folder, and each spider has its own file; for example, cnn_spider.py crawls the CNN website. The MySQL database stores the items in two tables, authors and articles, which are populated through Scrapy item pipelines. There is a one-to-many relation between authors.id and articles.author_id: one author can have many articles.
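The pipeline step described above could look roughly like the sketch below. It is not the repository's code: the class name, the item fields (author, site, profile_url, title, url), and the column names are assumptions made for illustration, and the standard-library sqlite3 module stands in for MySQL so that the example runs without a database server. Only the open_spider, process_item, and close_spider hooks are part of Scrapy's pipeline interface.

```python
import sqlite3


class AuthorArticlePipeline:
    """Hypothetical Scrapy-style item pipeline (sqlite3 stands in for MySQL)."""

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("PRAGMA foreign_keys = ON")
        # Assumed columns; the real schema may differ.
        self.conn.executescript(
            """
            CREATE TABLE authors (
                id INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                site TEXT NOT NULL,
                profile_url TEXT,
                UNIQUE (name, site)
            );
            CREATE TABLE articles (
                id INTEGER PRIMARY KEY,
                author_id INTEGER NOT NULL REFERENCES authors(id),
                title TEXT NOT NULL,
                url TEXT UNIQUE
            );
            """
        )

    def process_item(self, item, spider):
        # Insert the author only if this (name, site) pair is new.
        self.conn.execute(
            "INSERT OR IGNORE INTO authors (name, site, profile_url) "
            "VALUES (?, ?, ?)",
            (item["author"], item["site"], item.get("profile_url")),
        )
        # Look up the author's id so the article can reference it.
        author_id = self.conn.execute(
            "SELECT id FROM authors WHERE name = ? AND site = ?",
            (item["author"], item["site"]),
        ).fetchone()[0]
        # articles.author_id is the "many" side of the one-to-many relation.
        self.conn.execute(
            "INSERT OR IGNORE INTO articles (author_id, title, url) "
            "VALUES (?, ?, ?)",
            (author_id, item["title"], item["url"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls this hook once when the spider finishes.
        self.conn.close()
```

Looking the author up before inserting the article means that two articles by the same author end up sharing one authors row, which is what the one-to-many relation requires.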

Here is the schema for the tables:

database_schema

Using Flask, a simple web app was built and stored in the flaskweb folder. It has two routes: /authors, which lists all the authors from a particular news site, and <id><author_name>, which lists all the articles by a particular author. The routes are built using Flask blueprints. The authors.py and articles.py files set up the /authors and <id><author_name> views respectively. A base HTML template is used and stored in the templates folder. The data is retrieved from the MySQL database, and the query logic lives in db.py.
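The two lookups behind these views can be sketched as follows. This is an illustration under assumptions, not the contents of db.py: the function names get_authors and get_articles, the column names, and the sample rows are hypothetical, and sqlite3 again stands in for MySQL so that the snippet runs on its own.

```python
import sqlite3


def get_authors(conn, site):
    """Return (id, name) rows for every author from one news site."""
    return conn.execute(
        "SELECT id, name FROM authors WHERE site = ? ORDER BY name",
        (site,),
    ).fetchall()


def get_articles(conn, author_id):
    """Return (title, url) rows for one author via articles.author_id."""
    return conn.execute(
        "SELECT title, url FROM articles WHERE author_id = ? ORDER BY id",
        (author_id,),
    ).fetchall()


# Small in-memory data set (made-up rows) so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        site TEXT NOT NULL
    );
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL,
        url TEXT
    );
    INSERT INTO authors (id, name, site) VALUES
        (1, 'Jane Doe', 'cnn'),
        (2, 'John Roe', 'bbc');
    INSERT INTO articles (id, author_id, title, url) VALUES
        (1, 1, 'First story', 'https://example.com/1'),
        (2, 1, 'Second story', 'https://example.com/2');
    """
)

cnn_authors = get_authors(conn, "cnn")
jane_articles = get_articles(conn, cnn_authors[0][0])
```

A view for the authors route would call something like get_authors and pass the rows to a template; the articles view would do the same with get_articles, using the id taken from the URL.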

Here are the two routes:

authors articles
