Ranking significant features for increasing social media buzz via regression analysis, using dataset provided by University of California Irvine.

thiagorcdl/social_media_buzz

Social Media Buzz

Released under the MIT license. This repository uses a social preview generated with @pqt/social-preview.

Ranking Significant Features for Increasing Engagement on Social Media via Regression Analysis.

This code was developed as a study tool for the Predictive Modeling, Model Fitting, and Regression Analysis course offered by the University of California, Irvine on Coursera. It uses the Buzz in Social Media data set, available at the UCI Machine Learning Repository, to identify the attributes of social media content that correlate most strongly with the amount of attention the content received. To do so, it builds several linear regression models, one per attribute, and ranks them by their model fit measure (R-squared).
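The ranking idea described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: it assumes NumPy is available, uses synthetic data instead of the Buzz data set, and the names `r_squared`, `rank_features`, and `feature_a` to `feature_c` are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """R-squared of the least-squares line y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return 1.0 - ss_res / ss_tot


def rank_features(X, y, names):
    """Fit one single-predictor model per column and sort by R-squared."""
    scores = {name: r_squared(X[:, i], y) for i, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in for the real data: the target depends mostly on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

ranking = rank_features(X, y, ["feature_a", "feature_b", "feature_c"])
for name, score in ranking:
    print(f"{name}: R-squared = {score:.3f}")
```

Each attribute gets its own one-predictor model, so the resulting R-squared values are directly comparable across attributes.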

Usage

  1. Clone the repository
  2. Fetch the dataset (regression.tar.gz) from the UCI Machine Learning Repository
  3. Extract it inside {PROJECT_ROOT}/assets/dataset so that you have the following directories:
    • {PROJECT_ROOT}/assets/dataset/regression/Twitter
    • {PROJECT_ROOT}/assets/dataset/regression/TomsHardware (won't be used)
  4. Install requirements:
    • pip install -r requirements.txt
  5. Run social_media_buzz module:
    • python -m social_media_buzz
  6. Check the results under {PROJECT_ROOT}/assets/results/

Acknowledgements

Special thanks to François Kawala, Ahlame Douzal, Eric Gaussier, and Eustache Diemert (from Université Joseph Fourier and BestofMedia Group) for providing the data set used here.

I'd also like to thank the University of California, Irvine for hosting the UCI Machine Learning Repository, where the data set can be downloaded.

Todo

Essential

  • Load data from file
  • Split data into training (80%) and testing (20%) sets
  • Create linear regression model for a pair of variables (1 predictor)
  • Cycle through features
  • Get R-squared for each attribute
  • Rank attributes based on R-squared value
  • Write short report
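
The 80/20 split and the single-predictor fit from the list above could look roughly like this. It is a sketch under stated assumptions, not the repository's implementation: it relies on NumPy only, runs on synthetic data, and the helper names `split_train_test` and `held_out_r_squared` are made up for illustration.

```python
import numpy as np


def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(round(len(y) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def held_out_r_squared(x_train, y_train, x_test, y_test):
    """Fit a line on the training rows and score it on the testing rows."""
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    predictions = slope * x_test + intercept
    ss_res = np.sum((y_test - predictions) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data: the target is almost a linear function of column 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = split_train_test(X, y)
score = held_out_r_squared(X_train[:, 0], y_train, X_test[:, 0], y_test)
print(f"train rows: {len(y_train)}, test rows: {len(y_test)}, R-squared: {score:.3f}")
```

Shuffling before splitting guards against any ordering in the input file leaking into the split.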

Extra

  • Create several folds for Training/Testing data (Cross-validation)
  • Cycle through folds
  • Rank attributes based on testing data accuracy
  • Generate charts
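
As a rough sketch of the cross-validation items above, the following ranks attributes by their mean R-squared on the held-out fold. It assumes NumPy, five folds, and synthetic data; `k_fold_indices` and `cross_validated_ranking` are hypothetical names, not functions from this repository.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each row is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def cross_validated_ranking(X, y, names, k=5):
    """Rank attributes by mean held-out R-squared of one-predictor models."""
    scores = {name: [] for name in names}
    for train_idx, test_idx in k_fold_indices(len(y), k):
        y_test = y[test_idx]
        ss_tot = np.sum((y_test - y_test.mean()) ** 2)
        for col, name in enumerate(names):
            slope, intercept = np.polyfit(X[train_idx, col], y[train_idx], deg=1)
            predictions = slope * X[test_idx, col] + intercept
            ss_res = np.sum((y_test - predictions) ** 2)
            scores[name].append(1.0 - ss_res / ss_tot)
    means = {name: float(np.mean(values)) for name, values in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Synthetic data: only column 0 carries real signal.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=150)

cv_ranking = cross_validated_ranking(X, y, ["feature_a", "feature_b", "feature_c"])
for name, score in cv_ranking:
    print(f"{name}: mean test R-squared = {score:.3f}")
```

Held-out R-squared can be negative for attributes that predict worse than the test-fold mean, which is expected for uninformative attributes.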

Above and Beyond

  • Fetch data set automatically
  • Compare both rankings automatically
  • Optimize with threads
  • Optimize with Cython?
