
This is an HTML web scraping project built with Python libraries. The objective is to extract users' comments from the "mac power user" forum, cleanse the data, tokenize the text/comments, classify the words, and store them in a dataframe.


SunlongNgouv/Web-Scraping-with-mac-power-user-forum


NLP Web Scraping Techniques for mac power user forum

Introduction

mac power user is an online forum hosted by Stephen Hackett and David Sparks for discussing newly released features and sharing feedback on Mac products. The forum also serves as a chatroom where Mac users can share their experiences or get help from community members who have had the same issues.

The forum is divided into 10 categories:

  1. Announcements and Help
  2. Episodes
  3. Hardware
  4. Software
  5. Homescreen & Office setups
  6. Cool Workflows
  7. Tech Support
  8. Beta Town
  9. Uncategorized
  10. Focused

Objective

This project handles unstructured data, namely the text of users' conversations in the mac power user forum, using web scraping techniques written in Python. The scope was narrowed to the Episodes category, which contained 600+ topics. All extracted text (conversations) is wrangled, feature-engineered, and stored in CSV format for reporting.
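The cleansing and tokenizing steps can be illustrated with a minimal sketch that uses only the standard library. The function names (`clean_text`, `tokenize`), the regular expressions, and the sample comment are assumptions made for illustration; the repository's actual cleaning rules are not shown in this README and may differ.

```python
import re
from collections import Counter

# Hypothetical patterns: strip leftover HTML tags, then keep lowercase
# alphanumeric runs (plus apostrophes) as word tokens.
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_text(raw):
    """Remove HTML tags, collapse whitespace, and lowercase the text."""
    text = TAG_RE.sub(" ", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def tokenize(text):
    """Split cleaned text into word tokens."""
    return TOKEN_RE.findall(text)


# Made-up sample comment, not taken from the forum.
comment = "<p>I love the new   Shortcuts app!</p>"
tokens = tokenize(clean_text(comment))
counts = Counter(tokens)  # word frequencies, one simple way to summarize tokens
```

A `Counter` is only one possible way to classify or summarize the resulting words; a real pipeline might also drop stop words or group tokens by topic.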

Python libraries used:

  1. requests
  2. BeautifulSoup
  3. re
  4. pandas
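As a rough sketch of how these four libraries could fit together, the example below fetches a page, pulls comment text out of the HTML, and tabulates it. Everything specific here is an assumption: the helper names (`fetch_html`, `extract_comments`, `comments_to_frame`), the `div.post` CSS selector, and the sample HTML are hypothetical and would need to be adapted to the forum's real markup.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_comments(html, selector="div.post"):
    """Return the text of every element matching the (assumed) selector."""
    soup = BeautifulSoup(html, "html.parser")
    comments = []
    for node in soup.select(selector):
        text = node.get_text(" ", strip=True)
        comments.append(re.sub(r"\s+", " ", text))  # collapse whitespace
    return comments


def comments_to_frame(topic, comments):
    """Build a DataFrame with one row per comment."""
    return pd.DataFrame({"topic": [topic] * len(comments), "comment": comments})


# Inline sample so the sketch runs without network access; a real run would
# call fetch_html(url) on a topic page instead.
sample_html = """
<div class="post"><p>Great episode on automation.</p></div>
<div class="post"><p>Which app was mentioned?</p></div>
"""
frame = comments_to_frame("Sample topic", extract_comments(sample_html))
csv_text = frame.to_csv(index=False)  # pass a file path to write to disk instead
```

Keeping the fetch step separate from parsing makes it possible to test the extraction logic on saved HTML before running it against 600+ topic pages.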
