
Small projects in machine learning, data analysis, and data engineering.


senzelden/projects


Small Data Science Projects

Repository for small projects: data wrangling in pandas, data visualization, classification models, regression models, and time series analysis. Most of them come from the first weeks of the SPICED Data Science Program.

1. Gapminder & Matplotlib

Project to test out matplotlib's capabilities. Even with matplotlib alone, you can create beautiful animated charts. Based on the Gapminder project. Go here
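As a rough illustration of how an animated Gapminder-style bubble chart can be built with matplotlib alone, here is a minimal sketch using `matplotlib.animation.FuncAnimation`. The data is synthetic and hypothetical (random starting values and made-up growth rates), not the Gapminder dataset the project used, and the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
years = np.arange(1960, 2020, 5)  # one animation frame per 5-year step
n_countries = 20

# Hypothetical starting values (NOT real Gapminder data)
income = rng.uniform(500, 5000, n_countries)       # GDP per capita
life_exp = rng.uniform(40, 60, n_countries)        # life expectancy in years
population = rng.uniform(1e6, 1e8, n_countries)    # controls bubble size

fig, ax = plt.subplots(figsize=(8, 5))
scatter = ax.scatter(income, life_exp, s=population / 1e6, alpha=0.6)
ax.set_xscale("log")
ax.set_xlim(300, 100_000)
ax.set_ylim(30, 90)
ax.set_xlabel("GDP per capita (log scale)")
ax.set_ylabel("Life expectancy (years)")
title = ax.set_title("")

def update(frame):
    """Move every bubble to its position for the given frame and update the title."""
    x = income * 1.15 ** frame                    # hypothetical income growth per step
    y = np.minimum(life_exp + 1.5 * frame, 85)    # hypothetical life-expectancy gain, capped
    scatter.set_offsets(np.column_stack([x, y]))
    title.set_text(f"Year {years[frame]}")
    return scatter, title

anim = FuncAnimation(fig, update, frames=len(years), interval=300)
# anim.save("gapminder.gif", writer="pillow")  # saving as GIF requires Pillow
```

`FuncAnimation` calls `update` once per frame, so the whole animation is just a function that mutates existing artists; no extra animation library is involved.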

2. Titanic & Random Forest

A short look at a classic classification problem. Extensive feature engineering is combined with a random forest model, fine-tuned with GridSearchCV, to push the Kaggle score above 80%. Go here
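The general pattern of feature engineering followed by a grid-searched random forest might look roughly like the sketch below. It is not the project's actual code: the data frame is a small synthetic stand-in with Titanic-style column names, the target is a hypothetical rule, and the engineered features (`FamilySize`, `LogFare`, title dummies) and the parameter grid are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
n = 200
# Synthetic, hypothetical stand-in for Kaggle's Titanic train.csv
df = pd.DataFrame({
    "Name": rng.choice(["Smith, Mr. John", "Brown, Mrs. Anna",
                        "Lee, Miss. Mary", "Hill, Master. Tom"], n),
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 70, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 200, n),
})
# Hypothetical target rule so the example has something learnable
y = ((df["Sex"] == "female") | (df["Pclass"] == 1)).astype(int)

def engineer(frame):
    """Build a numeric feature matrix from the raw passenger columns."""
    out = pd.DataFrame(index=frame.index)
    out["Pclass"] = frame["Pclass"]
    out["IsFemale"] = (frame["Sex"] == "female").astype(int)
    out["Age"] = frame["Age"]
    out["FamilySize"] = frame["SibSp"] + frame["Parch"] + 1
    out["LogFare"] = np.log1p(frame["Fare"])
    # Extract the honorific between the comma and the period, e.g. "Mr"
    title = frame["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    return out.join(pd.get_dummies(title, prefix="Title", dtype=int))

X = engineer(df)

# Exhaustively try each parameter combination with 5-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`GridSearchCV` refits the best combination on all the data, so `search.best_estimator_` (or `search.predict`) can be used directly to produce a submission.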

3. Bikeshare & Regression Models

Trying out different regression models on the bikeshare dataset on Kaggle. A Random Forest Regressor combined with a Gradient Boosting Regressor gives the lowest Root Mean Squared Logarithmic Error (RMSLE). Go here
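RMSLE is computed as $\sqrt{\frac{1}{n}\sum_i (\log(1+\hat{y}_i) - \log(1+y_i))^2}$, which penalizes relative rather than absolute errors. The sketch below implements that metric and evaluates a Random Forest, a Gradient Boosting model, and a combination of the two. The README does not say how the two models were combined; a simple average of their predictions is an assumption here, and the bike-rental data is synthetic and hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import train_test_split

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

rng = np.random.default_rng(0)
n = 1000
hour = rng.integers(0, 24, n)
temp = rng.uniform(0, 35, n)
workingday = rng.integers(0, 2, n)
X = np.column_stack([hour, temp, workingday])
# Hypothetical rental counts: temperature effect plus commuter peaks on working days
y = 20 + 10 * temp + 150 * workingday * np.isin(hour, [8, 17, 18]) + rng.poisson(20, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
gb = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Counts cannot be negative, and log1p needs values > -1, so clip at zero
pred_rf = np.clip(rf.predict(X_test), 0, None)
pred_gb = np.clip(gb.predict(X_test), 0, None)
pred_blend = (pred_rf + pred_gb) / 2  # assumed combination: plain average

for name, pred in [("RF", pred_rf), ("GB", pred_gb), ("RF+GB", pred_blend)]:
    print(name, round(rmsle(y_test, pred), 4))
```

A common alternative is to train on `log1p(y)` with plain squared error and transform predictions back with `expm1`, which optimizes RMSLE more directly.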

4. Lyrics Classifier & Lyrik Classifier

Double project about English lyrics (from indie rock bands) and German poetry (from lyrikline.org). Web scraping is done with BeautifulSoup, and spaCy is used for tokenization and lemmatization. The texts are then passed through a TF-IDF vectorizer, and a Naive Bayes model predicts the artist or poet. Go lyrics or lyrik
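The TF-IDF plus Naive Bayes step can be sketched as a scikit-learn pipeline. This is a simplified illustration rather than the project's code: the spaCy lemmatization is left out (scikit-learn's default tokenizer is used instead), and the training snippets and the labels `artist_a` / `artist_b` are invented placeholders, not scraped lyrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training snippets; the real project used scraped lyrics and poems
texts = [
    "the city lights are fading in the rain",
    "rain falls on the empty city streets tonight",
    "neon city rain and a midnight train",
    "ocean waves and salt upon the shore",
    "the shore is quiet where the ocean sleeps",
    "waves of the ocean calling me home",
]
labels = ["artist_a"] * 3 + ["artist_b"] * 3

# TF-IDF turns each text into a weighted word-count vector;
# multinomial Naive Bayes then scores each class from those weights
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["rain on the city", "the ocean shore"]))
```

Keeping the vectorizer inside the pipeline ensures that the vocabulary and IDF weights are learned only from the training texts, which matters once cross-validation is added.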

5. Berlin Temperature Forecast

Time series project to predict future temperatures at Berlin Tempelhof using AR and ARIMA models with a walk-forward approach. Go here
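Walk-forward validation means: fit on all data seen so far, forecast one step ahead, then add the true observation to the history and repeat. The sketch below shows that loop for an AR(p) model. To stay dependency-light it swaps in a plain NumPy least-squares AR fit instead of a library such as statsmodels, it does not cover ARIMA, and the temperature series is synthetic and hypothetical (a yearly sine cycle plus noise).

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p) coefficients [intercept, lag1, ..., lagp] by ordinary least squares."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    y = series[p:]
    # Column k holds the series lagged by k + 1 steps, aligned with y
    lags = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(y)), lags])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(history, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(history, dtype=float)[-1 : -p - 1 : -1]  # most recent first
    return coef[0] + coef[1:] @ recent

def walk_forward(series, n_test, p):
    """Refit on everything seen so far, forecast one step, then reveal the true value."""
    series = np.asarray(series, dtype=float)
    history, test = series[:-n_test].copy(), series[-n_test:]
    preds = []
    for actual in test:
        coef = fit_ar(history, p)
        preds.append(predict_next(history, coef))
        history = np.append(history, actual)  # the observed value joins the history
    return np.array(preds), test

rng = np.random.default_rng(1)
days = np.arange(3 * 365)
# Hypothetical daily mean temperature in °C: yearly cycle plus noise
temps = 10 + 9 * np.sin(2 * np.pi * (days - 110) / 365) + rng.normal(0, 2, days.size)

preds, actual = walk_forward(temps, n_test=30, p=3)
rmse = np.sqrt(np.mean((preds - actual) ** 2))
print(f"walk-forward RMSE over 30 days: {rmse:.2f} °C")
```

Refitting at every step is slower than fitting once, but it mirrors how the model would be used in practice: each forecast only ever sees data that existed at forecast time.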