
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

dkekre21/data-science-portfolio

data-science-portfolio

Portfolio of data science projects I completed for academic, self-learning, and hobby purposes. Code is presented as Jupyter notebooks and R Markdown files (published on RPubs).

Contents

  • Machine Learning

    • Customer segmentation and CLV Identifies customer types for a durable goods company by analysing their needs and wants. Implements Customer Lifetime Value (CLTV) to distinguish customers by their potential lifetime profit, as an aid to customer relationship management.
    • BestBuy propensity model Ranks Best Buy customers by their propensity to buy the Geek Squad Protection plan. The data comes only from Santa Clara Best Buy stores; the dataset, named “BestBuy.csv”, includes 3,206 transactions made in March 2017. Note that this is a very small dataset, so the analysis is aimed at beginner ML enthusiasts.
    • More projects to be uploaded soon

    Toolkit : Python, R, ggplot, matplotlib
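The customer lifetime value idea in the segmentation project can be illustrated with a minimal sketch. It assumes the simple textbook formula (average order value × purchase frequency × retention period × margin); the notebook itself may use a different CLTV method, and the function name, customer IDs, and figures below are hypothetical.

```python
def simple_clv(avg_order_value, purchases_per_year, years_retained, margin):
    """Simple CLV estimate: expected lifetime revenue scaled by profit margin.

    This is an assumed textbook formula, not necessarily the one used in the
    portfolio notebook.
    """
    return avg_order_value * purchases_per_year * years_retained * margin


# Hypothetical customers: (avg order value, purchases/year, years retained, margin)
customers = {
    "C001": (200.0, 2, 5, 0.25),
    "C002": (50.0, 10, 2, 0.25),
    "C003": (500.0, 1, 8, 0.25),
}

# Score every customer, then rank from highest to lowest estimated value.
clv_by_customer = {cid: simple_clv(*params) for cid, params in customers.items()}
ranking = sorted(clv_by_customer, key=clv_by_customer.get, reverse=True)
print(ranking)
```

Ranking customers this way is one possible input to relationship management: the highest-value segment can be prioritised for retention offers.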

  • Deep Learning

    • Machine translation using LSTM

    Toolkit : Python, GCP, keras, nltk, matplotlib

  • Data Analysis and Visualization

    • World Bank Data Analysis 1991-2007 Compares how the three most populous countries in the world differ in standard of living and sustenance. India, China, and the United States have the largest populations and, interestingly, are at different stages of national development.
    • Lending Club Loan Analysis 2005-2017 Visualizes meaningful patterns and insights in Lending Club loan data and looks for fraud patterns.

    Toolkit : Python, plotly, matplotlib, seaborn

  • Time Series

    • Forecasting Market Volatility Predicts VIX futures prices using different time series models. Models are compared on RMSE and on daily profit and loss (P&L). The model with the lowest RMSE and the highest profit is the winner.

    Toolkit : Python, univariate time series, multivariate time series, prediction and forecasting
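The two comparison criteria used in the volatility project, RMSE and daily P&L, can be sketched as follows. The trading rule (long one contract when the next-day forecast is above today's price, short otherwise) and the price series are assumptions made for illustration; the notebook's actual P&L definition is not described here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def daily_pnl(prices, forecasts):
    """Daily P&L under an assumed rule: hold +1 contract if the forecast for
    day t+1 exceeds the price on day t, otherwise hold -1 contract."""
    pnl = []
    for t in range(len(prices) - 1):
        position = 1 if forecasts[t + 1] > prices[t] else -1
        pnl.append(position * (prices[t + 1] - prices[t]))
    return pnl


# Hypothetical realised prices and two hypothetical model forecasts.
prices = [15.0, 16.0, 15.5, 17.0]
model_a = [15.0, 15.8, 15.7, 16.5]
model_b = [15.0, 14.5, 16.5, 15.0]

for name, forecast in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(rmse(prices, forecast), 4), sum(daily_pnl(prices, forecast)))
```

In this toy data the same model wins on both criteria, but the two scores can disagree: a forecast with small errors may still pick the wrong direction on the days with the largest price moves.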

  • Spark ML

    • Movie Recommendation Generates movie recommendations with PySpark on a public dataset, using similarity measures such as Pearson, cosine, Jaccard, and regularized correlation.
    • Spam filtering Classifies emails as spam or ham using PySpark.
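The similarity measures named in the movie recommendation project can be sketched in plain Python, without Spark, to show what each one computes. The regularized correlation below is assumed to be Pearson correlation shrunk by n / (n + virtual_count), a common formulation; the project may define it differently, and all names and ratings here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson correlation, computed as cosine similarity of mean-centred vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def jaccard(a, b):
    """Jaccard similarity of two collections of user IDs (who rated each movie)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def regularized_correlation(u, v, virtual_count=10):
    """Assumed form: shrink Pearson toward 0 when few co-ratings support it."""
    n = len(u)
    return pearson(u, v) * n / (n + virtual_count)


# Ratings given to two movies by the same four hypothetical users.
movie_x = [5, 3, 4, 4]
movie_y = [4, 2, 3, 3]

print(cosine(movie_x, movie_y))
print(pearson(movie_x, movie_y))
print(jaccard([1, 2, 3], [2, 3, 4]))
print(regularized_correlation(movie_x, movie_y))
```

In a PySpark version, the vectors for each movie pair would typically come from joining the ratings on user ID, with these functions applied per pair.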
  • NLP
