
Project Logo


Developing a Machine Learning Web Application to Estimate Annual Spend of Customers using Python and Streamlit

built-with-love powered-by-coffee cc-nc-sa

Overview • Prerequisites • Architecture • Demo • Support • License

Overview

The project aims to create a web application that enables end users to estimate the annual spend amount of a customer when certain features are provided.

The application uses a model trained on historical data; by entering values for the relevant features, users obtain an estimate of the customer's annual spend.

The web application serves as a valuable tool for business owners to make informed decisions and plan their operations accordingly, taking into account the anticipated annual spend of customers.

Here is a snapshot of the web application interface:

app-snippet

The machine learning model employs the multiple linear regression (MLR) algorithm, utilizing four numerical features to predict the target variable.

Thorough evaluation yields an adjusted $R^2$ of $0.99$ and an RMSE of $10.48$. Prior to training, the features are standardized using standard scaler (z-score) normalization.
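For illustration, the snippet below sketches how such a pipeline can be assembled with scikit-learn, including how the RMSE and adjusted $R^2$ are computed. The file path and column names are hypothetical placeholders, not the repository's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical file path and column names -- stand-ins, not the repo's schema
df = pd.read_csv("data/processed/customers.csv")
features = ["avg_session_length", "time_on_app", "time_on_website", "membership_years"]
X, y = df[features], df["yearly_amount_spent"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standard scaler normalization followed by multiple linear regression
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n, p = X_test.shape
adj_r2 = 1 - (1 - r2_score(y_test, y_pred)) * (n - 1) / (n - p - 1)
print(f"RMSE: {rmse:.2f} | adjusted R^2: {adj_r2:.2f}")
```

The adjusted $R^2$ penalizes the plain $R^2$ for the number of predictors $p$ relative to the sample size $n$, which keeps the score honest when comparing models with different feature counts.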

The project repository exhibits the following structure:

Spend-Estimator/
├── 📁.github
├── 📁.streamlit
├── 📁conf
├── 📁data/
│   ├── 📁raw
│   ├── 📁processed
│   ├── 📁test
│   └── 📁train 
├── 📁notebooks
├── 📁src/
│   ├── 📁components
│   ├── 📁pipelines
│   ├── 📁utils
│   ├── 🐍constants.py
│   ├── 🐍exception.py
│   └── 🐍logger.py
├── 📁models/
│   ├── 📁predictions
│   ├── 📁preprocessors
│   ├── 📁scores
│   └── 📁trained
├── 📁logs
├── 📁reports
├── 📁resources
├── 🐍main.py
├── 🐍app.py
├── 🐍template.py
├── 🔒poetry.lock
├── 📇pyproject.toml
├── 🗒️requirements.txt
├── 📜.gitignore
├── 🔑LICENSE
└── 📝README.md

💡 Repository Structure Details

To help you navigate through the project, here’s a concise guide to the repository’s structure, detailing what each directory contains and its purpose within the project:

  • 📁.github - Contains GitHub-related configuration files like workflows for CI/CD.
  • 📁.streamlit - Holds Streamlit-specific configuration files for web app settings.
  • 📁conf - Configuration files and schema for the project.
  • 📁data/
    • 📁raw - Original, unmodified data files.
    • 📁processed - Data that has been cleaned and transformed for analysis.
    • 📁test - Data sets used for testing the model's performance.
    • 📁train - Data sets used for training the machine learning models.
  • 📁notebooks - Jupyter notebooks for exploratory data analysis and model experimentation.
  • 📁src/
    • 📁components - Modular components used across the project.
    • 📁pipelines - Data processing and machine learning pipelines.
    • 📁utils - Utility scripts for common tasks throughout the project.
    • 🐍constants.py - Central file for constants used in the project.
    • 🐍exception.py - Custom exception classes for error handling.
    • 🐍logger.py - Logging configuration and setup.
  • 📁models/
    • 📁predictions - Output predictions from the model.
    • 📁preprocessors - Scripts for data preprocessing steps.
    • 📁scores - Model evaluation metrics and scoring information.
    • 📁trained - Serialized versions of trained models.
  • 📁logs - Contains auto-generated logs for event and error tracking, not included in Git.
  • 📁reports - Generated analysis reports and insights.
  • 📁resources - Additional resources like images or documents used in the project.
  • 🐍main.py - Script that orchestrates the project's workflow by sequentially executing the pipeline scripts (see the sketch after this list).
  • 🐍app.py - The Streamlit web application entry point.
  • 🐍template.py - Template script for standardizing code structure.
  • 🔒poetry.lock - Lock file for Poetry to ensure reproducible builds.
  • 📇pyproject.toml - Poetry configuration file for package management.
  • 🗒️requirements.txt - List of Python package requirements.
  • 📜.gitignore - Specifies intentionally untracked files to ignore.
  • 🔑LICENSE - The license file for the project.
  • 📝README.md - The introductory documentation for the project.
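
As a rough illustration of the orchestration role of 🐍main.py, here is a minimal sketch. The pipeline module and class names are hypothetical; only the overall pattern (run each pipeline stage in sequence, logging along the way) reflects the structure described above:

```python
# Hypothetical sketch of how main.py could sequentially execute the pipeline
# stages; module and class names are illustrative, not the repo's actual ones.
from src.logger import logger  # assumes src/logger.py exposes a configured logger

from src.pipelines.data_ingestion import DataIngestionPipeline             # hypothetical
from src.pipelines.data_transformation import DataTransformationPipeline   # hypothetical
from src.pipelines.model_training import ModelTrainingPipeline             # hypothetical


def main() -> None:
    stages = (
        DataIngestionPipeline(),
        DataTransformationPipeline(),
        ModelTrainingPipeline(),
    )
    for stage in stages:
        name = type(stage).__name__
        logger.info("Starting stage: %s", name)
        stage.run()  # assumes each pipeline exposes a run() entry point
        logger.info("Completed stage: %s", name)


if __name__ == "__main__":
    main()
```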

Prerequisites

Tech Stack Prerequisites

Python Numpy Pandas Matplotlib scikit-learn Streamlit

To effectively engage with this project, possessing a robust understanding of the skills listed below is advisable:

  • Core comprehension of Python, Machine Learning, and Modular programming
  • Acquaintance with libraries such as NumPy, Pandas, Matplotlib, Scikit-Learn, and Streamlit
  • Acquaintance with the Python libraries specified in the 🗒️requirements.txt document

These competencies will facilitate a seamless and productive journey throughout the project.

Development Environment Prerequisites

Anaconda Poetry VS_code Jupyter_Notebook Notepad_plus_plus Obsidian Figma Clickup

Application selection and setup may vary based on individual preferences and system configurations.

The development tools I've employed for this project are:

  • Anaconda / Poetry: Utilized for managing environments and packages
  • VS Code: Employed for writing and editing code
  • Jupyter Notebook: Used for data analysis and experimentation
  • Notepad++: Served as an auxiliary code editor
  • Obsidian: Utilized for documenting project notes
  • Figma: Used for crafting application UI/UX designs
  • ClickUp: Employed for overseeing project tasks

Automation Integration Prerequisites

GitHubActions

Integrating process automation is entirely optional, as is the choice of automation tool.

In this project, GitHub Actions has been selected to automate the machine learning model development process as needed.

Should there be a need to adjust hyperparameters or data-related settings, simply update the YAML configurations, and the entire development workflow can be executed directly from the repository.
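
As a hedged illustration of this configuration-driven setup, the training code might read its settings from a YAML file under 📁conf like so; the file name and keys below are assumptions, not the project's actual schema:

```python
# Illustrative sketch of loading hyperparameters and data settings from a YAML
# config in conf/ -- the file name and keys are assumed, not the repo's schema.
import yaml  # PyYAML

with open("conf/configs.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

test_size = config["data"]["test_size"]          # e.g. 0.2
random_state = config["data"]["random_state"]    # e.g. 42
fit_intercept = config["model"]["fit_intercept"] # e.g. true
```

With the settings externalized this way, a CI run triggered by a change to the YAML file can rebuild the model without touching any Python code.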

Architecture

The architectural design of this project is straightforward and can be readily understood with the help of the diagram below:

Process Architecture

The project's architectural framework encompasses the following key steps:

Model Development

The model development process begins with data collection, followed by cleaning and transforming the data to create a clean dataset. This dataset is then split into training and test sets.

After preparing a clean training dataset, we perform the necessary imputations and standardization to produce the scaled training set, which is then used to train the machine learning model. Development then involves testing different hyperparameters and evaluating against the test dataset to optimize performance.

Once optimized, the ML model is serialized and integrated into a web application, allowing end users to interact with it and receive predictions based on their input.
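
A minimal sketch of this serialization step, assuming joblib is used and the artifact lives under 📁models/trained (the file name is a guess):

```python
# Sketch of serializing the optimized model so the web app can load it later;
# joblib and the artifact file name are assumptions, the directory mirrors models/.
from pathlib import Path

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # stands in for the trained pipeline from earlier

Path("models/trained").mkdir(parents=True, exist_ok=True)
joblib.dump(model, "models/trained/model.joblib")    # at the end of training
loaded = joblib.load("models/trained/model.joblib")  # inside the web app
```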

User Interaction

The user interaction with the web application is intuitive and user-friendly. Users begin by entering their data through simple slider controls, selecting values for the features the model uses to make predictions.

This process is designed to be straightforward, even for those with little to no technical background, ensuring that the application is accessible to a wide audience.

Data Retrieval

Once the user submits their data, the web application processes the input using the serialized machine learning model.

The model quickly analyzes the input data, applies the necessary algorithms, and computes the predictions. This step is performed efficiently to minimize wait time and enhance the user experience.

User Output

The results are then presented to the user in a clear and understandable format in the web application.

The goal is to make the output as informative and helpful as possible, allowing users to make informed decisions based on the model's predictions.
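
Tying the three steps above together, here is a minimal sketch of what the slider-in, prediction-out flow could look like in 🐍app.py; the feature names, slider ranges, and artifact path are hypothetical placeholders:

```python
# Minimal sketch of the slider-in, prediction-out flow described above;
# feature names, ranges, and file paths are hypothetical placeholders.
import joblib
import pandas as pd
import streamlit as st

st.title("Customer Spend Estimator")

model = joblib.load("models/trained/model.joblib")  # assumed artifact path

# Slider inputs for the four numerical features (names and ranges are guesses)
inputs = {
    "avg_session_length": st.slider("Avg. session length (min)", 25.0, 40.0, 33.0),
    "time_on_app": st.slider("Time on app (min)", 8.0, 16.0, 12.0),
    "time_on_website": st.slider("Time on website (min)", 33.0, 41.0, 37.0),
    "membership_years": st.slider("Length of membership (years)", 0.0, 7.0, 3.5),
}

if st.button("Estimate annual spend"):
    prediction = model.predict(pd.DataFrame([inputs]))[0]
    st.metric("Estimated annual spend", f"${prediction:,.2f}")
```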

Demo

The following illustration demonstrates the process of providing the necessary inputs to the web application and receiving the desired output:

webapp-graphic

Access the web application by clicking here: Customer Spend Estimator

Support

Should you wish to inquire, offer feedback, or propose ideas, don’t hesitate to contact me via the channels listed below:

Linkedin Badge Twitter Badge Gmail Badge

Discover and engage with my content on these platforms:

Linktree Badge Youtube Badge GitHub Badge Medium Badge Substack Badge

To express your support for my work, consider buying me a coffee or donating through PayPal.

Buy Me a Coffee Paypal

License

by-nc-sa

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.


topmate-udit

