
Project Logo


Developing a Machine Learning Web Application to Estimate Annual Spend of Customers using Python and Streamlit

built-with-love powered-by-coffee cc-nc-sa

Overview • Prerequisites • Architecture • Demo • Support • License

Overview

The project aims to create a web application that enables end users to estimate the annual spend amount of a customer when certain features are provided.

The application uses a model trained on historical data; by entering values for the relevant features, users obtain an estimate of the customer's annual spend.

The web application serves as a valuable tool for business owners to make informed decisions and plan their operations accordingly, taking into account the anticipated annual spend of customers.

Here is a snapshot of the web application interface:

app-snippet

The machine learning model employs the multiple linear regression (MLR) algorithm, utilizing four numerical features to predict the target variable.

Thorough evaluation yields an adjusted $R^2$ of $0.99$ and an RMSE of $10.48$. Prior to training, the features are standardized using standard scaler (z-score) normalization.
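For illustration, the snippet below sketches how such a pipeline can be assembled with scikit-learn, including how the RMSE and adjusted $R^2$ are computed. The file path and column names are hypothetical placeholders, not the repository's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical file path and column names -- stand-ins, not the repo's schema
df = pd.read_csv("data/processed/customers.csv")
features = ["avg_session_length", "time_on_app", "time_on_website", "membership_years"]
X, y = df[features], df["yearly_amount_spent"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standard scaler normalization followed by multiple linear regression
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n, p = X_test.shape
adj_r2 = 1 - (1 - r2_score(y_test, y_pred)) * (n - 1) / (n - p - 1)
print(f"RMSE: {rmse:.2f} | adjusted R^2: {adj_r2:.2f}")
```

The adjusted $R^2$ penalizes the plain $R^2$ for the number of predictors $p$ relative to the sample size $n$, which keeps the score honest when comparing models with different feature counts.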

The project repository exhibits the following structure:

Spend-Estimator/
├── 📁.github
├── 📁.streamlit
├── 📁conf
├── 📁data/
│   ├── 📁raw
│   ├── 📁processed
│   ├── 📁test
│   └── 📁train 
├── 📁notebooks
├── 📁src/
│   ├── 📁components
│   ├── 📁pipelines
│   ├── 📁utils
│   ├── 🐍constants.py
│   ├── 🐍exception.py
│   └── 🐍logger.py
├── 📁models/
│   ├── 📁predictions
│   ├── 📁preprocessors
│   ├── 📁scores
│   └── 📁trained
├── 📁logs
├── 📁reports
├── 📁resources
├── 🐍main.py
├── 🐍app.py
├── 🐍template.py
├── 🔒poetry.lock
├── 📇pyproject.toml
├── 🗒️requirements.txt
├── 📜.gitignore
├── 🔑LICENSE
└── 📝README.md

💡 Repository Structure Details

To help you navigate through the project, here’s a concise guide to the repository’s structure, detailing what each directory contains and its purpose within the project:

  • 📁.github - Contains GitHub-related configuration files like workflows for CI/CD.
  • 📁.streamlit - Holds Streamlit-specific configuration files for web app settings.
  • 📁conf - Configuration files and schema for the project.
  • 📁data/
    • 📁raw - Original, unmodified data files.
    • 📁processed - Data that has been cleaned and transformed for analysis.
    • 📁test - Data sets used for testing the model's performance.
    • 📁train - Data sets used for training the machine learning models.
  • 📁notebooks - Jupyter notebooks for exploratory data analysis and model experimentation.
  • 📁src/
    • 📁components - Modular components used across the project.
    • 📁pipelines - Data processing and machine learning pipelines.
    • 📁utils - Utility scripts for common tasks throughout the project.
    • 🐍constants.py - Central file for constants used in the project.
    • 🐍exception.py - Custom exception classes for error handling.
    • 🐍logger.py - Logging configuration and setup.
  • 📁models/
    • 📁predictions - Output predictions from the model.
    • 📁preprocessors - Scripts for data preprocessing steps.
    • 📁scores - Model evaluation metrics and scoring information.
    • 📁trained - Serialized versions of trained models.
  • 📁logs - Contains auto-generated logs for event and error tracking, not included in Git.
  • 📁reports - Generated analysis reports and insights.
  • 📁resources - Additional resources like images or documents used in the project.
  • 🐍main.py - Script that orchestrates the project's workflow by sequentially executing the pipeline scripts (see the sketch after this list).
  • 🐍app.py - The Streamlit web application entry point.
  • 🐍template.py - Template script for standardizing code structure.
  • 🔒poetry.lock - Lock file for Poetry to ensure reproducible builds.
  • 📇pyproject.toml - Poetry configuration file for package management.
  • 🗒️requirements.txt - List of Python package requirements.
  • 📜.gitignore - Specifies intentionally untracked files to ignore.
  • 🔑LICENSE - The license file for the project.
  • 📝README.md - The introductory documentation for the project.
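
As a rough illustration of the orchestration role of 🐍main.py, here is a minimal sketch. The pipeline module and class names are hypothetical; only the overall pattern (run each pipeline stage in sequence, logging along the way) reflects the structure described above:

```python
# Hypothetical sketch of how main.py could sequentially execute the pipeline
# stages; module and class names are illustrative, not the repo's actual ones.
from src.logger import logger  # assumes src/logger.py exposes a configured logger

from src.pipelines.data_ingestion import DataIngestionPipeline             # hypothetical
from src.pipelines.data_transformation import DataTransformationPipeline   # hypothetical
from src.pipelines.model_training import ModelTrainingPipeline             # hypothetical


def main() -> None:
    stages = (
        DataIngestionPipeline(),
        DataTransformationPipeline(),
        ModelTrainingPipeline(),
    )
    for stage in stages:
        name = type(stage).__name__
        logger.info("Starting stage: %s", name)
        stage.run()  # assumes each pipeline exposes a run() entry point
        logger.info("Completed stage: %s", name)


if __name__ == "__main__":
    main()
```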

Prerequisites

Tech Stack Prerequisites

Python Numpy Pandas Matplotlib scikit-learn Streamlit

To effectively engage with this project, possessing a robust understanding of the skills listed below is advisable:

  • Core comprehension of Python, Machine Learning, and Modular programming
  • Acquaintance with libraries such as NumPy, Pandas, Matplotlib, Scikit-Learn, and Streamlit
  • Acquaintance with the Python libraries specified in the 🗒️requirements.txt document

These competencies will facilitate a seamless and productive journey throughout the project.

Development Environment Prerequisites

Anaconda Poetry VS_code Jupyter_Notebook Notepad_plus_plus Obsidian Figma Clickup

Application selection and setup may vary based on individual preferences and system configurations.

The development tools I've employed for this project are:

  • Anaconda / Poetry: Utilized for managing environments and packages
  • VS Code: Employed for writing and editing code
  • Jupyter Notebook: Used for data analysis and experimentation
  • Notepad++: Served as an auxiliary code editor
  • Obsidian: Utilized for documenting project notes
  • Figma: Used for crafting application UI/UX designs
  • ClickUp: Employed for overseeing project tasks

Automation Integration Prerequisites

GitHubActions

Integrating process automation is entirely optional, as is the choice of automation tool.

In this project, GitHub Actions has been selected to automate the machine learning model development process as needed.

Should there be a need to adjust hyperparameters or data-related settings, simply update the YAML configurations, and the entire development workflow can be executed directly from the repository.
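
As a hedged illustration of this configuration-driven setup, the training code might read its settings from a YAML file under 📁conf like so; the file name and keys below are assumptions, not the project's actual schema:

```python
# Illustrative sketch of loading hyperparameters and data settings from a YAML
# config in conf/ -- the file name and keys are assumed, not the repo's schema.
import yaml  # PyYAML

with open("conf/configs.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

test_size = config["data"]["test_size"]          # e.g. 0.2
random_state = config["data"]["random_state"]    # e.g. 42
fit_intercept = config["model"]["fit_intercept"] # e.g. true
```

With the settings externalized this way, a CI run triggered by a change to the YAML file can rebuild the model without touching any Python code.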

Architecture

The architectural design of this project is straightforward and can be readily understood with the help of the diagram below:

Process Architecture

The project's architectural framework encompasses the following key steps:

Model Development

The model development process begins with data collection, followed by cleaning and transforming the data to create a clean dataset. This dataset is then split into training and test sets.

After preparing a clean training dataset, we perform the necessary imputations and standardization to produce the scaled training set, which is then used to train the machine learning model. Development then involves testing different hyperparameters and evaluating against the test dataset to optimize performance.

Once optimized, the ML model is serialized and integrated into a web application, allowing end users to interact with it and receive predictions based on their input.
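
A minimal sketch of this serialization step, assuming joblib is used and the artifact lives under 📁models/trained (the file name is a guess):

```python
# Sketch of serializing the optimized model so the web app can load it later;
# joblib and the artifact file name are assumptions, the directory mirrors models/.
from pathlib import Path

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # stands in for the trained pipeline from earlier

Path("models/trained").mkdir(parents=True, exist_ok=True)
joblib.dump(model, "models/trained/model.joblib")    # at the end of training
loaded = joblib.load("models/trained/model.joblib")  # inside the web app
```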

User Interaction

The user interaction with the web application is intuitive and user-friendly. Users begin by entering their data through simple slider controls, selecting values for the features the model uses to make predictions.

This process is designed to be straightforward, even for those with little to no technical background, ensuring that the application is accessible to a wide audience.

Data Retrieval

Once the user submits their data, the web application processes the input using the serialized machine learning model.

The model quickly analyzes the input data, applies the necessary algorithms, and computes the predictions. This step is performed efficiently to minimize wait time and enhance the user experience.

User Output

The results are then presented to the user in a clear and understandable format in the web application.

The goal is to make the output as informative and helpful as possible, allowing users to make informed decisions based on the model's predictions.
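
Tying the three steps above together, here is a minimal sketch of what the slider-in, prediction-out flow could look like in 🐍app.py; the feature names, slider ranges, and artifact path are hypothetical placeholders:

```python
# Minimal sketch of the slider-in, prediction-out flow described above;
# feature names, ranges, and file paths are hypothetical placeholders.
import joblib
import pandas as pd
import streamlit as st

st.title("Customer Spend Estimator")

model = joblib.load("models/trained/model.joblib")  # assumed artifact path

# Slider inputs for the four numerical features (names and ranges are guesses)
inputs = {
    "avg_session_length": st.slider("Avg. session length (min)", 25.0, 40.0, 33.0),
    "time_on_app": st.slider("Time on app (min)", 8.0, 16.0, 12.0),
    "time_on_website": st.slider("Time on website (min)", 33.0, 41.0, 37.0),
    "membership_years": st.slider("Length of membership (years)", 0.0, 7.0, 3.5),
}

if st.button("Estimate annual spend"):
    prediction = model.predict(pd.DataFrame([inputs]))[0]
    st.metric("Estimated annual spend", f"${prediction:,.2f}")
```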

Demo

The following illustration demonstrates the process of providing the necessary inputs to the web application and receiving the desired output:

webapp-graphic

Access the web application by clicking here: Customer Spend Estimator

Support

Should you wish to inquire, offer feedback, or propose ideas, don’t hesitate to contact me via the channels listed below:

Linkedin Badge Twitter Badge Gmail Badge

Discover and engage with my content on these platforms:

Linktree Badge Youtube Badge GitHub Badge Medium Badge Substack Badge

To express your support for my work, consider buying me a coffee or donating through PayPal.

Buy Me a Coffee Paypal

License

by-nc-sa

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.


topmate-udit

