Shipping and delivering to a place near you
Author: Ruturaj Kiran Vaidya
```
├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
```
Project based on the cookiecutter data science project template. #cookiecutterdatascience
If you are only interested in looking at the notebook (there are notebook rendering problems in the GitHub ecosystem), go to:
All the graphs are plotted using matplotlib.
- Load the dataset and perform feature engineering
- Select a product based on the analysis
- Determine the demand for that product using time-series forecasting models
Dataset: https://www.kaggle.com/felixzhao/productdemandforecasting
This dataset contains historical product demand records, including product code, date, and order demand.
First, I imported the dataset and selected "Product_1359", as it is the top product (i.e. the one with the highest number of records). I then cleaned the dataset (feature engineering), keeping only the "Date" and "Order_Demand" columns for Product_1359; the other columns do not add useful information for this analysis. I plotted several visualizations to get to know the data. The following graphs show the order demand for the product, daily and monthly. Because the daily graph is quite chaotic, I summed the order demand per month and used that for further analysis, to get a clearer picture. I also removed the January 2017 data, as that month had fewer values.
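A minimal sketch of this step, assuming the Kaggle CSV is saved under `data/raw/` with its original column names (`Product_Code`, `Date`, `Order_Demand`); the exact file name and path may differ:

```python
import pandas as pd
import matplotlib.pyplot as plt

# File and column names assume the original Kaggle export; adjust if needed.
df = pd.read_csv("data/raw/Historical Product Demand.csv", parse_dates=["Date"])

# Keep only the top product and the two columns used in the analysis.
product = df.loc[df["Product_Code"] == "Product_1359", ["Date", "Order_Demand"]].copy()

# Order_Demand is stored as text; coerce it to numbers and drop rows
# whose demand or date could not be parsed.
product["Order_Demand"] = pd.to_numeric(product["Order_Demand"], errors="coerce")
product = product.dropna().sort_values("Date")

# The daily series is noisy, so aggregate order demand per month.
monthly = product.set_index("Date")["Order_Demand"].resample("MS").sum()

# Drop the incomplete last month (January 2017) before further analysis.
monthly = monthly[:"2016-12"]

monthly.plot(title="Monthly order demand for Product_1359")
plt.ylabel("Order demand")
plt.show()
```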
Next, it is important to check that the data is stationary (the stationarity of the dataset), since this matters for model training (I first came across this in a YouTube video). I used rolling mean and rolling standard deviation plots (these show the mean and deviation over a moving window, and stationarity can be judged by inspecting the graph; this video describes the idea: https://www.youtube.com/watch?v=e8Yw4alG16Q&t=1245s, though I want to research it further), as well as the Dickey-Fuller test, to determine stationarity. The following figure shows the rolling mean and standard deviation.
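A sketch of these stationarity checks, reusing the `monthly` series from the snippet above (the 12-month rolling window is an assumption):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Rolling statistics: a roughly constant rolling mean/std suggests stationarity.
rolling_mean = monthly.rolling(window=12).mean()
rolling_std = monthly.rolling(window=12).std()

ax = monthly.plot(label="Monthly demand")
rolling_mean.plot(ax=ax, label="Rolling mean (12 months)")
rolling_std.plot(ax=ax, label="Rolling std (12 months)")
ax.legend()
plt.show()

# Augmented Dickey-Fuller test: a p-value below ~0.05 rejects the null
# hypothesis of a unit root, i.e. the series can be treated as stationary.
adf_stat, p_value, _, _, critical_values, _ = adfuller(monthly.dropna())
print(f"ADF statistic: {adf_stat:.3f}")
print(f"p-value: {p_value:.3f}")
for level, value in critical_values.items():
    print(f"critical value ({level}): {value:.3f}")
```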
The following decomposition graph visualizes the general trend and seasonal components of the series.
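A sketch of the decomposition; an additive model with a 12-month period is an assumption for the monthly series:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Split the monthly series into trend, seasonal, and residual components.
decomposition = seasonal_decompose(monthly, model="additive", period=12)
decomposition.plot()
plt.show()
```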
Lastly, I used an ARIMA model for forecasting. I split the product data into training and testing sets (roughly 80/20%) to verify the model's forecasts.
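A sketch of the split and forecast; the (p, d, q) order shown here is a placeholder, not the order actually used in the notebook:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Chronological ~80/20 split; time-series data must not be shuffled.
split = int(len(monthly) * 0.8)
train, test = monthly.iloc[:split], monthly.iloc[split:]

# Fit an ARIMA model on the training part and forecast over the test horizon.
model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=len(test))

ax = train.plot(label="Training data")
test.plot(ax=ax, label="Test data")
forecast.plot(ax=ax, label="Forecast")
ax.legend()
plt.show()
```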
In the future, I plan to look more into these algorithms and methods, and to verify some of the assumptions I made based on online videos, articles, etc. I also want to study the ACF and PACF plots more closely, which are used to determine the ARIMA orders p and q (a good thing to know). Overall, I learned a lot from this project.
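For reference, a sketch of the ACF/PACF plots mentioned above (roughly, the lag where the PACF cuts off suggests p, and the lag where the ACF cuts off suggests q):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plot autocorrelation and partial autocorrelation of the monthly series.
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(monthly.dropna(), ax=axes[0], lags=12)
plot_pacf(monthly.dropna(), ax=axes[1], lags=12)
plt.tight_layout()
plt.show()
```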
License: MIT