This is a time series project that applies the concepts learned in Time Series Analysis and Forecasting.
Dataset • Dependencies • Evaluation & Splitting • Forecasting Methodologies • Comparison Table • Contributors
The Air Passengers dataset contains the number of air passengers for each month from 1949 to 1960. We can use this data to forecast future values and support business decisions.
Python 3.8.10
Pandas
NumPy
statsmodels
sklearn
fbprophet
xgboost
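We chose RMSE (root mean squared error) as our evaluation metric. It is an error measure that squares the deviations before averaging them, which keeps positive and negative deviations from canceling one another out, and then takes the square root so the result is in the same units as the data.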
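As a minimal sketch of the metric described above, RMSE can be computed with NumPy as shown here. The function name `rmse` and the sample numbers are illustrative assumptions, not taken from the project code.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Squaring makes every deviation non-negative before averaging.
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative values: deviations of -2 and +2 have a mean of 0,
# but they do not cancel once squared.
print(rmse([10, 10], [12, 8]))  # 2.0
```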
We tried different window sizes for the moving average (MA), and we noticed that:
- The larger the window size, the more data we lose from the beginning and the end of our time series.
- The window size determines the smoothness of the trend-cycle estimate: a larger window gives a smoother estimate.
- The MA technique forecasts from past values of the series.
The disadvantages of MA are:
- Each historical value is given the same weight.
- It is hard to determine the window size (the number of periods used).
- The last n values must be kept in storage to compute each forecast.
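The points above can be sketched with pandas. This is a hedged illustration only: the series values are made up (not the Air Passengers data), and the helper names `centered_moving_average` and `moving_average_forecast` are assumptions rather than the project's actual functions.

```python
import pandas as pd

# Illustrative values only, not the actual Air Passengers data.
series = pd.Series([10.0, 12.0, 14.0, 13.0, 15.0, 17.0, 16.0, 18.0])

def centered_moving_average(values, window):
    """Trend-cycle estimate: equal-weight mean over a centered window."""
    return values.rolling(window=window, center=True).mean()

def moving_average_forecast(values, window):
    """One-step forecast: equal-weight mean of the last `window` values."""
    return float(values.iloc[-window:].mean())

# A larger window leaves more undefined points at both ends of the series.
for window in (3, 5):
    smoothed = centered_moving_average(series, window)
    print(window, int(smoothed.isna().sum()))  # 3 -> 2 missing, 5 -> 4 missing

print(moving_average_forecast(series, 3))  # (17 + 16 + 18) / 3 = 17.0
```

Every value inside the window gets the same weight here, which is the first disadvantage listed above.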
Naïve forecasting predicts future data from the last observed value. In this example, we use the last value of the training set as the forecast for every point that follows it.
It does not perform as well as the previous model because this data is seasonal.
Notice that it performed reasonably well for the following month but was not reliable for the months after it, so it is not a good approach for forecasting far into the future.
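A minimal sketch of the naïve method described above, assuming a simple train/test split. The numbers are made up for illustration and the function name `naive_forecast` is hypothetical.

```python
import numpy as np

# Illustrative values only, not the actual Air Passengers data.
train = np.array([10.0, 12.0, 14.0, 13.0])
test = np.array([13.5, 16.0, 19.0])

def naive_forecast(train_values, horizon):
    """Repeat the last training value for every future step."""
    return np.full(horizon, train_values[-1], dtype=float)

forecast = naive_forecast(train, len(test))
abs_errors = np.abs(test - forecast)

print(forecast)    # every step is forecast as 13.0
print(abs_errors)  # errors of 0.5, 3.0 and 6.0 as the horizon grows
```

In this toy series the error is small for the first step and grows with the horizon, which mirrors the behavior noted above.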
# | Model | RMSE on Train-Test Split | RMSE on k-fold | RMSE on Roll Forward |
---|---|---|---|---|
1 | Simple Moving Average Smoothing | .. | .. | .. |