
Add the possibility to use cross validation when training PyAF models #105

Closed
antoinecarme opened this issue Sep 25, 2018 · 10 comments
@antoinecarme
Owner

antoinecarme commented Sep 25, 2018

Following the investigation performed in #53, implement a form of cross validation for PyAF models.

Specifications :

  1. Cut the dataset into folds according to a scikit-learn time series split:
    http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
    number of folds => user option (default = 10)

  2. To have enough data, use only the last n/2 folds for estimating the models (thanks to forecast R package ;). The default splits look like this :
    [5] [6]
    [5 6] [7]
    [5 6 7] [8]
    [5 6 7 8] [9]
    [5 6 7 8 9] [10]

  3. Use the model decomposition type or formula as a hyperparameter and optimize it: select the decomposition(s) with the lowest mean MAPE on the validation datasets of all the possible splits.

  4. Among all the chosen decompositions, select the model with the lowest complexity (~ number of inputs).

  5. Execute the procedure on the ozone and air passengers datasets and compare with the non-cross-validation models (=> 2 jupyter notebooks).
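The split scheme in step 2 can be sketched with scikit-learn's `TimeSeriesSplit`. Note that `TimeSeriesSplit` always grows the training window from the start of the series, so dropping the first half before splitting (as below) is just one way to approximate the "last n/2 folds" rule; this is an illustration, not PyAF code:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series: 10 "folds" of one point each, labelled 1..10.
n_folds = 10
data = np.arange(1, n_folds + 1)

# Drop the first half so that training windows start at fold 5,
# mimicking the "use only the last n/2 folds" rule.
half = data[n_folds // 2 - 1:]               # folds 5..10
tscv = TimeSeriesSplit(n_splits=len(half) - 1)
for train_idx, test_idx in tscv.split(half):
    print(list(half[train_idx]), list(half[test_idx]))
# [5] [6]
# [5, 6] [7]
# ...
# [5, 6, 7, 8, 9] [10]
```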

@antoinecarme
Owner Author

antoinecarme commented Sep 25, 2018

Classical PyAF modeling is a special case of this cross validation with 1 split (nfolds = 5, split = [1 2 3 4] [5]). So the implementation should be made by adapting the existing code. Training on each one of the splits is equivalent to training an old-style model.
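The single-split special case can be illustrated the same way (again with scikit-learn, purely as a sketch, not PyAF's implementation):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Classical (non-cross-validated) training as a single split:
# nfolds = 5, train on folds [1 2 3 4], validate on fold [5].
data = np.arange(1, 6)                              # five folds of one point each
tscv = TimeSeriesSplit(n_splits=4)
train_idx, test_idx = list(tscv.split(data))[-1]    # keep only the last split
print(list(data[train_idx]), list(data[test_idx]))  # [1, 2, 3, 4] [5]
```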

@antoinecarme antoinecarme self-assigned this Sep 25, 2018
@NowanIlfideme

Hi, I've been watching your project for a while (mostly - I have been working on a similar project, which comes at this from a different perspective 😛).
I'd just like to note that, from the business case, there are (at least) two different kinds of time series CV: with and without re-training on the set. The first one (the one you've described above) is useful for settings where you can constantly re-train your model. The second is for when you don't have the ability to re-train, but want to know what the model will do on future, shorter folds. This is relevant for models with hidden components (e.g. ARIMA, state-space models, RNNs, ...) where the state can be much different when starting later than from the beginning (as an analogy, a Markov chain that is not yet in its stationary distribution).

@antoinecarme
Owner Author

@NowanIlfideme

Thanks a lot for your interest in PyAF. Comments like these are always welcome. Hope you enjoy it.

Models with state/hidden components are not yet supported, but if you look closely, PyAF is always evolving. Cross-validation work started a year ago; its first implementation will be available in the coming weeks.

Can you please elaborate a little bit more on the second case (a Python example in a gist?)? Any docs/references?

@NowanIlfideme

I don't quite have the time to make a full example; I hope a block thing will work. :)

Full Set:
[1 2 ... N N+1 ... 2N]

Train (same for all):
[1 2 3 ... N]

Validation:
Sees [1 ... N], predicts [N+1]
Sees [2 ... N+1], predicts [N+2]
...
Sees [N-1 ... 2N-1], predicts [2N]

If you only use stateless models, this is the same as validating on the set [N+1 ... 2N]. However, for stateful models, it means you will always use [N*num_per_set] steps to "warm up" your model, and thus get consistent behavior (you'd do this in production as well).

As an alternative, you could use the following scheme for stateless models as well:

Trains on [1 ... N], predicts [N+1]
Trains on [2 ... N+1], predicts [N+2]
...
Trains on [N-1 ... 2N-1], predicts [2N]

This will always give a "window", and again be consistent. However, the end use of these methods is different. 😃
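The two schemes above differ only in whether the model is re-fitted at each step. A minimal sketch, assuming a hypothetical `fit`/`predict` model interface (the names are illustrative, not PyAF's API):

```python
import numpy as np

def validate(series, n, model, retrain=False):
    """Walk-forward validation over the second half of the series.

    retrain=False: fit once on the first n points, then only slide the
                   context window (the stateful-model scheme: warm up,
                   never re-fit).
    retrain=True:  re-fit on each window before predicting (the
                   rolling-origin scheme for stateless models).
    """
    if not retrain:
        model.fit(series[:n])
    errors = []
    for i in range(len(series) - n):
        window = series[i:i + n]
        if retrain:
            model.fit(window)
        pred = model.predict(window)          # one-step-ahead forecast
        errors.append(abs(pred - series[i + n]))
    return np.mean(errors)

# A trivial "model" that predicts the last observed value (illustrative).
class NaiveModel:
    def fit(self, history):
        pass
    def predict(self, window):
        return window[-1]

series = np.arange(1.0, 9.0)                  # 2N points with N = 4
print(validate(series, 4, NaiveModel()))                # mean absolute error, no re-training
print(validate(series, 4, NaiveModel(), retrain=True))  # same metric, re-fitting each step
```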

@antoinecarme
Owner Author

The block thing is clear and very interesting ;). I will keep this aside for implementing support for stateful models.

Do you have any book reference for this kind of stuff? Putting time series models in production, etc.

@NowanIlfideme

I'm going mainly by experience, sorry that I can't give any written reference. Cheers!

@antoinecarme
Owner Author

Cheers!

@antoinecarme
Owner Author

@NowanIlfideme

What about summarizing your experience in a GitHub repository (markdown)? I am also not aware of a written reference for this kind of stuff. Please think of it when you have some time.

Thanks a lot.

antoinecarme added a commit that referenced this issue Sep 26, 2018
antoinecarme added a commit that referenced this issue Sep 26, 2018
antoinecarme added a commit that referenced this issue Sep 26, 2018
antoinecarme added a commit that referenced this issue Sep 26, 2018
…105

Added separate cSignalDecompositionTrainer and cSignalDecompositionTrainer_CrossValidation
antoinecarme added a commit that referenced this issue Sep 26, 2018
@antoinecarme
Owner Author

antoinecarme commented Sep 26, 2018

This is how to adapt the training process to activate cross validation in PyAF (with 7 folds):

    import pyaf.ForecastEngine as autof

    # enable time series cross validation ("TSCV") with 7 folds
    lEngine = autof.cForecastEngine()
    lEngine.mOptions.mCrossValidationOptions.mMethod = "TSCV"
    lEngine.mOptions.mCrossValidationOptions.mNbFolds = 7
    lEngine.train(ozone_dataframe, 'Month', 'Ozone', 12)
    lEngine.getModelInfo()

antoinecarme added a commit that referenced this issue Sep 27, 2018
antoinecarme added a commit that referenced this issue Sep 27, 2018
…105

Added a jupyter notebook with air passengers case
antoinecarme added a commit that referenced this issue Sep 27, 2018
Add the possibility to use cross validation when training PyAF models #105
@antoinecarme
Owner Author

FIXED!!!!
