- Split tests by module and reorder
- Add algebraic tests for test_calc_ewma
- Add aggregation tests for test_calc_ewma and test_calc_statistical_features
- Cache feature engineering results using functools.lru_cache and profile to see if it makes a difference
- Clean up warnings
- Update to_pandas, get_sample, and utility functions to accept mxnet and Ray dataframes
- Split out two separate self.transforms to avoid mixing the sample and the data transforms. self.transforms may not even be necessary anymore since cross_validate_lgbms stores the dicts locally.
- Add a new "method" arg to encode_categoricals to allow the user to specify different strategies (e.g., get_dummies)
- Write a function that flags consecutive days of non-sales
- Research enhancements in mlfin package from Advances in Financial Machine Learning
- Consider splitting out the crossover and momentum sections of calc_statistical_features and calc_ewma to reduce complexity. The current combined build may be faster than separate functions because it shares matrix operations.
- calc_ewma, calc_percent.., and calc_statistical... could all be refactored; they share many moving parts.
- Add the ability to detrend features by different levels of the hierarchy (e.g., store/item sales divided by store sales)
- Kurtosis and quantile features
- Innovation state space model features
- Shift as used in test__run_feature_engineering doesn't shift by day, but lag does correctly roll by day. This may cause minor issues if you don't fill your dataframe.
- Add noise to features
- Add new product forecasting features from https://hbswk.hbs.edu/item/how-do-you-predict-demand-and-set-prices-for-products-never-sold-before
- Add de-trended sales for each series, i.e. quantity sold divided by average quantity sold for that store, to capture item-level trends.
- Rolling averages over time (by month, by week)
- Add quantiles
- Scale sales by dividing sales by store growth, including new rolling averages
- Basic moving averages, after removing any store trends
- Add links to examples on README
- Add TOC to docs
- Set expectations for features like inventory, etc. (greatexpectations.io)
- Add package to pip
- Fix setup.py and distribute
- Add example notebook link to README
- Add easy prediction capability with ensembling and add to docs
- Add recursive training functionality (low priority since this concept can cause cascading errors)
- Build multi-quantile model using _get_quantile_weights
- https://github.com/Mcompetitions/M5-methods/blob/master/Code%20of%20Winning%20Methods/A1/3.%20code/2.%20train/1-1.%20recursive_store_TRAIN.ipynb
- https://github.com/Mcompetitions/M5-methods/blob/master/Code%20of%20Winning%20Methods/A1/3.%20code/2.%20train/1-2.%20recursive_store_cat_TRAIN.ipynb
- https://github.com/Mcompetitions/M5-methods/blob/master/Code%20of%20Winning%20Methods/A1/3.%20code/3.%20predict/1-1.%20recursive_store_PREDICT.py
- Review additional M5 code for suggestions
- Quantile modeling: https://github.com/Mcompetitions/M5-methods/blob/master/Code%20of%20Winning%20Methods/U1/quantiles_kaggle.ipynb
- Add PurgedKFold classes
- Move error metric graphs from app to forecastframe
- Add a feature to identify hierarchy levels (e.g., categories) that consistently over- or under-predict ("75% of cross-validation weeks were underpredicted for this category")
- Pass arg to make features orthogonal prior to feature importance
- Find a way around altair dataset size restrictions (see m5_example)
- The graphs in m5_example need work
- Add shap plots to package
- Needs better tests
- Add the ability to cluster products across levels of the hierarchy (given a keto trend on the West Coast, here are some other products in related categories that may be worth carrying)
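
As a starting point for the "flags consecutive days of non-sales" item above, here is a minimal sketch in pandas. It is not existing package code: the function name `flag_consecutive_zero_sales`, the column names `store_item` and `sales`, and the `min_days` parameter are all hypothetical. It assumes one row per series per day, already sorted by date, with no missing days (fill the dataframe first).

```python
import pandas as pd


def flag_consecutive_zero_sales(df, group_col="store_item", sales_col="sales", min_days=3):
    """Mark rows that sit inside a run of at least `min_days` zero-sales rows.

    Hypothetical helper: column names and the threshold are assumptions.
    Expects one row per group per day, sorted by date, with gaps filled.
    """
    is_zero = df[sales_col].eq(0)
    # A new run starts when the zero/non-zero state flips or the group changes.
    state_changed = is_zero.ne(is_zero.shift())
    group_changed = df[group_col].ne(df[group_col].shift())
    run_id = (state_changed | group_changed).cumsum()
    # Length of the run each row belongs to.
    run_length = is_zero.groupby(run_id).transform("size")
    return is_zero & (run_length >= min_days)


# Toy data: series "a" has a three-day gap, series "b" only a two-day gap.
df = pd.DataFrame(
    {
        "store_item": ["a"] * 6 + ["b"] * 4,
        "sales": [1, 0, 0, 0, 2, 0, 0, 0, 5, 0],
    }
)
flags = flag_consecutive_zero_sales(df, min_days=3)
```

Because the run counter also resets on a group change, the trailing zero of series "a" and the leading zeros of series "b" are not merged into one run.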