
TODO.md


Testing

  • Split tests by module and reorder
  • Add algebraic tests for test_calc_ewma
  • Add aggregation tests for test_calc_ewma and test_calc_statistical_features
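One possible shape for an algebraic EWMA test is sketched below. It does not call calc_ewma, whose signature is not shown in this file. Instead it checks pandas' `Series.ewm(alpha=..., adjust=False).mean()` against the recursion y[t] = (1 - alpha) * y[t-1] + alpha * x[t]. The helper `ewma_by_hand` and both test names are hypothetical.

```python
import numpy as np
import pandas as pd


def ewma_by_hand(values, alpha):
    """Reference recursion: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    out = [float(values[0])]
    for x in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * float(x))
    return np.array(out)


def test_ewma_matches_recursion():
    sales = pd.Series([3.0, 0.0, 5.0, 2.0, 8.0])
    alpha = 0.5
    expected = ewma_by_hand(sales.to_numpy(), alpha)
    # adjust=False makes pandas use the same recursive definition
    actual = sales.ewm(alpha=alpha, adjust=False).mean().to_numpy()
    np.testing.assert_allclose(actual, expected)


def test_ewma_of_constant_series_is_constant():
    # Algebraic property: smoothing a constant series returns that constant
    flat = pd.Series([4.0] * 6)
    smoothed = flat.ewm(alpha=0.3, adjust=False).mean()
    np.testing.assert_allclose(smoothed.to_numpy(), 4.0)
```

If calc_ewma wraps a similar pandas call, the same two properties could be asserted on its output columns instead.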

Performance

  • Cache feature engineering results using functools.lru_cache and profile to see if it makes a difference
  • Clean up warnings
  • Update to_pandas, get_sample, and utility functions to accept mxnet and Ray dataframes
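A minimal sketch of the caching idea, assuming the cached function takes hashable arguments. pandas DataFrames are not hashable, so `functools.lru_cache` cannot take them directly; this sketch passes a tuple instead. The function `rolling_mean_feature` is hypothetical and stands in for a real feature engineering step.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def rolling_mean_feature(values, window):
    """Hypothetical feature function; `values` must be hashable (a tuple)."""
    out = []
    for i in range(len(values)):
        # Trailing window that is shorter at the start of the series
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return tuple(out)


sales = (2.0, 4.0, 6.0, 8.0)
first = rolling_mean_feature(sales, 2)   # computed
second = rolling_mean_feature(sales, 2)  # served from the cache
info = rolling_mean_feature.cache_info()
print(info.hits, info.misses)
```

`cache_info()` gives hit and miss counts, which is a cheap first check before running a full profiler to see whether the cache is actually being used.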

Transform

  • Split self.transforms into two separate attributes to avoid mixing the sample transforms and the data transforms. self.transforms may not even be necessary anymore, since cross_validate_lgbms stores the dicts locally
  • Add a new "method" arg to encode_categoricals to allow the user to specify different strategies (e.g., get_dummies)
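The "method" argument could look roughly like the sketch below. This is not the package's encode_categoricals; the signature, the default, and the strategy names "codes" and "get_dummies" are assumptions made for illustration.

```python
import pandas as pd


def encode_categoricals(df, columns, method="codes"):
    """Hypothetical sketch: encode `columns` using the chosen strategy."""
    if method == "codes":
        out = df.copy()
        for col in columns:
            # Integer code per category; missing values become -1
            out[col] = out[col].astype("category").cat.codes
        return out
    if method == "get_dummies":
        # One indicator column per category level
        return pd.get_dummies(df, columns=columns)
    raise ValueError("unknown method: {!r}".format(method))


df = pd.DataFrame({"store": ["a", "b", "a"], "sales": [1, 2, 3]})
coded = encode_categoricals(df, ["store"], method="codes")
dummies = encode_categoricals(df, ["store"], method="get_dummies")
```

Dispatching on a string keeps the call site simple; a dict of strategy functions would be an alternative if the list of methods grows.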

Feature engineering

  • Write a function that flags consecutive days of non-sales
  • Research enhancements in mlfin package from Advances in Financial Machine Learning
  • Consider splitting out the crossover and momentum sections of calc_statistical_features and calc_ewma to reduce complexity. The current combined build may be faster than separate functions because of its matrix operations.
  • calc_ewma, calc_percent.., and calc_statistical... could all be refactored; they share many moving parts.
  • Ability to detrend features by different levels of the hierarchy (e.g., store/item sales divided by store sales)
  • Kurtosis and quantile features
  • Innovation state space model features
  • Shift, as used in test__run_feature_engineering, doesn't shift by day, but lag does correctly roll by day. This may cause minor issues if you don't fill your dataframe.
  • Add noise to features
    • Noise to time-static features

      import numpy as np

      # Add Gaussian noise (mean 0, std 0.1) to each store-level ratio column of X
      for col in [c for c in X.columns if 'store' in c and 'ratio' in c]:
          X[col] = X[col] + np.random.normal(0, 0.1, len(X))
          print('adding noise to {}'.format(col))

  • Add new product forecasting features from https://hbswk.hbs.edu/item/how-do-you-predict-demand-and-set-prices-for-products-never-sold-before
  • Add de-trended sales for each series, i.e. quantity sold divided by average quantity sold for that store, to capture item-level trends.
  • Rolling averages over time (by month, by week)
  • Add quantiles
  • Scale sales by dividing sales by store growth, including new rolling averages
  • Basic moving averages, after removing any store trends
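Two of the items above, flagging consecutive days of non-sales and de-trending sales by the store average, can be sketched in pandas as follows. The function names, the column names "store" and "sales", and the `min_days` threshold are all hypothetical. The streak logic assumes one row per day with no missing dates.

```python
import pandas as pd


def flag_consecutive_non_sales(sales, min_days=2):
    """Return (streak, flag) for a daily sales Series.

    streak counts consecutive zero-sales days ending at each row;
    flag is True once the streak reaches `min_days`.
    """
    is_zero = sales.eq(0)
    # A new run id starts at every non-zero day
    run_id = (~is_zero).cumsum()
    streak = is_zero.astype(int).groupby(run_id).cumsum()
    return streak, streak >= min_days


def detrend_by_store_mean(df, store_col="store", sales_col="sales"):
    """Sales divided by the average sales of the same store."""
    store_mean = df.groupby(store_col)[sales_col].transform("mean")
    # A store whose mean is 0 yields inf or NaN here
    return df[sales_col] / store_mean


daily = pd.Series([5, 0, 0, 3, 0])
streak, flag = flag_consecutive_non_sales(daily, min_days=2)

panel = pd.DataFrame({"store": ["a", "a", "b", "b"],
                      "sales": [2.0, 6.0, 5.0, 15.0]})
ratio = detrend_by_store_mean(panel)
```

For a panel with many series, the streak function would be applied per store/item group, for example through a groupby over the series keys.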

Docs / README

  • Add links to examples on README
  • Add TOC to docs
  • Set expectations for features like inventory, etc. (greatexpectations.io)
  • Add package to pip
  • Fix setup.py and distribute
  • Need to add example notebook link to README

Modeling

Interpretability

  • Move error metric graphs from app to forecastframe
  • Add a feature to identify hierarchy levels (e.g., categories) that consistently over- or under-predict ("75% of cross-validation weeks were underpredicted for this category")
  • Pass arg to make features orthogonal prior to feature importance
  • Find a way around altair dataset size restrictions (see m5_example)
  • The graphs in m5_example need work
  • Add shap plots to package
  • Needs better tests
  • Cluster products across levels of the hierarchy (e.g., given a keto trend on the West Coast, suggest other products in related categories that may be worth carrying)
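The over/under-prediction item above could start from something like this sketch, which computes the share of cross-validation weeks in which each category was underpredicted. The column names ("category", "week", "actual", "predicted"), the function name, and the toy data are assumptions, not forecastframe's output format.

```python
import pandas as pd


def underprediction_share(df, level="category"):
    """Share of rows per `level` where the forecast fell below the actual."""
    under = df["predicted"] < df["actual"]
    return under.groupby(df[level]).mean()


# Toy cross-validation results, one row per category and week
cv = pd.DataFrame({
    "category": ["snacks"] * 4 + ["dairy"] * 4,
    "week": [1, 2, 3, 4] * 2,
    "actual": [10, 12, 9, 11, 5, 6, 7, 8],
    "predicted": [8, 11, 8, 12, 6, 6, 8, 9],
})
share = underprediction_share(cv)
for name, value in share.items():
    print("{:.0%} of cross-validation weeks were underpredicted for {}".format(value, name))
```

Passing a different `level` (for example a store or department column) reuses the same calculation at another point in the hierarchy.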