fastai2-tabular-interpretation

This is an (extended) fastai2 version of my previous work. This project helps you interpret tabular models made with fastai2.

Examples of using these methods are provided for 2 datasets: the well-known Bulldozers dataset and transfermarkt's football player transfer statistics. The corresponding interpretations are in the bulldozer and football example notebooks.

The main interpretation methods available are:

  • Dendrogram -- helps to calculate and visualize correlations between features, which can be used later to group connected features


  • Feature importance -- helps to calculate and visualize the relative importance of individual features, as well as of groups of correlated (connected) features that were determined earlier


  • Partial Dependence -- shows how a particular value of a feature influences the dependent variable, and in what direction we should move this feature to minimize or maximize the result


  • Waterfall -- helps to visualize how the tabular model came to its conclusion in a particular case: how much and in what direction each feature value moves the dependent variable


  • Embeddings -- helps to visualize the embeddings calculated in the model

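To make the first two methods more concrete, here is a minimal, library-free sketch. It is not this project's API, and the README does not state how the project computes importance; the sketch assumes permutation importance (one common approach) and uses a simple correlation threshold as a stand-in for the hierarchical clustering behind a dendrogram. The `model` function, the column names and the toy data are all hypothetical.

```python
import random
import statistics

# Toy data with hypothetical columns (not taken from the Bulldozers dataset).
random.seed(0)
n = 200
age = [random.uniform(0, 20) for _ in range(n)]
hours = [a * 500 + random.gauss(0, 300) for a in age]  # strongly tied to age
size = [random.uniform(1, 5) for _ in range(n)]
columns = {"age": age, "hours": hours, "size": size}


def model(row):
    """Stand-in for a trained regression model (an assumption of this sketch)."""
    return 80 - 3.0 * row["age"] + 8.0 * row["size"]


rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]
target = [model(r) for r in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def correlated_groups(cols, threshold=0.9):
    """Greedily group columns whose |correlation| with a group's first member is high."""
    groups = []
    for name in cols:
        for group in groups:
            if abs(pearson(cols[name], cols[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def rmse(preds, actual):
    return (sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)) ** 0.5


def permutation_importance(group):
    """Increase in RMSE when all columns of `group` are shuffled together."""
    perm = list(range(len(rows)))
    random.shuffle(perm)
    shuffled = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for name in group:
            new_row[name] = rows[perm[i]][name]
        shuffled.append(new_row)
    base = rmse([model(r) for r in rows], target)
    return rmse([model(r) for r in shuffled], target) - base


groups = correlated_groups(columns)
importance = {tuple(g): permutation_importance(g) for g in groups}
for group, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(group, round(score, 2))
```

Shuffling correlated columns together, rather than one at a time, keeps the model from compensating through a near-duplicate feature, which is why grouping comes before the importance calculation.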

These 5 chapters work nicely with an algorithm based on Jeremy Howard's article. In short:

  • We take some task (bulldozer sales) and build a model for it (fastai tabular model creation).
  • Then we determine which features influence our value the most (feature importance). Let's say we want to sell our bulldozer for as high a price as possible.
  • Optionally, we divide some features into groups (dendrogram).
  • Then we look at our task and, among the most important features, find the ones we can change in the real world (for example, we can change the state in which we sell our bulldozer, or some other features; in fact I know nothing about the bulldozer market in the US :( )
  • After that we find the most useful value of this feature for us, either across the whole dataset (partial dependence) or in our particular case (waterfall). The latter also helps us determine which values drive the price up or down the most.
  • Having this information and knowing what we can really change, we can optimize our bulldozer's sale price
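The last steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `model`, the rows and the baseline row are hypothetical, partial dependence is computed by forcing one feature to a value for every row and averaging the predictions, and the waterfall is built by moving from a baseline row to the row of interest one feature at a time.

```python
def model(row):
    """Stand-in for a trained regression model (hypothetical)."""
    bonus = 5.0 if row["state"] == "TX" else 0.0
    return 80 - 3.0 * row["age"] + 8.0 * row["size"] + bonus


# Hypothetical dataset rows.
rows = [
    {"age": 2, "size": 3, "state": "TX"},
    {"age": 10, "size": 1, "state": "NY"},
    {"age": 15, "size": 4, "state": "NY"},
    {"age": 6, "size": 2, "state": "TX"},
]


def mean_prediction(data):
    return sum(model(r) for r in data) / len(data)


def partial_dependence(feature, values):
    """Average prediction when `feature` is forced to each value in every row."""
    return {v: mean_prediction([{**r, feature: v} for r in rows]) for v in values}


def waterfall(row, baseline, order):
    """Per-feature change in prediction when moving from `baseline` to `row`."""
    current = dict(baseline)
    previous = model(current)
    steps = []
    for name in order:
        current[name] = row[name]
        updated = model(current)
        steps.append((name, updated - previous))
        previous = updated
    return steps


# Whole-dataset view: which state gives the higher average prediction?
pd_state = partial_dependence("state", ["TX", "NY"])
best_state = max(pd_state, key=pd_state.get)

# Single-case view: how each feature value moves this row away from the baseline.
baseline = {"age": 8, "size": 2, "state": "NY"}  # hypothetical reference row
steps = waterfall(rows[0], baseline, ["age", "size", "state"])
print(best_state, steps)
```

The contributions in `steps` always add up to the difference between the prediction for the row and the prediction for the baseline. When the model contains feature interactions, the individual contributions depend on the order in which features are switched, so the order is worth fixing deliberately (for example, by descending feature importance).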

This work is based on my previous notebook, which in turn was based on Jeremy Howard's lectures. Some parts of this work are also inspired by Zachary Mueller's lectures, especially the tabular interpretation lesson.

Restrictions: I've tested it for regression-based models only. I don't think it will work for classification without some refactoring.