
[DOCS] Add feature importance to classification example #1359

Merged (4 commits) on Oct 27, 2020

Conversation

lcawl (Contributor) commented Sep 10, 2020

Related to elastic/kibana#73561

This PR drafts changes to the classification example so that it includes feature importance explanations.

It will be backported to 7.10 and does not take into account the changes in 7.11 and later for elastic/kibana#77874.

Preview

https://stack-docs_1359.docs-preview.app.elstc.co/guide/en/machine-learning/master/flightdata-classification.html

@lcawl changed the title from "[DOCS] Add feature importance examples" to "[DOCS] Add feature importance to classification example" on Sep 14, 2020
in your destination index. See the
{ml-docs}/flightdata-classification.html#flightdata-classification-results[Viewing {classification} results]
section in the {classification} example.
in your destination index.

[[dfa-classification-class-score]]
=== `class_score`

The value of `class_score` controls the probability at which a class label is
Contributor

class_score is definitely not a probability: if I choose k very, very small, class_score may be arbitrarily large, while a probability is always between 0 and 1. It's better to call it a "likelihood". Also, it doesn't "control" the probability but simply "shows" it. It is controlled by the threshold k, which we estimate automagically based on the class_assignment_objective configuration.

values. A higher number means that the model is more confident.
If you want to understand how certain the model is about each prediction, you
can examine its probability and score (`ml.prediction_probability` and
`ml.prediction_score`). These values range between 0 and 1; the higher the
Contributor

Strictly speaking, class_score can be larger than 1 in some degenerate cases, so it's defined as greater than or equal to 0.

Comment on lines 426 to 427
//Does this mean the sum of the feature importance values for false in this
example should equal the logit(p), where p is the class_probability for false?
Contributor

This is correct up to a constant. There is also a data-point-independent constant -- the average log-odds over all training points -- which we add to the sum of the feature importances before taking the inverse logit to compute the probabilities.
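A minimal Python sketch of that relationship; the names `baseline_log_odds` and `feature_importances` are illustrative only and do not reflect the actual implementation:

```python
import math

def class_probability(baseline_log_odds, feature_importances):
    """Recover a class probability from feature importance values.

    baseline_log_odds: the data-point-independent average log-odds over all
    training points. feature_importances: the per-feature importance values
    reported for a single prediction. Both names are hypothetical.
    """
    log_odds = baseline_log_odds + sum(feature_importances)
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse logit (sigmoid)
```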

any class.
//Does this mean the sum of the feature importance values for false in this
example should equal the logit(p), where p is the class_probability for false?
//Does this imply that the feature importance value itself is the result of a logit function? Or that we use the function to merely represent the distribution of feature importance values?
Contributor

What happens is that the decision forest predicts the log-odds directly, and then we compute feature importance on the log-odds values. When we evaluate a data point, we take the log-odds predicted by the decision forest and apply the inverse of the logit function to get the class probability.
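In other words, the reported probability is the inverse logit (sigmoid) of the predicted log-odds. A small sketch of the two functions, assuming nothing about the actual code:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log-odds in (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def inverse_logit(log_odds):
    """Map log-odds back to a probability (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# The two functions are inverses of each other, so a round trip is lossless.
assert abs(inverse_logit(logit(0.84)) - 0.84) < 1e-12
```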

====
While the probability of a class ranges between 0 and 1, its log-odds range
between negative and positive infinity. In {kib}, the decision path for each
class starts near zero, which represents a class probability of 0.5.
Contributor

This is unfortunately a bit more complicated: 0 represents the class probability of a constant baseline. It relates to the average class probability for the selected class (in the Kibana UI) over the entire training set.

If you select Canceled as the target variable in the flight data, this nuance becomes obvious. Since there are many more data points with Canceled = False, let's assume that the average class probability over the entire training set is something like 0.92. This means that if the class probability of a data point is larger than 0.92 (for example, 0.98), then the decision path goes to the right (the sum of the feature importances is positive). On the other hand, if the class probability is smaller than 0.92 (for example, 0.84), then the decision path goes to the left (the sum of the feature importances is negative).
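To make those numbers concrete (0.92, 0.98, and 0.84 are the illustrative values from above, not measurements from the flight data set):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

baseline = logit(0.92)            # ~ 2.44, the assumed training-set average
right = logit(0.98) - baseline    # ~ +1.45: importances sum to a positive value
left = logit(0.84) - baseline     # ~ -0.78: importances sum to a negative value
print(round(baseline, 2), round(right, 2), round(left, 2))  # 2.44 1.45 -0.78
```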

While the probability of a class ranges between 0 and 1, its log-odds range
between negative and positive infinity. In {kib}, the decision path for each
class starts near zero, which represents a class probability of 0.5.
// Is this true for multi-class classification or just binary classification?
Contributor

Yes, it's true for both multi-class and binary classification, since in the UI you select a class of interest from a drop-down menu.

@lcawl force-pushed the feature-importance branch 2 times, most recently from 5d8027e to 691f8df on September 29, 2020 at 16:19