Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIGS print and plot return different trees #132

Closed
mepland opened this issue Sep 13, 2022 · 1 comment · Fixed by #134
Closed

FIGS print and plot return different trees #132

mepland opened this issue Sep 13, 2022 · 1 comment · Fixed by #134

Comments

@mepland
Copy link
Collaborator

mepland commented Sep 13, 2022

In imodels_demo.ipynb the Tree #0 returned by printing the fitted model:

Glucose concentration test <= 99.500 (Tree #0 root)
	Val: 0.068 (leaf)
	Glucose concentration test <= 168.500 (split)
		#Pregnant <= 6.500 (split)
			Body mass index <= 30.850 (split)
				Val: 0.065 (leaf)
				Blood pressure(mmHg) <= 67.000 (split)
					Val: 0.705 (leaf)
					Val: 0.303 (leaf)
			Val: 0.639 (leaf)
		Blood pressure(mmHg) <= 93.000 (split)
			Val: 0.860 (leaf)
			Val: -0.009 (leaf)

and plotting:

figs

do not agree!

Based on _tree_to_str_with_data, which agrees with the simpler _tree_to_str actually being called here - see below, the first line printed after a split is the left / true branch, while the second line after the split is the right / false branch.

Reading the printed version, after the first "Glucose concentration test <= 99.500" split, there should be a leaf with value 0.068 for <= 99.5, and then the "Glucose concentration test <= 168.500" split for > 99.5, but this is not the structure of the plotted tree. Also, note that both of the "Blood pressure(mmHg)" splits should end in leaves, while in the plot one of them leads to a "#Pregnant" split.

The output of print(figs.print_tree(X_train, y_train)) is shown below for reference:

Glucose concentration test <= 99.500 65/192 (33.85%)
	ΔRisk = 0.07 4/59 (6.78%)
	Glucose concentration test <= 168.500 61/133 (45.86%)
		#Pregnant <= 6.500 44/112 (39.29%)
			Body mass index <= 30.850 21/76 (27.63%)
				ΔRisk = 0.06 2/31 (6.45%)
				Blood pressure(mmHg) <= 67.000 19/45 (42.22%)
					ΔRisk = 0.71 10/14 (71.43%)
					ΔRisk = 0.30 9/31 (29.03%)
			ΔRisk = 0.64 23/36 (63.89%)
		Blood pressure(mmHg) <= 93.000 17/21 (80.95%)
			ΔRisk = 0.86 17/19 (89.47%)
			ΔRisk = -0.01 0/2 (0.0%)

Also see my stripped down notebook demonstrating the issue here.

@csinva
Copy link
Owner

csinva commented Sep 13, 2022

Thank you for finding this issue! You are correct these do not match and something is amiss...I would guess the issue is with the visualization code not the print_tree method.

@OmerRonen Have you seen this disagreement before?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants