Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: bug when visualizing byte nodes #352

Merged

Conversation

BenjaminBossan
Copy link
Collaborator

The visualize function failed when trying to visualize models that had byte attributes. This is because the node's children would contain raw bytes, which the function doesn't know how to visualize. Therefore, children of byte and byte array nodes are now skipped (in addition to the node types already being skipped).

To test this bug, I added a visualization test to the external packages like LightGBM, which make use of bytes. I'm not sure if some of the sklearn estimators could also be candidates, but it would surely be overkill to test visualizing all of them, whereas the overhead is not so big for the external packages.

The visualize function failed when trying to visualize models that had
byte attributes. This is because the node's children would contain raw
bytes, which the function doesn't know how to visualize. Therefore,
children of byte and byte array nodes are now skipped (in addition to
the node types already being skipped).

To test this bug, I added a visualization test to the external packages
like LightGBM, which make use of bytes. I'm not sure if some of the
sklearn estimators could also be candidates, but it would surely be
overkill to test visualizing all of them, whereas the overhead is not so
big for the external packages.
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers ready for review

I had to set the PYTHONIOENCODING env var for Windows tests to pass, no idea why it's necessary all of a sudden.

@BenjaminBossan BenjaminBossan changed the title Fix bug when visualizing byte nodes FIX: bug when visualizing byte nodes Apr 25, 2023
@BenjaminBossan BenjaminBossan added the bug Something isn't working label Apr 25, 2023
Truncate the number of bytes shown at 24.
Calling getvalue does not require a seek(0) afterwards.
@BenjaminBossan
Copy link
Collaborator Author

@adrinjalali I addressed your comment, please review again

assert_method_outputs_equal(estimator, loaded, X)

visualize(dumped, trusted=trusted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are going to print a bunch of stuff in the terminal when we run tests, shouldn't we be capturing the output? It also makes me realize, maybe we want to have an output arg for visualize?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are going to print a bunch of stuff in the terminal when we run tests

pytest automatically captures stdout unless used with the -s option, no?

maybe we want to have an output arg for visualize?

What would that argument do? We have the sink argument, in case someone wants to push the output to a logger, for instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sink is not trivial to implement, it'd be nice if one could pass something like os.devnull and the output would be forwarded there.

I know by default pytest captures output, but even when we pass -s right now, there's almost no output, and this adds quite a bit there. If you think it's not worth fixing, I'm happy to merge as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sink is not trivial to implement, it'd be nice if one could pass something like os.devnull and the output would be forwarded there.

You should still be able to change sys.stdout, right?

I know by default pytest captures output, but even when we pass -s right now, there's almost no output, and this adds quite a bit there.

I see. For test_external.py, I made a change to not print anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is green, please review again

Comment on lines 57 to 58
def _null(*args, **kwargs):
# used to prevent printing anything to stdout when calling visualize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing this as sink is disabling a fair amount of code, not just printing. Are you sure you want this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true. The relevant part of the code is executed, though, i.e. the tests would fail without the bugfix. If you want me to still change it, I can think of something (I guess auto fixture that overrides sys.stdout, not sure if it conflicts with pytest).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but you're also specifically testing this bug anyway. So it'd be nice to have visualize to actually test the whole thing, in case later we add something which would break things.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense, done.

More of the visualize stack is running this way, improving test
coverage.
@@ -136,6 +163,14 @@ class TestXGBoost:

"""

@pytest.fixture(autouse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And TIL about the redirect_stdout context manager but it doesn't work here :D (I guess because of pytest)

@adrinjalali adrinjalali merged commit dd0af39 into skops-dev:main Apr 27, 2023
@BenjaminBossan BenjaminBossan deleted the bugfix-visualize-bytes-nodes branch April 27, 2023 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants