Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Refactor trusted #338

Merged
merged 9 commits into from
Apr 6, 2023
Merged

Conversation

BenjaminBossan
Copy link
Collaborator

See discussion:

https://discord.com/channels/879548962464493619/1047505653603774584/1085550033207836783

Description

Right now, all nodes are always initialized with trusted=False, so they are not "aware" of what is trusted and what isn't (which means the argument might as well not exist). This refactor would pass the actual trusted argument the user provides to the nodes. As a consequence, the constructed tree is aware of what nodes are considered trusted or not.

As a consequence, when calling node.get_unsafe_set, the returned set takes the trusted argument into consideration. Before this change, it would return all untrusted types, even if the user explicitly trusted them via the trusted argument. As a consequence, audit_tree now no longer requires the trusted argument, as the tree already is aware.

A nice benefit of this change is that now, visualize can take a trusted argument, which is passed to the tree and is taken into account when visiting the nodes and determining if they can be trusted.

See discussion:

https://discord.com/channels/879548962464493619/1047505653603774584/1085550033207836783

Right now, all nodes are always initialized with trusted=False, so they
are not "aware" of what is trusted and what isn't (which means the
argument might as well not exist). This refactor would pass the actual
trusted argument the user provides to the nodes. As a consequence, the
constructed tree is aware of what nodes are considered trusted or not.

Some unit tests are still failing, so this is WIP.
@BenjaminBossan
Copy link
Collaborator Author

RFC @skops-dev/maintainers

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, otherwise I like the change.

skops/io/_audit.py Outdated Show resolved Hide resolved
}
self.func_name = state["content"]["func"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this out of children? you have previously argued they should all be in there, and I agree with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument was that all children values should be specific types: Nodes, list of nodes, etc. This is a str, hence I moved it out. The problem with having str here is that when get_unsafe_set is called, this fails. We just haven't covered this case, which is why it didn't come up. I can move it back into children, but then we need to change get_unsafe_type.

(I still plan to do a PR to test the types of children more rigorously, this may not be the last bug)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then we want to make sure it's a string? like, cast it to str maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #340 merged, we now allow str to be set as a child (with no further check), so I reverted this change.

skops/io/_general.py Outdated Show resolved Hide resolved
skops/io/_visualize.py Show resolved Hide resolved
- make all but 1st arg to visualize kw only
- don't set default for trusted in get_tree
- use PRIMITIVE_TYPE_NAMES
Copy link
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed your points, please review again.

skops/io/_audit.py Outdated Show resolved Hide resolved
}
self.func_name = state["content"]["func"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument was that all children values should be specific types: Nodes, list of nodes, etc. This is a str, hence I moved it out. The problem with having str here is that when get_unsafe_set is called, this fails. We just haven't covered this case, which is why it didn't come up. I can move it back into children, but then we need to change get_unsafe_type.

(I still plan to do a PR to test the types of children more rigorously, this may not be the last bug)

skops/io/_general.py Outdated Show resolved Hide resolved
skops/io/_visualize.py Show resolved Hide resolved
BenjaminBossan added a commit to BenjaminBossan/skops that referenced this pull request Apr 5, 2023
Two measures to harden the auditing (a little bit):

- Type annotate the Node's children to prevent setting invalid types.
- Change all the tests that use loads to only load trusted types instead
  of using trusted=True

The latter is importent because when setting trusted=True, the whole
machinery of checking types is not executed, so any bugs that may be
contained there will not be revealed. In particular, this shows that for
persisting methods, we had a child with a str type and that would raise
an error, i.e. loading method types was not possible for users who
passed trusted!=True.

Additional changes

As a consequence of the last point, the auditing code has been changed
to accept str as type. Alternatively, we can make the change explained
here:

skops-dev#338 (comment)

i.e. not storing the method name in children.

Another "victim" of this change is that the so far dead code of checking
for primitive types inside of get_unsafe_set has been removed. This code
was supposed to check if the type is a primitive type but it was
defective. get_module(child) would raise an error if an instance of the
type would be passed. We could theoretically fix that code, but it would
still be dead code because primitive types are stored as json.

Another small change is to exclude the code in skops/io/old from mypy
checks. Otherwise, we would have to update its type signatures if
signatures in the persistence code change.
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

}
self.func_name = state["content"]["func"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then we want to make sure it's a string? like, cast it to str maybe?

@BenjaminBossan BenjaminBossan marked this pull request as ready for review April 6, 2023 12:35
@adrinjalali adrinjalali changed the title Refactor trusted MNT Refactor trusted Apr 6, 2023
@adrinjalali adrinjalali merged commit be7e555 into skops-dev:main Apr 6, 2023
@BenjaminBossan BenjaminBossan deleted the refactor-trusted branch April 6, 2023 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants