Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

audit universal resolver's robustness with respect to false positives/negatives with routines like "are these two marker expressions disjoint" #5562

Closed
BurntSushi opened this issue Jul 29, 2024 · 2 comments
Labels
internal A refactor or improvement that is not user-facing

Comments

@BurntSushi
Copy link
Member

Right now, the universal resolver relies on an internal routine, marker::is_disjoint, to report whether two marker expressions could ever evaluate to true for the same marker environment. If they can't, then they are considered disjoint.

At present, we use this to determine whether to create forks between conflicting dependency specifications. For example:

a==1.0.0 ; sys_platform == 'linux'
a==2.0.0 ; sys_platform == 'windows'

It is indeed the case that sys_platform == 'linux' and sys_platform == 'windows' can literally never be active for the same marker environment, and thus they are considered disjoint. This in turn "allows" the universal resolver to fork. (Although we are very likely going to need to allow forks even in cases of non-disjointness, as explored in #4732.)

But, we do use disjointness checking elsewhere. For example, when creating a fork inside the resolver, we remove any dependencies whose marker expressions are disjoint with that fork's marker expression. This is because those dependencies can never appear in environments that use that fork. But if disjointness checking is wrong, then this could lead to incorrect results.

What is known is that our disjointness checking is wrong today. Specifically, I believe perfect disjointness checking is NP-complete. So right now, we only use heuristics. What is not quite known is whether its "wrongness" is limited to only false positives, limited to only false negatives, or is both. Moreover, I do not believe the callers of disjointness checking have been audited from what they are robust to. Sometimes false negatives are okay in the sense that they might just lead to not being able to produce a resolution, for example.

This is also somewhat shifting as mentioned via #4732, and also the work that @ibraheemdev is doing based on marker normalization.

@BurntSushi BurntSushi added the internal A refactor or improvement that is not user-facing label Jul 29, 2024
@ibraheemdev
Copy link
Member

ibraheemdev commented Jul 29, 2024

I believe we are going to end up in a place where marker satisfiability can have false positives but never false negatives, i.e. we might think a marker tree is satisfiable when it is actually not, but we never claim a marker tree is not satisfiable when it actually is. This is mostly due to the in operator on strings which can have interactions that are difficult to model (for example, key > 'z' and key in 'linux' is not satisfiable, but that is not easy to see), but there may be other places this is possible that we have not accounted for yet so it probably a good idea to remain pessimistic.

Similarly, the disjointness of two markers is equivalent to asking whether their conjunction is not satisfiable, so we should account for false negatives but not false positives. i.e. it is possible that we think two markers are not disjoint when they actually are (such as key > 'z' and key in 'linux'), but we should never claim that two markers are disjoint when there is a potential intersection.

@BurntSushi
Copy link
Member Author

I think we're all set here with @ibraheemdev's work in #5898.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal A refactor or improvement that is not user-facing
Projects
None yet
Development

No branches or pull requests

2 participants