Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle overlapping and incomplete markers when forking #4732

Closed
konstin opened this issue Jul 2, 2024 · 0 comments · Fixed by #5887
Closed

Handle overlapping and incomplete markers when forking #4732

konstin opened this issue Jul 2, 2024 · 0 comments · Fixed by #5887
Assignees
Labels
bug Something isn't working preview Experimental behavior

Comments

@konstin
Copy link
Member

konstin commented Jul 2, 2024

In python, a package can declare conflicting requirements as long as the markers are disjoint, e.g.:

numpy==2.0.0; python_version >= "3.11"
numpy==1.26.4; python_version < "3.11"

When we see requirements like that, we split the resolution into one for python_version >= "3.11" and one for python_version < "3.11" and solve each fork individually. This fails when the markers are overlapping or don't cover the entirety of the marker space. For a real-world example, opencv-python 4.10.0.84 used by transformers:

numpy >=1.13.3 ; python_version < "3.7"
numpy >=1.21.0 ; python_version <= "3.9" and platform_system == "Darwin" and platform_machine == "arm64"
numpy >=1.21.2 ; python_version >= "3.10"
numpy >=1.21.4 ; python_version >= "3.10" and platform_system == "Darwin"
numpy >=1.23.5 ; python_version >= "3.11"
numpy >=1.26.0 ; python_version >= "3.12"
numpy >=1.19.3 ; python_version >= "3.6" and platform_system == "Linux" and platform_machine == "aarch64"
numpy >=1.17.0 ; python_version >= "3.7"
numpy >=1.17.3 ; python_version >= "3.8"
numpy >=1.19.3 ; python_version >= "3.9"

Currently, we solve for the following forks (#4687):

python_version >= '3.12' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version < '3.13' and python_version >= '3.12' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version >= '3.13' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version == '3.9' and platform_machine == 'arm64' and platform_system == 'Darwin'

They are disjoint, but they are also missing a good bit of the marker space for which we never solve.


Let's give working with markers some math-y foundations, a visualization as mental model and let's look at two examples.

Lemma: Inflating the requirements by artificially introducing a split. A requirement

foo

is equivalent to

foo; X
foo; not(X)

Similarly,

foo; Y

is equivalent to

foo; Y and X
foo; Y and not(X)

We can extend this to an arbitrary number of arguments, so if Z¹, Z², ..., Z^n are pairwise disjoint and the union over i in 0..n for Z^i is the base set omega, then

foo; Y

is equivalent to

foo; Y and Z¹
foo; Y and Z²
...
foo; Y and Z^n

Now assume we have the following set of partially pairwise overlapping (#4640), partially pairwise incomplete (#4687) markers:

foo¹>2
foo²>3; x>=5
foo³<4; x<7

We can now inflate the requirements by artificially introducing a split:

foo¹>2; (x>=7)
foo²>3; x>=5 and (x>=7)
foo³<4; x<7 and (x>=7)

foo¹>2; (x<7 and x>=5)
foo²>3; x>=5 and (x<7 and x>=5)
foo³<4; x<7 and (x<7 and x>=5)

foo¹>2; (x<5)
foo²>3; x>=5 and (x<5)
foo³<4; x<7 and (x<5)

We see that each group is disjoint, so we can create three forks. Simplifying and Removing empty markers we get:

# Fork 1
foo¹>2; x>=7
foo²>3; x>=7

# Fork 2
foo¹>2; x<7 and x>=5
foo²>3; x<7 and x>=5
foo³<4; x<7 and x>=5

# Fork 3
foo¹>2; x<5
foo³<4; x<5

We defined a correct way of splitting these markers.


Let us switch to another example to develop a model of markers and marker space. Ignoring extras, we can consider the current environment a point in a 10-dimensional marker space defined by PEP 508 (os_name, sys_platform, platform_machine, platform_python_implementation, platform_release, platform_system, platform_version, python_version/python_full_version, implementation_name, implementation_version). The dimensions are of different datatypes and not all points exist (os_name=="posix" and platform_system=="Windows" doesn't make sense). A marker is a slice in that space.

We can visualize a simpler version of the opencv-python 4.10.0.84 requirements:

numpy >=1.21.4 ; python_version >= "3.10" and platform_system == "Darwin"
numpy >=1.21.2 ; python_version >= "3.11"
numpy >=1.21.0 ; python_version <= "3.9" and platform_system == "Darwin"

image

There is two ways we can transform these requirements with the diagram:

  • Merge overlapping: 1 and 2 are overlapping, so there's a fork 1 or 2. There's a fork 3 and a fork for the remaining space not(1 or 2 or 3)
  • Split overlapping: 1 and 2 are overlapping so there's a fork 1 \ 2, 2 \ 1 and 1 and 2 . There's a fork 3 and a fork for the remaining space not(1 or 2 or 3)

image

For splitting overlapping, the options are:

  • crosses, python_version >= "3.10" and python_version < "3.11" and platform_system == "Darwin" with numpy >=1.21.4
  • stripes, python_version >= "3.11" and platform_system == "Darwin" with numpy >=1.21.2 and numpy >=1.21.4
  • dots, python_version >= "3.11" and platform_system != "Darwin" with numpy >=1.21.2

I favor to splitting the overlapping, it's the more general solution, and the one able to handle complex cases like the opencv-python markers (even though we pick a more recent numpy version for them anyway). If we merge overlapping, we need to special case the universal marker (no marker provided) for #4640. In either case, we need to implement marker tree negation to support incomplete markers, and proper simplification for negated marker tree to determine if the markers were actually incomplete or did cover the entire markers space, leaving only an empty set.

@konstin konstin added bug Something isn't working preview Experimental behavior labels Jul 2, 2024
konstin added a commit to konstin/packse that referenced this issue Jul 9, 2024
We're currently (astral-sh/uv#4732) failing to install `c` from this scenario:

```
$ echo "fork-incomplete-markers-8137fed5" | uv pip compile --index-url http://127.0.0.1:3141 -p 3.9 --universal -
Resolved 4 packages in 38ms
# This file was autogenerated by uv via the following command:
#    uv pip compile -p 3.9 --universal -
fork-incomplete-markers-8137fed5==0.0.0
fork-incomplete-markers-a-8137fed5==1.0.0 ; python_version < '3.10'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-a-8137fed5==2.0.0 ; python_version >= '3.11'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-b-8137fed5==1.0.0
    # via fork-incomplete-markers-8137fed5
```

This is one particular scenario that we need to test for solving astral-sh/uv#4732

I had to update the lockfile for the recent uv lockfile changes.
konstin added a commit to konstin/packse that referenced this issue Jul 9, 2024
We're currently (astral-sh/uv#4732) failing to install `c` from this scenario:

```
$ echo "fork-incomplete-markers-8137fed5" | uv pip compile --index-url http://127.0.0.1:3141 -p 3.9 --universal -
Resolved 4 packages in 38ms
# This file was autogenerated by uv via the following command:
#    uv pip compile -p 3.9 --universal -
fork-incomplete-markers-8137fed5==0.0.0
fork-incomplete-markers-a-8137fed5==1.0.0 ; python_version < '3.10'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-a-8137fed5==2.0.0 ; python_version >= '3.11'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-b-8137fed5==1.0.0
    # via fork-incomplete-markers-8137fed5
```

This is one particular scenario that we need to test for solving astral-sh/uv#4732

I had to update the lockfile for the recent uv lockfile changes.
konstin added a commit to astral-sh/packse that referenced this issue Jul 9, 2024
We're currently (astral-sh/uv#4732) failing to
install `c` from the added scenario:

```
$ echo "fork-incomplete-markers-8137fed5" | uv pip compile --index-url http://127.0.0.1:3141 -p 3.9 --universal -
Resolved 4 packages in 38ms
# This file was autogenerated by uv via the following command:
#    uv pip compile -p 3.9 --universal -
fork-incomplete-markers-8137fed5==0.0.0
fork-incomplete-markers-a-8137fed5==1.0.0 ; python_version < '3.10'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-a-8137fed5==2.0.0 ; python_version >= '3.11'
    # via fork-incomplete-markers-8137fed5
fork-incomplete-markers-b-8137fed5==1.0.0
    # via fork-incomplete-markers-8137fed5
```

This is one particular scenario that we need to test for solving
astral-sh/uv#4732

I had to update the lockfile for the recent uv lockfile changes, split
out to #199 which removes it from the diff here.
BurntSushi added a commit that referenced this issue Jul 25, 2024
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.

This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.

In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.

So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.

This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing `x and x` in
trivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain.

This commit also fixes another bug: previously, when we created a fork
to capture the "remaining" universe of an incomplete set of markers, we
left out dependencies that should be included in that fork. We rectify
that here.

Fixes #5086

Partially addresses #4732
BurntSushi added a commit that referenced this issue Jul 25, 2024
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.

This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.

In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.

So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.

This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing `x and x` in
trivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain.

This commit also fixes another bug: previously, when we created a fork
to capture the "remaining" universe of an incomplete set of markers, we
left out dependencies that should be included in that fork. We rectify
that here.

Fixes #5086

Partially addresses #4732
BurntSushi added a commit that referenced this issue Jul 25, 2024
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.

This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.

In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.

So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.

This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing `x and x` in
trivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain.

This commit also fixes another bug: previously, when we created a fork
to capture the "remaining" universe of an incomplete set of markers, we
left out dependencies that should be included in that fork. We rectify
that here.

Fixes #5086

Partially addresses #4732
BurntSushi added a commit that referenced this issue Jul 26, 2024
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.

This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.

In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.

So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.

This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing `x and x` in
trivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain.

This commit also fixes another bug: previously, when we created a fork
to capture the "remaining" universe of an incomplete set of markers, we
left out dependencies that should be included in that fork. We rectify
that here.

Fixes #5086

Partially addresses #4732
BurntSushi added a commit that referenced this issue Jul 26, 2024
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.

This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.

In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.

So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.

This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing `x and x` in
trivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain.

This commit also fixes another bug: previously, when we created a fork
to capture the "remaining" universe of an incomplete set of markers, we
left out dependencies that should be included in that fork. We rectify
that here.

Fixes #5086

Partially addresses #4732
BurntSushi added a commit that referenced this issue Jul 26, 2024
Consider the following packse scenario:

```toml
[root]
requires = [
  "a>=1.0.0 ; python_version < '3.10'",
  "a>=1.1.0 ; python_version >= '3.10'",
  "a>=1.2.0 ; python_version >= '3.11'",
]

[packages.a.versions."1.0.0"]
[packages.a.versions."1.1.0"]
[packages.a.versions."1.2.0"]
```

On current `main`, this produces a dependency on `a` that looks like
this:

```toml
dependencies = [
    { name = "fork-overlapping-markers-basic-a", marker = "python_version < '3.10' or python_version >= '3.11'" },
]
```

But the marker expression is clearly wrong here, since it implies that
`a` isn't installed at all for Python 3.10. With this PR, the above
dependency becomes:

```toml
dependencies = [
    { name = "fork-overlapping-markers-basic-a" },
]
```

That is, it's unconditional. Which is I believe correct here since there
aren't any other constraints on which version to select.

The specific bug here is that when we found overlapping dependency
specifications for the same package *within* a pre-existing fork, we
intersected all of their marker expressions instead of unioning them.
That in turn resulted in incorrect marker expressions.

While this doesn't fix any known bug on the issue tracker (like #4640),
it does appear to fix a couple of our snapshot tests. And fixes a basic
test case I came up with while working on #4732.

For the packse scenario test: astral-sh/packse#206
BurntSushi added a commit that referenced this issue Jul 26, 2024
Consider the following packse scenario:

```toml
[root]
requires = [
  "a>=1.0.0 ; python_version < '3.10'",
  "a>=1.1.0 ; python_version >= '3.10'",
  "a>=1.2.0 ; python_version >= '3.11'",
]

[packages.a.versions."1.0.0"]
[packages.a.versions."1.1.0"]
[packages.a.versions."1.2.0"]
```

On current `main`, this produces a dependency on `a` that looks like
this:

```toml
dependencies = [
    { name = "fork-overlapping-markers-basic-a", marker = "python_version < '3.10' or python_version >= '3.11'" },
]
```

But the marker expression is clearly wrong here, since it implies that
`a` isn't installed at all for Python 3.10. With this PR, the above
dependency becomes:

```toml
dependencies = [
    { name = "fork-overlapping-markers-basic-a" },
]
```

That is, it's unconditional. Which is I believe correct here since there
aren't any other constraints on which version to select.

The specific bug here is that when we found overlapping dependency
specifications for the same package *within* a pre-existing fork, we
intersected all of their marker expressions instead of unioning them.
That in turn resulted in incorrect marker expressions.

While this doesn't fix any known bug on the issue tracker (like #4640),
it does appear to fix a couple of our snapshot tests. And fixes a basic
test case I came up with while working on #4732.

For the packse scenario test: astral-sh/packse#206
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working preview Experimental behavior
Projects
No open projects
Status: Done
2 participants