[BUG] filelist's findall() should prefer original over links #3327

Open
vidartf opened this issue May 18, 2022 · 1 comment
Labels
bug, Needs Triage

vidartf commented May 18, 2022

setuptools version

62.1.0

Python version

3.8.13

OS

Windows 10

Additional environment information

No response

Description

When building an sdist of a library I maintain with a recent version of setuptools, certain folders were no longer included. Stepping through the code with pdb, I found that the UniqueFolder filter (introduced in #2714) was excluding them. For context, this is the kind of folder setup in use:

folderA/links/[symlink to folderB]
folderA/links/[symlink to folderC]
folderB/links/[symlink to folderC]
folderC

The problem is that instead of folderB and folderC themselves being included, only the links in folderA/links are included.

For me, these links get created as part of some JavaScript assets that I build and distribute with my package (https://github.com/vidartf/ipydatawidgets). The workarounds I can see:

  • Rename the folders so that os.walk traverses them in the "right" order. This is brittle, as the docs for os.walk say "Whether or not the lists are sorted depends on the file system." This is also an argument against the current implementation: which paths end up in your build depends on OS implementation details!
  • Add a separate build step that unlinks these folders before building, and then re-links them afterwards.
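
The second workaround could be sketched roughly as follows. This is only an illustration: `directory_links_removed` is a hypothetical helper, not part of setuptools, and it assumes the process is allowed to create symlinks (on Windows this typically requires Developer Mode or elevated privileges).

```python
import contextlib
import os


@contextlib.contextmanager
def directory_links_removed(root):
    """Temporarily remove directory symlinks below root (hypothetical helper)."""
    saved = []
    # followlinks=False: linked directories are listed but not descended into.
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=False):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                saved.append((path, os.readlink(path)))
    for path, _target in saved:
        os.unlink(path)
    try:
        yield saved
    finally:
        # Re-create every link, even if the build step raised.
        for path, target in saved:
            os.symlink(target, path, target_is_directory=True)
```

The build would then run inside `with directory_links_removed("."):`, so that the file walk only ever sees the original folders.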

Expected behavior

The original folders (folderA, folderB and folderC) should all be included; if anything, the symlinks pointing to them should be excluded.

How to Reproduce

import tempfile
import os
import pathlib
from setuptools._distutils.filelist import findall

with tempfile.TemporaryDirectory() as root:
    d = pathlib.Path(root)
    os.mkdir(d / "A")
    os.mkdir(d / "B")
    os.mkdir(d / "C")
    (d / "A" / "fileinA.txt").write_text("foo")
    (d / "B" / "fileinB.txt").write_text("foo")
    (d / "C" / "fileinC.txt").write_text("foo")
    # Note: This will work if the sort order is changed!
    os.symlink(d / "C", d / "B" / "linkC", target_is_directory=True)
    os.symlink(d / "C", d / "A" / "linkC", target_is_directory=True)
    os.symlink(d / "B", d / "A" / "linkB", target_is_directory=True)

    files = findall(d)
    print(files)

Output

[
'<tmp root>\\tmpjswv9fx2\\A\\fileinA.txt',
'<tmp root>\\tmpjswv9fx2\\A\\linkB\\fileinB.txt',
'<tmp root>\\tmpjswv9fx2\\A\\linkB\\linkC\\fileinC.txt'
]

vidartf commented May 18, 2022

Possible solutions that I see:

  1. Do not follow symlinks in the walk.
  2. Only prevent infinite recursions, instead of enforcing folder uniqueness.
  3. Only follow symlinks if the target they point to would not also be included in the walk.
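
To make the first two options concrete, here is a minimal sketch of what they could look like. Both functions are hypothetical stand-ins written for this illustration (they are not the setuptools/distutils implementation), and they assume plain `os.walk` / `os.scandir` semantics.

```python
import os


def findall_no_follow(root):
    """Option 1: never descend into symlinked directories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken links and special files
                found.append(path)
    return found


def findall_cycle_safe(root):
    """Option 2: follow links, but skip a directory that is its own ancestor."""
    found = []

    def walk(path, ancestors):
        real = os.path.realpath(path)
        if real in ancestors:
            return  # following this link would loop back on itself
        with os.scandir(path) as entries:
            ordered = sorted(entries, key=lambda entry: entry.name)
        for entry in ordered:
            if entry.is_dir():  # True for symlinks to directories as well
                walk(entry.path, ancestors | {real})
            elif entry.is_file():
                found.append(entry.path)

    walk(os.fspath(root), frozenset())
    return found
```

With the layout from the reproduction script, option 1 would return only the three original files, while option 2 would return each file once per path that reaches it (for example both `C/fileinC.txt` and `A/linkC/fileinC.txt`). Neither result depends on the order in which directories are visited.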
