-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid symlink infinite loops #44
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably have a easier time getting in with tests 😉
dirs = set() | ||
files = set() | ||
for dirpath, dirnames, filenames in os.walk(path, followlinks=True): | ||
st = os.stat(dirpath) | ||
scandirs = [] | ||
for dirname in dirnames: | ||
st = os.stat(os.path.join(dirpath, dirname)) | ||
dirkey = st.st_dev, st.st_ino | ||
if dirkey not in dirs: | ||
dirs.add(dirkey) | ||
scandirs.append(dirname) | ||
dirnames[:] = scandirs | ||
for f in filenames: | ||
files.add(os.path.join(dirpath, f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it's a common pattern, I'm very much a detractor to the init/loop/append
pattern (mostly because of how it mutates a local variable, making state analysis much more complicated).
The proposed approach entwines all of the logic of path traversal, link resolution, and deduplication into several loops.
Additionally, this change removes the check for os.path.isfile
. I'm not sure that check is important, but it seems to be important enough that it was included previously.
I'd much rather see a solution that separates the concern of walking a tree while avoiding loops and assembling a list of files in that walk. Since distutils is only needed for Python 3.6+, perhaps a pathlib-based solution could help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got an alternative solution I'll present.
I'm unsure if the path walker should suppress duplicates due to loops or if it should fail in that situation. Why is it better to ignore the loops rather than report an error that there exists a circular loop? I wonder too why distutils is special here. How is it that |
Circular links are perfectly fine and there are cases where they cannot be avoided. Their existence is not an "error", the code should be resilient and avoid going into endless loops. This can be avoided by normalising (resolving) the paths. |
In 88f5c72, I took a stab at creating a test to repro the error and to serve as a regression test for any fix. But interestingly, I couldn't get os.walk to go into infinite recursion. As I dug into it, I found that
So at least on macOS, the reported infinite recursion never occurs, and as long as the symlink isn't pointing to some location that's expensive to traverse (like '/'), everything behaves nicely. Similarly on Ubuntu, I ran the same test and it completes quickly. So it's going to prove difficult to have a reliable repro for this issue, as the reported issue is more complex than simple infinite recursion. |
…on 57.1.0 Alexei Colin (1): distutils: pass -rpath to linker on macOS >=10.5 Brian Rutledge (1): Use shutil for rmtree Jason R. Coombs (24): Test on Python 3.10 Remove src_root from setup.py, seemingly unused. Remove consideration for Java as Jython is no longer supported. Unpin dependencies for certs and ssl extras and remove dependency links. Instead, rely on pip for security. Remove setup_requires, obviated by build-requires in pyproject.toml. Suppress deprecation warnings in flake8 and packaging.tags. Ref pypa/packaging#433. 👹 Feed the hobgoblins (delint). Remove automerge. Add test capturing failure. Ref pypa/distutils#44. Ensure that the result contains only the one file, not all the different symlink variants to the same file. Wrap walk result to prevent infinite recursion. Fixes bpo-44497. Extract UniqueDirs for checking uniqueness. Rely on stat (inode and device) to deduplicate. Move _unique_dirs into classmethod on _UniqueDirs. Remove ssl_support. Fixes #2715. Update changelog. Restore the iterator Add test Update changelog. Add another deprecation warning bypass for flake8 on PyPy because the call depth is missed. Remove workaround for pypy/pypy#3503, now that the behavior is consistent with a workaround in importlib_metadata 4.6.1 (python/importlib_metadata#327. Remove wincertstore and certifi. Ref #2711 and #2716. Update changelog. Bump version: 57.0.0 → 57.1.0 Long Nguyen (2): Add clang mingw support Change get_gcc_versions back to get_versions Nicolas CANIART (1): Entrypoints in declarative config are now supported Richard Purdie (1): setuptools/dist: Fix reproducibility issue by sorting globbing clint-lawrence (1): Fix a broken external link da-woods (1): Fixed get_export_symbols for unicode filenames messense (1): Prefer using `Distribution.has_ext_modules` method
Related: pypa/setuptools#332
Related: https://bugs.python.org/issue44497