get_config_directory returns read-only paths when using NNI from Singularity container #3924

Closed
Markus92 opened this issue Jul 9, 2021 · 3 comments
@Markus92
Contributor

Markus92 commented Jul 9, 2021

Describe the issue:
When running NNI from inside a container built with Singularity, NNI tries to write config files in /usr/local. Unlike in Docker containers, processes inside these containers do not run as root and the root filesystem is read-only, so /usr/local is not writable. If conda was used to install packages when the container was built, /usr/local/conda-meta exists and is unwritable, regardless of how NNI is installed. Note that Singularity is a popular container tool in academic/HPC environments.

The error occurs here:

if sys.prefix != sys.base_prefix or Path(sys.prefix, 'conda-meta').is_dir():

In my containers, sys.prefix = sys.base_prefix = /usr/local, and /usr/local/conda-meta exists, so the second half of the condition is true.
nni/runtime/config.py tries to find a directory to write configuration files to, but does not check whether it can actually write there. From a user's perspective, it is also impossible to override this behavior without editing the NNI source code.
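
To check whether a given installation is affected, a small diagnostic along these lines may help. It is only a sketch: `looks_like_env` mirrors the condition quoted above, while `is_writable_dir` is a hypothetical helper that is not part of NNI.

```python
import os
import sys
from pathlib import Path


def looks_like_env(prefix: str, base_prefix: str) -> bool:
    """Mirror the quoted condition: a venv, or a prefix containing conda-meta."""
    return prefix != base_prefix or Path(prefix, "conda-meta").is_dir()


def is_writable_dir(path: str) -> bool:
    """Hypothetical helper: True only for an existing directory we may write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print("sys.prefix      :", sys.prefix)
    print("sys.base_prefix :", sys.base_prefix)
    print("treated as env  :", looks_like_env(sys.prefix, sys.base_prefix))
    print("prefix writable :", is_writable_dir(sys.prefix))
```

If the third line prints True and the fourth prints False, the setup matches the situation described in this issue.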

A solution would be to check whether the directory is actually writable, or to allow the user to set an environment variable like NNI_CONFIG_DIRECTORY=/some/writable/directory to override this behavior.
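
A minimal sketch of what both suggestions could look like together. This is not NNI's actual implementation: `pick_config_directory` and `_usable` are hypothetical names, NNI_CONFIG_DIRECTORY is only the variable name proposed in this issue, and the `~/.config/nni` fallback is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def _usable(directory: Path) -> bool:
    """Try to create the directory and report whether files can be written in it."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
    except OSError:
        return False
    return os.access(directory, os.W_OK | os.X_OK)


def pick_config_directory(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical lookup order: explicit override, then prefix, then home."""
    env = os.environ if env is None else env
    override = env.get("NNI_CONFIG_DIRECTORY")  # name proposed in this issue
    if override:
        # An explicit user choice wins; fail loudly rather than ignore it.
        chosen = Path(override)
        if not _usable(chosen):
            raise PermissionError(f"NNI_CONFIG_DIRECTORY is not writable: {chosen}")
        return chosen

    candidates = []
    if sys.prefix != sys.base_prefix or Path(sys.prefix, "conda-meta").is_dir():
        candidates.append(Path(sys.prefix, "nni"))
    candidates.append(Path.home() / ".config" / "nni")  # assumed fallback location

    # Take the first candidate that can actually be created and written to.
    for candidate in candidates:
        if _usable(candidate):
            return candidate
    raise PermissionError("no writable configuration directory found")
```

With something like this, a read-only /usr/local would be skipped in favor of the fallback, and a user inside a container could still force a location by exporting the variable before running nnictl.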

Environment:

  • NNI version: 2.3

  • Training service (local|remote|pai|aml|etc): local

  • Client OS: Ubuntu 20.04

  • Server OS (for remote mode only):

  • Python version: 3.8

  • PyTorch/TensorFlow version: N/A

  • Is conda/virtualenv/venv used?: conda

  • Is running in Docker?: Singularity is used.

  • nnictl stdout and stderr:

Singularity> nnictl create --port 8081 --config config.yml
INFO: expand codeDir: . to [privatedir]/.
Traceback (most recent call last):
  File "/usr/local/bin/nnictl", line 8, in <module>
    sys.exit(parse_args())
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/nnictl.py", line 278, in parse_args
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/launcher.py", line 515, in create_experiment
    config_v2 = convert.to_v2(config_yml).json()
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/convert.py", line 20, in to_v2
    v2 = ExperimentConfig(platform)
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/common.py", line 85, in __init__
    kwargs['trainingservice'] = util.training_service_config_factory(
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/util.py", line 43, in training_service_config_factory
    custom_ts_config_path = nni.runtime.config.get_config_file('training_services.json')
  File "/usr/local/lib/python3.8/site-packages/nni/runtime/config.py", line 33, in get_config_file
    shutil.copyfile(default, config_file)
  File "/usr/local/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
OSError: [Errno 30] Read-only file system: '/usr/local/nni/training_services.json'

How to reproduce it?:
See above. Run NNI, installed with a conda that was itself installed as root, on a read-only filesystem as an unprivileged user.

@liuzhe-lz
Contributor

liuzhe-lz commented Jul 13, 2021

We use conda-meta to detect whether the directory is a conda environment. If it is, it should always be writable.
So the problem here is that your conda is "merged" into /usr/local instead of being installed to somewhere like /usr/local/anaconda. The template files of conda-meta are placed alongside the Python runtime, so to NNI it looks like a conda environment directory.
I have to say it's a really strange setup...
We will try some other ways to detect a conda environment, but cannot guarantee that a better one exists, because conda does not officially provide APIs to do that.

Overriding the config path with an environment variable is a nice suggestion. We will implement it sooner or later.

@Markus92
Contributor Author

Conda gives no guarantee that the directory is writable! There are many scenarios in which it's not (this is just one of them). For example, if you install conda as root and install packages system-wide, then drop down to an unprivileged user to run them, you get this behavior. This can happen a lot in production environments where packages should be read-only. In that case conda is used not to provide virtual environments but as an alternative to pip as a package manager.

What is the reason for detecting a conda environment in the first place? Conda does not provide APIs to do that, as package behavior should not change depending on it.

I'll send a pull request with the desired behavior.

@liuzhe-lz
Contributor

liuzhe-lz commented Jul 14, 2021

Because there might be multiple NNI instances in different environments, and they must not share a config file. The config file is about installed packages, and each environment must install its own packages separately.
The root problem is that Python provides no standard way to use config files, and conda breaks the operating system's FHS. On the other hand, wheels do not support post-install hooks either. So there must be a trade-off.
Since conda's tutorial covers how to manage multiple environments but never mentions sudo, we think the former use case has higher priority.
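
For illustration only, one way the two requirements (separate config per environment, no writes into a read-only prefix) might be reconciled is to key a user-writable directory on the environment prefix. Nothing like this was proposed in the thread; `per_environment_config_dir` and the `~/.config/nni/envs` layout are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def per_environment_config_dir(prefix: str = sys.prefix,
                               root: Optional[Path] = None) -> Path:
    """Return a config path under the user's home that is unique per prefix."""
    # Assumed layout: <root>/envs/<short hash of the resolved prefix>
    root = Path.home() / ".config" / "nni" if root is None else root
    resolved = str(Path(prefix).resolve())
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:12]
    return root / "envs" / digest
```

Two environments with different prefixes would then get different directories, while the prefix itself is never written to. The function only computes the path; creating it is left to the caller.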
