Skip to content

Commit

Permalink
chore: refactor a bunch of auth-related python (#8347)
Browse files Browse the repository at this point in the history
- Remove the Authentication object.

  Authentication was both a dataclass containing username and token, as
  well as a complex login function disguised as an __init__.

  Now the dataclass is called UsernameTokenPair and the complex login
  function is called `login()` or `login_with_cache()`.

- Make Session the core of API calls, not the api.request() function.

  The Session has a cache of authentication information and cert
  customization, and is easy to reuse.  The api.request() function
  encourages the use of singletons and creates really confusing
  configure-singleton-in-every-entrypoint behaviors.

- Introduce an UnauthSession which doesn't have an authentication token,
  for those few calls that we need to make which are not authenticated.

- Add a detproc helper module to the e2e tests, so even tests requiring
  the cli can be based on explicit sessions.

- Remove cli_auth and cli_cert singletons.

- Delete all singleton-based @authentication.required decorators from
  the cli; just create a session instead.

- Remove all api.{get,post,whatever} calls.

- Refactor the Profiler a bit to accept a Session object.  It used to
  depend on the cli_auth as well.

- Add some Session helpers to the e2e tests, with much better caching.

- Redesign the e2e auth to be Session-centric rather than cached-login
  centric, in particular:
    - redesign api_utils.create_test_user()
    - remove test_users.logged_in_user()
    - remove test_users.log_in_cli()
    - remove test_users.login_admin()
    - replace any call to Determined() with Determined._from_session()

- Introduce detproc module in e2e tests for session-centric CLI calls,
  replacing various adhoc helpers:
    - test_user.det_spawn()
    - test_user.det_run()
    - test_exp_continue.det_cmd()
    - test_groups.det_cmd()
    - test_groups.det_cmd_json()
    - test_groups.det_cmd_expect_error()

BREAKING CHANGE: This refactor removes the Session object from all
public API namespaces, and also adds the new requirement that all tasks
have a writable HOME directory.  Previously, a small subset of
functionality existed for tasks which had no writable HOME directory.
  • Loading branch information
rb-determined-ai authored Feb 21, 2024
1 parent 66b1e6c commit 76ec233
Show file tree
Hide file tree
Showing 171 changed files with 5,411 additions and 6,530 deletions.
2 changes: 2 additions & 0 deletions .circleci/real_config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -630,6 +630,7 @@ commands:
command: |
pytest -vv -s \
-m <<parameters.mark>> \
--no-compare-stats \
--det-version="<<parameters.det-version>>"
deploy-aws-cluster:
Expand Down Expand Up @@ -2919,6 +2920,7 @@ workflows:
devcluster-config: agent-no-connection.devcluster.yaml
target-stage: agent
wait-for-master: false
extra-pytest-flags: "--no-compare-stats"

- test-perf:
name: test-perf
Expand Down
5 changes: 3 additions & 2 deletions .circleci/scripts/wait_for_master.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,16 @@
import requests

from determined.common import api
from determined.common.api import certs
from determined.common.api import authentication, certs


def _wait_for_master(address: str) -> None:
print("Checking for master at", address)
cert = certs.Cert(noverify=True)
sess = api.UnauthSession(address, cert)
for _ in range(150):
try:
r = api.get(address, "info", authenticated=False, cert=cert)
r = sess.get("info")
if r.status_code == requests.codes.ok:
return
except api.errors.MasterNotFoundException:
Expand Down
2 changes: 1 addition & 1 deletion bindings/generate_bindings_py.py
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@ def gen_function(func: swagger_parser.Function) -> Code:
out = [f"def {func.operation_name_sc()}("]

# Function parameters.
out += [' session: "api.Session",']
out += [' session: "api.BaseSession",']
if func.params:
out += [" *,"]

Expand Down
17 changes: 15 additions & 2 deletions docs/manage/users.rst
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,19 @@ user and group on an agent can be configured in the ``master.yaml`` file located
group: root
gid: 0
.. note::

A writable ``HOME`` directory is required by all Determined tasks. By default, all official
Determined images contain a tool called ``libnss_determined`` that injects users into the
container at runtime. If you are building custom images using a base image other than those
provided by Determined, you may need to take one of the following steps:

- prebuild all users you might need into your custom image, or
- include ``libnss_determined`` in your custom image to ensure user injection works as
expected, or
- find an alternate solution that serves the same purpose of injecting users into the
container at runtime

.. _run-unprivileged-tasks:

***********************************
Expand All @@ -235,8 +248,8 @@ user and group on an agent can be configured in the ``master.yaml`` file located
Some administrators of Determined may wish to run tasks as unprivileged users by default. In Linux,
unprivileged processes are sometimes run under the `nobody
<https://en.wikipedia.org/wiki/Nobody_(username)>`_ user, which has very few privileges. However,
the ``nobody`` user does not have a writable ``HOME`` directory, which causes problems for some
common tools like ``gsutil``.
the ``nobody`` user does not have a writable ``HOME`` directory, which is a requirement for tasks in
Determined, so the ``nobody`` user will typically not work in Determined.

For convenience, the default Determined environments contain an unprivileged user named
``det-nobody``, which does have a writable ``HOME`` directory. The ``det-nobody`` user is a suitable
Expand Down
18 changes: 18 additions & 0 deletions docs/release-notes/8437-auth-refactor.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
:orphan:

**Removed Features**

- **Breaking Change** Removed the accidentally-exposed Session object from the
``det.experimental.client`` namespace. It was never meant to be a public API and it was not
documented in :ref:`python-sdk`, but was nonetheless exposed in that namespace. It was also
available as a deprecated legacy alias, ``det.experimental.Session``. It is expected that most
users use the Python SDK normally and are unaffected by this change, since the
``det.experimental.client``'s ``login()`` and ``Determined()`` are unaffected.

- **Breaking Change** Add a new requirement for runtime configurations that there be a writable
``$HOME`` directory in every container. Previously, there was limited support for containers
without a writable ``$HOME``, merely by coincidence. This change could impact users in scenarios
where jobs were configured to run as the ``nobody`` user inside a container, instead of the
``det-nobody`` alternative recommended in :ref:`run-unprivileged-tasks`. Users combining non-root
tasks with custom images not based on Determined's official images may also be affected. Overall,
it is expected that few or no users are affected by this change.
53 changes: 0 additions & 53 deletions e2e_tests/.flake8
Original file line number Diff line number Diff line change
Expand Up @@ -7,64 +7,11 @@ max-line-length = 100
# be used by the importer.)
per-file-ignores =
__init__.py:F401,I2041
tests/api_utils.py:I2041
tests/cluster/abstract_cluster.py:I2041
tests/cluster_log_manager.py:I2041
tests/cluster/managed_cluster_k8s.py:I2041
tests/cluster/managed_cluster.py:I2041
tests/cluster/managed_slurm_cluster.py:I2041
tests/cluster/test_agent_disable.py:I2041
tests/cluster/test_agent.py:I2041
tests/cluster/test_agent_restart.py:I2041
tests/cluster/test_agent_user_group.py:I2041
tests/cluster/test_checkpoints.py:I2041
tests/cluster/test_groups.py:I2041
tests/cluster/test_job_queue.py:I2041
tests/cluster/test_master_restart.py:I2041
tests/cluster/test_master_restart_slurm.py:I2041
tests/cluster/test_model_registry.py:I2041
tests/cluster/test_model_registry_rbac.py:I2041
tests/cluster/test_oauth2_scim_client.py:I2041
tests/cluster/test_priority_scheduler.py:I2041
tests/cluster/test_proxy.py:I2041
tests/cluster/test_rbac_misc.py:I2041
tests/cluster/test_rbac_ntsc.py:I2041
tests/cluster/test_rbac.py:I2041
tests/cluster/test_resource_manager.py:I2041
tests/cluster/test_slurm.py:I2041
tests/cluster/test_users.py:I2041
tests/cluster/test_webhooks.py:I2041
tests/cluster/test_workspace_org.py:I2041
tests/cluster/utils.py:I2041
tests/command/command.py:I2041
tests/command/test_notebook.py:I2041
tests/command/test_run.py:I2041
tests/command/test_shell.py:I2041
tests/command/test_tensorboard.py:I2041
tests/config.py:I2041
tests/conftest.py:I2041
tests/deploy/test_local.py:I2041
tests/experiment/experiment.py:I2041
tests/experiment/record_profiling.py:I2041
tests/experiment/test_allocation_csv.py:I2041
tests/experiment/test_api.py:I2041
tests/experiment/test_core.py:I2041
tests/experiment/test_hpc_launch.py:I2041
tests/experiment/test_launch.py:I2041
tests/experiment/test_noop.py:I2041
tests/experiment/test_pending_hpc.py:I2041
tests/experiment/test_port_registry.py:I2041
tests/experiment/test_profiling.py:I2041
tests/filetree.py:I2041
tests/fixtures/mnist_pytorch/failable_model_def.py:I2041
tests/fixtures/mnist_pytorch/layers.py:I2041
tests/fixtures/mnist_pytorch/model_def.py:I2041
tests/fixtures/mnist_pytorch/stop_requested_model_def.py:I2041
tests/fixtures/trial_error/model_def.py:I2041
tests/job/test_rbac.py:I2041
tests/nightly/compute_stats.py:I2041
tests/template/test_template.py:I2041
tests/test_sdk.py:I2041

# Explanations for ignored error codes:
# - D1* (no missing docstrings): too much effort to start enforcing
Expand Down
Loading

0 comments on commit 76ec233

Please sign in to comment.