Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: reuse HTTP sessions #9116

Merged
merged 13 commits into from
Apr 30, 2024
Merged

feat: reuse HTTP sessions #9116

merged 13 commits into from
Apr 30, 2024

Conversation

azhou-determined
Copy link
Contributor

Description

Introduce the option of using persistent HTTP sessions in api.Session.

Previously, each API call was wrapped in its own requests.Session, which meant making a new underlying TCP connection/TLS handshake every time. Sometimes this is necessary because we can't know when to close the session, and we don't want to leave too many dangling HTTP connections open. But in many cases, we make lots of API calls in quick succession within known bounds, and in these cases using a persistent HTTP session significantly improves performance:

Reporting metrics in Core API/Detached mode:

def main():
    core_v2.init(
        defaults=core_v2.DefaultConfig(
            name="unmanaged-1-singleton",
        ),
    )

    for i in range(100):
        core_v2.train.report_training_metrics(
            steps_completed=i, metrics={"loss": random.random()}
        )

        if (i + 1) % 10 == 0:
            core_v2.train.report_validation_metrics(
                steps_completed=i, metrics={"loss": random.random()}
            )

    core_v2.close()

Not using persistent HTTP sessions: 13-14s
With a persistent HTTP session: 6-7s

CLI calls that fetch multiple pages:
i.e. list-experiments, list-checkpoints
Before: 0.75s
After: 0.3s

api.Session as a context manager

This PR exposes the option of re-using the same underlying HTTP session through api.Session as a context manager:

with api.Session(...) as sess:
  # All requests within the context share the same persistent HTTP connection.
  # Once the context exits, the session is closed.
  sess.post(...)
  sess.get(...)

When used directly, each HTTP call will create a new session every time. This is the current behavior, so existing code will continue to function the same way:

# When called directly outside of a context, each request uses a new HTTP session.
sess = api.Session(...)
sess.post(...)
sess.get(...)

This functionality is implemented in Core API, where all internal API calls within a core.Context (with core.init(...) as core_context:) will now share the same persistent HTTP connection which is closed when the core_context context exits.

CLI

In the CLI, api.Session as a context manager is implemented as a decorator, and when used, exposes a session that is persistent within the decorated function:

- def list_experiments(args: Namespace) -> None:
-     sess = cli.setup_session(args)

+ @cli.session
+ def list_experiments(args: Namespace, sess: api.Session) -> None:

I only changed a few select APIs/CLIs to use the new functionality, mostly as examples of how it would be used. There are many places that would see performance benefits from persistent sessions, but I imagine that will probably be a separate effort/PR.

Other api.Sessionrefactors

Some other small refactors made to api.Session:

  • Set auth/cert-related parameters on the session instead of passing them into each request
  • harness/determined/common/requests.py was deleted, its logic consolidated into harness/determined/common/api/_session.py

Test Plan

  • Run any experiment that reports a bunch of metrics (ideally Core API / detached mode) and enable debug logs.
  • Submit the experiment and the logs should show "urllib3.connectionpool: Starting new HTTP connection" only once at the start of training
  • Experiment should be noticeably faster

Commentary (optional)

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

@azhou-determined azhou-determined requested a review from a team as a code owner April 5, 2024 16:34
@cla-bot cla-bot bot added the cla-signed label Apr 5, 2024
Copy link

netlify bot commented Apr 5, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 3bda736
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/663124c264e9aa000803fea5

@@ -99,6 +100,22 @@ def setup_session(args: argparse.Namespace) -> api.Session:
)


def session(
fn: Callable[[argparse.Namespace, api.Session], None]
) -> Callable[[argparse.Namespace, api.Session], None]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Callable[[argparse.Namespace, api.Session], None]:
) -> Callable[argparse.Namespace, None]:

cause the wrapped function does not take 2nd api.Session param anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if self._http_session:
self._http_session.close()
self._http_session = self._make_http_session()
return self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes the semantics a bit. I think the old method and with_ naming and its docstring indicated you make a new session like the old one but with this one setting different. you can continue using the old session object with the old setting.

now, the underlying object is updated in place with the new setting. it does not seem like this method is ever used in a way where the old object is still preserved. so the change in semantics is fine. but maybe

  1. update the docstring to reflect the different behavior
  2. consider renaming it to set_retry or something to reflect it updates the thing itself

Comment on lines 22 to 25
params=None,
headers=None,
timeout=None,
stream=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't all these parameters redundant?

@wes-turner wes-turner self-assigned this Apr 15, 2024
@@ -110,6 +111,26 @@ def read_paginated(
offset = pagination.endIndex


def read_paginated_with_session(
Copy link
Contributor

@wes-turner wes-turner Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: add an optional session to read_paginated? What does having separate functions get us?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... I don't get why this was necessary to do at all. get_with_offset has a session already referenced. Why would we need do give it a different one rather than the one that tends to be in scope when it's defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is partially because i initially implemented it in read_paginated but in a later commit decided to separate it out, and in doing so, left out the code that actually mattered 🙃 i fixed it, so that now read_paginated_with_session is responsible for the session context, and get_with_offset uses the session it's given.

Copy link
Contributor Author

@azhou-determined azhou-determined Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need do give it a different one rather than the one that tends to be in scope when it's defined?

because the scope is different. get_with_offset is a singular API call that can't manage a persistent session. imagine:

with self._session as sess:
    def get_with_offset(int):
        return bindings.get_Trials(
            session=sess
        )
    # if we call read_paginated here, it doesn't seem right that it doesn't take in a session
    resp = read_paginated(get_with_offset)

# if we call it here outside the session context, the session context is useless
resp = read_paginated(get_with_offset)

it makes sense to me that read_paginated has the responsibility of entering/exiting the session context since it knows the "context" of requesting multiple pages in succession. then get_with_offset just uses the session read_paginated_with_session gives it.

i agree it's not great, but given the way read_paginated is implemented with the separate get_with_offset, this is the cleanest way i could find to do it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: add an optional session to read_paginated? What does having separate functions get us?

tried this at some point too. this was v1 lol. main issue is that callables can't have optional default arguments, so every get_with_offset would have to explicitly be written with in session=None in their signature. that seems like a very annoying change.

Copy link
Contributor

@wes-turner wes-turner Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implicit-session interface to read_paginated looks to me like:

  1. construct a method that maps an int to a list of things starting at that number
  2. exercise the method to get a complete list of things

The explicit-session interface is:

  1. construct a method that uses a session to map an int to a list of things starting at that number
  2. if the Session passed to read_paginated has an existing HTTP connection, close it
  3. create a new persistent HTTP connection in the session passed to read_paginated
  4. use the modified Session to call the master and get your complete list
  5. close the persistent HTTP connection in the passed Session

Modifying the Session seems bad to me. If it turns out we need an explicit-session interface, then I think it should either rely on persistent-HTTP management from the caller or create a brand-new session. Relying on the caller to manage it seems like the more natural interface from the point of view of a caller for the given signature.

Your intuition and mine differ about how

resp = read_paginated(get_with_offset)

looks without a passed session. I think read_paginated/get_with_offset interface is bad, but given the relationship between them, it feels more natural to me for the session to be implicit in get_with_offset. Whether that's closed with read_paginated or closed by the caller of read_paginated, I don't so much care.

I don't think it's necessary to support the use case in the example above. I don't think a caller should expect that that use case works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on this, I think there's an analog in trying to manage SDK resources in a world where Sessions may be closed at any point.

with authentication.login(<...>) as sess:
  exp = experiment.Experiment(id=1, session=sess)

exp.pause()

I don't think this is a use case we should try to support, either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed additional changes for read_paginated

pages=api.PageOpts.single if max_results else api.PageOpts.all,
)
with self._session as sess:
resps = api.read_paginated(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still one of my least-favorite parts of the SDK. Would a future design want to build this into BaseSession?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't love it either, but i dunno. could also see read_paginated belonging somewhere in bindings. definitely out of scope for this PR

resps = api.read_paginated(
get_with_offset=get_with_offset,
pages=api.PageOpts.single if max_results else api.PageOpts.all,
session=sess,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As currently written with separate read_paginated and read_paginated_with_session, I don't understand how this works. session isn't an argument of read_paginated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't lol, i initially implemented read_paginated to take in session and just didn't change all the implementations.

raise errors.APIException(r)

return r
def _make_requests_session(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What difference should the reader make between a requests_session and an http_session?

Copy link
Contributor

@wes-turner wes-turner Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like a BaseSession._http_session is better named a _requests_session, FWIW. The word "http" reads to me like it's supposed to mean "not https".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok -- suggestion: replace all the _make_http_session methods with calls to _make_requests_session that pass explicit values for all these parameters. That way I don't have to cross-reference attributes and figure out what they may have been set to from elsewhere (with_retry took a lot of thinking before I felt comfortable that it was correct)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, now I see that isn't possible. This seems like a smell. Thinking about how to fix it.

Copy link
Contributor

@wes-turner wes-turner Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... this would be simpler if, like in requests, the resources were created during __init__. In requests.sessions.Session:

    def __enter__(self):
        return self

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any danger in doing it this way? I don't think a connection will be opened inside the requests Session until a GET, etc. is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, i tried this at some point. i think the awkwardness here goes back to wanting to support the existing behavior of keeping direct calls single-use sessions.

but it was my first instinct as well to initialize the session in the init. it ended up being more awkward, because then there's non-obvious logic that would be required in _do_request on when to close them.

in the end i felt like it was much more explicit to create resources as the behavior dictates: only on entering a context manager do we start a persistent session, and _do_request will use a persistent session if it exists, otherwise it will create a one-time session that only exits for the scope of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because then there's non-obvious logic that would be required in _do_request on when to close them.

Could we talk through this? I think it would be nice to know that resources are created rather than have to puzzle out context.

requests_session.mount(
"https://", _HTTPSAdapter(server_hostname=server_hostname, max_retries=max_retries)
)
requests_session.mount("http://", adapters.HTTPAdapter(max_retries=max_retries))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTPAdapter for https:// has a server_hostname and the one for http:// doesn't. Is that a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NM. I get it now. Where did this pattern come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to worry about custom TLS certs if it's not HTTPS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the pattern for setting a hostname inside the subclassed HTTPAdapter. I assume it's good practice, but when I went to look for examples out there on the internet to learn more about it I wasn't very successful. Where'd the pattern come from?

return r
def _make_requests_session(
server_hostname: Optional[str] = None,
verify: Optional[Union[str, bool]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When verify is optional, then I have to figure out what three states mean. I think it's slightly easier (and means the same thing) if it's just a Union[str, bool] instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably could default None to True, but it's not that obvious and i'm not sure that it's this method's responsibility to make that default decision.

verify comes directly from cert.bundle, which has Optional[Union[str, bool]], along with some not-super-trivial logic to determine which type it is:

if noverify:
            self._bundle = False  # type: Union[None, str, bool]
        elif cert_pem is None:
            self._bundle = None
        else:
            self._bundle = str

requests also takes in the same type and has some logic for handling None (ultimately it treats it as true AFAICT, but it's not immediately obvious).

Copy link
Contributor

@wes-turner wes-turner Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only now looking at the source for how requests.Sessions are initialized.

...

ok. So a requests.Session has a verify attribute that's set to True at init but which the docs say you can falsify later on. FWIW, I don't see mention of setting it to None.

requests.Session.request does take an Optional verify when constructing its request, and that logic is awful:

  1. if request.verify \in (True | None), and a bundle-string exists in an environment variable, then overwrite request.verify with that bundle-string
  2. if now request.verify is None, then use Session.verify for the request
  3. if request.verify is not None, use whatever request.verify now is for the request

ok. It's time to send the request. Now it's up to the Adapter to decide what to do with verify. Looking at the source, the comment on the base class says that verify should be either a string or a bool. This library is untyped so whatever, but says nothing about the possibility that it's a None.

I think it's a type error at this point for verify to be a None unless the adapter we subclass it with explicitly handles that case. Ours does not, but having earlier set request.verify to None manually does mean that possibility is there.

My conclusion from this is that for our use case we should not set a requests.Session.verify to None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaulting this to true now

@@ -39,6 +39,7 @@ class Context:

def __init__(
self,
_session: api.Session,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that I don't understand how this works. I can't tell how much of that is my fault, how much the PR's. When you're ready to talk about this part, could we chat?

Copy link

codecov bot commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 89.70588% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 44.61%. Comparing base (55b7fd9) to head (3bda736).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9116      +/-   ##
==========================================
+ Coverage   44.59%   44.61%   +0.01%     
==========================================
  Files        1273     1273              
  Lines      155832   155888      +56     
  Branches     2439     2439              
==========================================
+ Hits        69501    69555      +54     
- Misses      86092    86094       +2     
  Partials      239      239              
Flag Coverage Δ
backend 41.74% <ø> (-0.01%) ⬇️
harness 64.09% <89.70%> (+0.04%) ⬆️
web 35.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
harness/determined/cli/__init__.py 100.00% <ø> (ø)
harness/determined/cli/experiment.py 18.10% <100.00%> (+0.21%) ⬆️
harness/determined/common/__init__.py 71.42% <100.00%> (ø)
harness/tests/common/api/test_session.py 100.00% <100.00%> (ø)
harness/tests/common/test_tls.py 100.00% <100.00%> (ø)
...rness/determined/common/experimental/experiment.py 71.95% <0.00%> (ø)
harness/determined/cli/_util.py 69.09% <71.42%> (+0.34%) ⬆️
harness/determined/core/_context.py 57.71% <60.00%> (+0.07%) ⬆️
harness/determined/common/api/_session.py 85.48% <88.15%> (+3.66%) ⬆️

... and 4 files with indirect coverage changes



def test_session_non_persistent_http_sessions() -> None:
with api_server.run_api_server(address=("localhost", 8080)) as master_url:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think api_server.run_api_server was a design mistake, and that we should either mock python endpoints or HTTP calls with responses. I think this would be plenty easy to do here, and would be clearer to the reader, in addition.

I don't think you have to migrate, but I do think it'd be better for our test health to stop using the fake master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

believe me i really didn't want to use this. tried both mocking endpoints and responses. the underlying HTTP connection pools only exist for real connections

sortByMetric=sort_by if isinstance(sort_by, str) else None,
states=[bindings.checkpointv1State.COMPLETED],
)
with self._session as sess:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncomfortable modifying a session as a side effect when list_checkpoints is called. What if the session stored in self._session already had a stored _http_session?

Copy link
Contributor

@wes-turner wes-turner Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With list_checkpoints coded in this way, is there any way to use the same HTTP connection for two successive calls?

with api.login(<details>) as sess:
  persistent_det = Determined.from_session(sess)
  persistent_det.get_experiment(<id>).list_trials()  # Done with a single HTTPConnection (probably)
  persistent_det.get_experiment(<id>).list_checkpoints()  # closes the PoolManager in the session in persistent_det
  persistent_det.get_experiment(<id>).list_trials()  # persistent_det no longer persistent; new HTTPConnection per call to master```

@wes-turner wes-turner removed their assignment Apr 26, 2024
@azhou-determined azhou-determined merged commit ae91042 into main Apr 30, 2024
89 of 100 checks passed
@azhou-determined azhou-determined deleted the session-reuse branch April 30, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants