Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: (opt-in): terminate handling of work when the request has already timed out #328

Merged
merged 16 commits into from
May 17, 2024

Conversation

jrmfg
Copy link
Contributor

@jrmfg jrmfg commented May 17, 2024

Overhead-free (or at least very cheap).

The “timeout” gunicorn config means drastically different things for
sync and non-sync workers:

Workers silent for more than this many seconds are killed and restarted.

Value is a positive number or 0. Setting it to 0 has the effect of
infinite timeouts by disabling timeouts for all workers entirely.

Generally, the default of thirty seconds should suffice. Only set this
noticeably higher if you’re sure of the repercussions for sync workers.
For the non sync workers it just means that the worker process is still
communicating and is not tied to the length of time required to handle a
single request.

So. For cases where threads = 1 (user set or our defaults), we’ll use
the sync worker and let the regular timeout functionality do its thing.

For cases where threads > 1, we’re using the gthread worker, and timeout
means something completely different and not really user-observable. So
we’ll leave the communication timeout (default gunicorn “timeout”) at 30
seconds, but create our own gthread-derived worker class to use instead,
which terminates request handling (with no mind to gunicorn’s “graceful
shutdown” config), to emulate GCF 1st gen.

The arbiter spawns these workers, so we have to maintain some sort of
global timeout state for us to read in our custom gthread worker.

In the future, we should consider letting the user adjust the graceful
shutdown seconds. But the default of 30 seems like it’s worked fine
historically, so it’s hard to argue for changing it. IIUC, this means
that on gen 2, there’s a small behavior difference for the sync workers
compared to gen 1, in that gen 2 sync worker workloads will get an extra
30 seconds of timeout to gracefully shut down. I don’t think monkeying
with this config and opting-in to sync workers is very common, though,
so let’s not worry about it here; everyone should be on the gthread path
outlined above.

jrmfg added 7 commits May 16, 2024 07:51
…dy timed out

Overhead-free (or at least very cheap).

The “timeout” gunicorn config means drastically different things for
sync and non-sync workers:

Workers silent for more than this many seconds are killed and restarted.

Value is a positive number or 0. Setting it to 0 has the effect of
infinite timeouts by disabling timeouts for all workers entirely.

Generally, the default of thirty seconds should suffice. Only set this
noticeably higher if you’re sure of the repercussions for sync workers.
For the non sync workers it just means that the worker process is still
communicating and is not tied to the length of time required to handle a
single request.

So.  For cases where threads = 1 (user set or our defaults), we’ll use
the sync worker and let the regular timeout functionality do its thing.

For cases where threads > 1, we’re using the gthread worker, and timeout
means something completely different and not really user-observable.  So
we’ll leave the communication timeout (default gunicorn “timeout”) at 30
seconds, but create our own gthread-derived worker class to use instead,
which terminates request handling (with no mind to gunicorn’s “graceful
shutdown” config), to emulate GCFv1.

The arbiter spawns these workers, so we have to maintain some sort of
global timeout state for us to read in our custom gthread worker.

In the future, we should consider letting the user adjust the graceful
shutdown seconds.  But the default of 30 seems like it’s worked fine
historically, so it’s hard to argue for changing it.  IIUC, this means
that on gen 2, there’s a small behavior difference for the sync workers
compared to gen 1, in that gen 2 sync worker workloads will get an extra
30 seconds of timeout to gracefully shut down.  I don’t think monkeying
with this config and opting-in to sync workers is very common, though,
so let’s not worry about it here; everyone should be on the gthread path
outlined above.
give up on coverage support for things that are tested in different
processes, or in gthread, because it looks like pytest-cov gave up on
support for these, where as coverage has out-of-the-box support
jrmfg added 7 commits May 17, 2024 11:07
there's something test-specific about how mac pickles functions for
execution in multiprocessing.Process which is causing problems.  it
seems somewhere in the innards of flask and gunicorn and macos...

since this feature is opt-in anyway, let's just skip testing darwin.
causes flakes sometimes in workflows
src/functions_framework/_http/gunicorn.py Outdated Show resolved Hide resolved
tests/test_functions/timeout/main.py Outdated Show resolved Hide resolved
tests/test_timeouts.py Outdated Show resolved Hide resolved
tests/test_timeouts.py Show resolved Hide resolved
jrmfg added 2 commits May 17, 2024 13:00
these shouldn't have changed with this commit
@HKWinterhalter
Copy link
Contributor

Looks great!

@jrmfg jrmfg merged commit 2601975 into GoogleCloudPlatform:main May 17, 2024
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants