feat: (opt-in): terminate handling of work when the request has already timed out #328

jrmfg · 2024-05-17T18:05:35Z

Overhead-free (or at least very cheap).

The gunicorn “timeout” config means drastically different things for
sync and non-sync workers. From the gunicorn docs:

Workers silent for more than this many seconds are killed and restarted.

Value is a positive number or 0. Setting it to 0 has the effect of
infinite timeouts by disabling timeouts for all workers entirely.

Generally, the default of thirty seconds should suffice. Only set this
noticeably higher if you’re sure of the repercussions for sync workers.
For the non sync workers it just means that the worker process is still
communicating and is not tied to the length of time required to handle a
single request.

So. For cases where threads = 1 (user set or our defaults), we’ll use
the sync worker and let the regular timeout functionality do its thing.

For cases where threads > 1, we’re using the gthread worker, where
timeout means something completely different and is not really
user-observable. So we’ll leave the communication timeout (gunicorn’s
default “timeout”) at 30 seconds, but create our own gthread-derived
worker class to use instead. It terminates request handling once the
request has timed out (with no mind to gunicorn’s “graceful shutdown”
config), to emulate GCF 1st gen.
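The split between the two cases can be sketched roughly as below. This is only an illustration of the decision described above, not the code in this PR: `build_gunicorn_options` and the `CUSTOM_WORKER` path are hypothetical names, and the opt-in switch is left out. The keys `worker_class`, `threads` and `timeout` are regular gunicorn settings.

```python
# Hypothetical dotted path to a gthread-derived worker class (placeholder name).
CUSTOM_WORKER = "my_package.workers.TimeoutAwareThreadWorker"

# gunicorn's default "timeout"; for gthread it only covers worker liveness.
COMMUNICATION_TIMEOUT_SECONDS = 30


def build_gunicorn_options(threads, request_timeout_seconds):
    """Return gunicorn options for the two cases described above."""
    if threads <= 1:
        # Sync worker: gunicorn's own timeout bounds the request, so pass
        # the request timeout straight through.
        return {
            "worker_class": "sync",
            "threads": 1,
            "timeout": request_timeout_seconds,
        }
    # Threaded worker: keep the communication timeout at its default and
    # rely on the custom worker class to cut off overdue requests.
    return {
        "worker_class": CUSTOM_WORKER,
        "threads": threads,
        "timeout": COMMUNICATION_TIMEOUT_SECONDS,
    }
```

For example, `build_gunicorn_options(1, 60)` selects the sync worker with a 60 second timeout, while `build_gunicorn_options(8, 60)` selects the custom worker and leaves `timeout` at 30.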

The arbiter spawns these workers, so we have to maintain some sort of
global timeout state for us to read in our custom gthread worker.
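As a rough, stdlib-only illustration of the idea (module-level timeout state plus terminating the handling thread), here is a minimal sketch. It is not the worker class from this PR: `configure`, `thread_deadline`, `handle_with_deadline` and `busy_work` are hypothetical names, and the interruption mechanism shown (CPython's `PyThreadState_SetAsyncExc`, called through `ctypes`) is an assumption about one way to stop a thread. It is CPython-specific, and the exception is only delivered while the thread is running Python bytecode, so a single long blocking C call is not interrupted until it returns.

```python
import ctypes
import threading
import time

# Module-level state: set once before workers are spawned, read per request.
TIMEOUT_SECONDS = 0.0


def configure(seconds):
    """Record the request timeout in module-level state."""
    global TIMEOUT_SECONDS
    TIMEOUT_SECONDS = seconds


class RequestTimedOut(Exception):
    """Raised inside the handling thread once its deadline has passed."""


class thread_deadline:
    """Context manager that interrupts the entering thread after `seconds`."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.thread_id = None
        self.timer = None

    def _interrupt(self):
        # Ask CPython to raise RequestTimedOut in the target thread.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(self.thread_id), ctypes.py_object(RequestTimedOut)
        )

    def __enter__(self):
        self.thread_id = threading.get_ident()
        self.timer = threading.Timer(self.seconds, self._interrupt)
        self.timer.daemon = True
        self.timer.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Work finished (or failed) first: make sure the timer never fires.
        self.timer.cancel()
        return False


def handle_with_deadline(work):
    """Run work() in the current thread, aborting it after TIMEOUT_SECONDS."""
    try:
        with thread_deadline(TIMEOUT_SECONDS):
            return work()
    except RequestTimedOut:
        return "terminated"


def busy_work(seconds):
    """Stand-in for a request handler that takes `seconds` to finish."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        time.sleep(0.01)  # short sleeps keep the thread interruptible
    return "finished"
```

After `configure(0.2)`, calling `handle_with_deadline(lambda: busy_work(5))` is cut off once the deadline passes, while a handler that returns quickly completes normally. There is a small race if the timer fires just as the work completes; a real implementation would need to guard against that.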

In the future, we should consider letting the user adjust the graceful
shutdown seconds. But the default of 30 seems like it’s worked fine
historically, so it’s hard to argue for changing it. IIUC, this means
that on gen 2, there’s a small behavior difference for the sync workers
compared to gen 1, in that gen 2 sync worker workloads will get an extra
30 seconds of timeout to gracefully shut down. I don’t think monkeying
with this config and opting-in to sync workers is very common, though,
so let’s not worry about it here; everyone should be on the gthread path
outlined above.

there's something test-specific about how mac pickles functions for execution in multiprocessing.Process which is causing problems. it seems somewhere in the innards of flask and gunicorn and macos... since this feature is opt-in anyway, let's just skip testing darwin.

HKWinterhalter · 2024-05-17T20:08:59Z

Looks great!

jrmfg added 7 commits May 16, 2024 07:51

feat: restore defaults present < 3.6.0, but retain customizability

ccb4d90

revert the test, too

dcb6eb1

also restore this assert :)

0242768

fix tests

11284f0

Merge branch 'main' into zombie-timeout

1821445

small test fixes

af1f2fb

give up on coverage support for things that are tested in different processes, or in gthread, because it looks like pytest-cov gave up on support for these, whereas coverage has out-of-the-box support

jrmfg requested a review from nifflets May 17, 2024 18:05

blunderbuss-gcf bot assigned HKWinterhalter May 17, 2024

jrmfg added 7 commits May 17, 2024 11:07

format

1e18b82

isort everything

5d170a8

sort tuple of dicts in async tests before asserting

fbffb7d

causes flakes sometimes in workflows

use double-quotes

6f7d6c7

also skip tests on windows - this is all built for gunicorn, there's no value adding it for windows anyway

25c6e7d

skip import on windows

7a9acbe

HKWinterhalter reviewed May 17, 2024

src/functions_framework/_http/gunicorn.py

tests/test_functions/timeout/main.py

tests/test_timeouts.py

jrmfg added 2 commits May 17, 2024 13:00

easy stuff

0a70fb9

add a few tests for sync worker timeouts

7e559e5

these shouldn't have changed with this commit

HKWinterhalter approved these changes May 17, 2024

jrmfg merged commit 2601975 into GoogleCloudPlatform:main May 17, 2024
46 checks passed

release-please bot mentioned this pull request May 17, 2024

chore(main): release 3.7.0 #324

Merged
