Implement rate limits on writes to blob storage #55

Open
sunu opened this issue Sep 23, 2021 · 4 comments
Labels
feature-request New feature or request

Comments

@sunu
Contributor

sunu commented Sep 23, 2021

While processing large PST files and other archives on Aleph, we sometimes hit the GCS rate limit while writing files to our storage bucket.

We should enforce a configurable rate limit when writing files to the archive.
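One way to make the write rate configurable is a token bucket: each write consumes a token, tokens refill at a fixed rate, so short bursts are allowed while the long-run rate stays capped. The sketch below is a hypothetical, stdlib-only `TokenBucket` class (not part of servicelayer); `rate` and `capacity` stand in for whatever configuration setting would carry the limit. The clock and sleep functions are injectable only so the limiter can be tested without real waiting.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Hypothetical blocking limiter: on average at most `rate` writes per
    second, with bursts of up to `capacity` writes."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()
        self._lock = threading.Lock()  # shared safely across upload threads

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Time until the bucket holds one full token again.
                wait = (1 - self._tokens) / self.rate
            self._sleep(wait)  # sleep outside the lock
```

Usage would be to share one instance across all upload workers and call `acquire()` immediately before each write to the bucket.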

@Rosencrantz
Contributor

Are there other options here? For example, could we parallelise transactions across multiple accounts, or increase the GCS limit somehow? I'm not suggesting that these are feasible or better solutions; I just want to understand what other options we might have.

@Rosencrantz Rosencrantz added the feature-request New feature or request label Oct 19, 2021
@sunu
Contributor Author

sunu commented Oct 19, 2021

I read the docs, and it seems GCS adjusts rate limits automatically based on usage: https://cloud.google.com/storage/docs/request-rate#auto-scaling. I think we can get away with retries using exponential backoff instead of a hard rate limit.
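A minimal, stdlib-only sketch of retrying with exponential backoff instead of enforcing a hard limit. The helper name `retry_with_backoff` and its defaults are assumptions for illustration. It uses "full jitter" (a random sleep between zero and the current cap) so that many workers retrying at the same moment drift apart instead of retrying in lockstep. In practice `retryable` would be the exceptions the storage client raises for 429/5xx responses, for example `google.api_core.exceptions.TooManyRequests`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call func(), retrying on `retryable` errors.

    The sleep before retry n (0-based) is uniform in
    [0, min(max_delay, base_delay * 2**n)].
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Capping the delay at `max_delay` keeps a long outage from producing unbounded waits, while the growing cap backs off quickly enough to give GCS time to scale up.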

@stchris
Contributor

stchris commented Aug 4, 2023

The best way forward here is to let the Google Cloud Python SDK handle retries with exponential backoff and jitter (see https://cloud.google.com/python/docs/reference/storage/latest/retry_timeout). We would replace our `for attempt in service_retries():` loop and linear backoff (https://github.com/alephdata/servicelayer/blob/ceb20c34ce141796c46585247cb88607299f3d1c/servicelayer/archive/gs.py#L109C3-L109C3) with that.

See also https://occrp.sentry.io/issues/4162555422/?project=4504916166967297&query=is%3Aunresolved&referrer=issue-stream&stream_index=0

@tillprochaska
Contributor

Helpful context from @brrttwrks: We had similar issues in the past when uploading archive/package files (ZIP archives, Outlook PST files, …). During ingestion, these files are unpacked and uploaded to the storage backend individually.

Projects
No open projects
Status: 🚀Feature Backlog
Development

No branches or pull requests

4 participants