-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add in-memory singleflight for gcs #1876
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the VCS fetching does not need to be single-flighted because if the storage is protected, then only one version can ever get persisted.
This is to protect the VCS server from high load when a spike of requests comes in for the same module right after it's published. |
for some more context, the scenario we have is that we have a very large central repository (1GB+ raw size) with multiple modules in it. we use athens proxy with GCS backend for our goproxy implementation. When we make a change to the large repository, we have CI jobs that run and trigger downstream builds that all effectively need to pull the new version of the central repository at once (since this is based on automated CI, we actually hit this scenario on a regular basis). We currently run into two issues: 1. our git system experiences high load from all of these fetches at once, and 2. the memory usage of our athens instances expands to the point we experience stability issues. The current implementation of the GCS based singleflight is only handling the fact that when a module is stored in GCS, a conflict is effectively ignored since it doesn't matter as long as the module versions are immutable. However, it does not seem to handle singleflighting the fetches to VCS. Additionally, it seems like we cannot configure our athens instances to use GCS storage and memory singleflight because the GCS race can still happen across multiple instances of athens. This change wraps the GCS singleflight with the memory implementation so that the VCS fetches are also singleflighted per instance of athens. However the GCS singleflight implementation is still required to handle the multiple instances of athens that may upload to GCS simultaneously. This change should help with memory issues while also reducing the parallelism of VCS fetches to match the number of instances of athens that we run. |
As far as I understand from Athens's perspective, the existing Single Flight mechanism is only meant to protect the storage component (e.g. S3) as you've already mentioned, not the VCS host. I'd imagine this PR would have also to cover the other non-GCS portion as well 🤔 But i'm not exactly sure about this change without more concrete tests, details and plans On a separate note where I work, we have similar problems with the high VCS host load and high CPU/MEM usage on the Athens cluster before this. Our Go monorepo was 60+GB in raw size. We started seeing these issues when our users + SAST scanning jobs send A LOT of We were able to alleviate all these by using Athens If you need a reference, we've written an article about this here. I hope this helps a bit |
I won't dispute that there are certainly other ways to solve the problem and I agree that this change doesn't capture all the locations where this problem exists. I also agree that the argument about VCS load probably shouldn't be the primary driver for changes here. However, the main reason I think this makes sense is that by default, athens will actually run with an in-memory singleflight mode. By enabling GCS singleflight, the performance of athens immediately degrades due to the increase in VCS fetches. So it seems like a trivial fix to wrap the GCS singleflight with the default in-memory implementation so that enabling GCS as a singleflight mechanism only improves instead of degrades the memory performance from the defaults. |
if you want to maintain separation of concerns, I guess this is another way to solve the problem (however I haven't tested it yet): #1877 If the singleflight mechanism is only to ensure that the storage can handle simultaneous writes, then maybe it should be renamed. Its purpose seems to be conflict resolution and doesn't actually provide any guarantees that the remote operations are singleflighted. |
What is the problem I am trying to address?
Currently, the GCS singleflight implementation only handles the storage component. The fetching from VCS part doesn't seem to be singleflighted.
How is the fix applied?
Wrap the GCS singleflight with the in-memory singleflight to cut down on the simultaneous VCS fetches.
What GitHub issue(s) does this PR fix or close?