Profile Cythonization work #4430
Comments
Do you have a link for this issue?
I should note that here I'm using Coiled (sorry for the indirect advertisement). Doing this at scale requires going above the free tier limits. If anyone wants to help with this (or even play around) let me know and I'll add you to a team with significantly increased privileges. Of course, this should also work just fine on other systems like custom-maintained Kubernetes clusters.
We've been optimizing the scheduler for larger workloads recently. We've found that profiling the scheduler is challenging to do well. A lot of the recent profiling has happened on NVIDIA systems, which may or may not be representative of typical hardware.
I recently ran a small experiment on AWS with the following code:
I found that
It would be good to try a few things here:
distributed
with the appropriate flag (cc @jakirkham do we have instructions for this somewhere?) and using that instead of a Coiled-built image
After that we need to consider what to do about TLS communication performance. Is this just because we're using TLS rather than plain TCP? If so, is there anything that we can do to accelerate this? Maybe asyncio is faster? Maybe Tornado can be improved?
NVIDIA devs @quasiben and @jakirkham report that UCX doesn't have this problem (although that may be because it's hard to profile).
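For anyone who wants to poke at this locally first, here is a minimal sketch of the general approach using only the standard library's `cProfile`. It is not the experiment described above: `profile_call` and `hypothetical_workload` are made-up names for illustration, and the workload is just a stand-in for scheduler-side work. The idea is only to show how to get a cumulative-time report, which is where time spent in communication would be expected to show up.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is a text table of the
    `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def hypothetical_workload(n):
    # Stand-in for real work (assumption for illustration only;
    # this is not code from the issue).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(hypothetical_workload, 100_000)
    print(report)
```

Sorting by cumulative time makes it easier to see which call trees dominate, as opposed to which individual functions are slow on their own.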