-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🛑 DNM Deserialization: zero-copy merge subframes when possible #5112
🛑 DNM Deserialization: zero-copy merge subframes when possible #5112
Conversation
This is a hack because in real use, I am pretty sure that `frames` will be either all memoryviews or all not, since they'll either be coming from a comm or from a bytestring. But in tests where we call `loads` directly, this may not be the case.
Yeah I think reworking serialization to be a bit more efficient is one of the things that Mads was looking into. This may involve moving to msgspec. Though should note that Mads is on PTO atm (so not trying to bug him 😉) |
Using msgspec would make things easier, especially since it'll give us a Since Mads is going to be refactoring all this anyway, I'd advocate for getting this in as is then, though it's a bit arcane. I have a feeling it'll help memory performance quite a lot. I think the main reason why #4967 helped with memory pressure so much is that transfers were creating these memory spikes, and it just cut down the number of transfers. Making transfers actually zero-copy (at least for NumPy arrays; not sure about pandas yet?) may make high-communication workloads run much more smoothly. |
Co-authored-by: crusaderky <crusaderky@gmail.com>
Hey @crusaderky any idea why these CI logs are truncated? I'm seeing most of the |
@gjoseph92 looks like the whole docker container fell over halfway through. Either that or some issue in the github wrapper around it. |
GitHub actions was down for a bit yesterday. @gjoseph92 could you push an empty commit to rerun CI? |
Maybe it's a segmentation fault that is triggered by repr() which in turn is triggered by printing the stack trace? I would try to deliberately cause one to see if it spells out SIGSEGV explicitly or if you get an equally nebulous output. |
Gabe how confident are you that this PR won't have unintended
consequences? My gut reaction would be to wait on this one just because
it's so core and doesn't solve a current critical issue.
…On Fri, Jul 30, 2021 at 7:58 AM crusaderky ***@***.***> wrote:
Maybe it's a segmentation fault that is triggered by repr() which in turn
is triggered by the printing on the stack trace? I would try to
deliberately cause one to see if it spells out SIGSEGV explicitly or if you
get an equally nebulous output.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#5112 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTEGERCNG5TH2UW2MYTT2K4Z7ANCNFSM5A4VTQCQ>
.
|
If the question is when should we merge this, would it make sense to do it after cutting the release? Should add Mads is getting back next week. So he may have thoughts on how to build on this in cutting down on memory usage. |
Yes certainly don't merge this until after the release. Definitely seems like there's a segfault here, which annoyingly I haven't reproduced locally yet on macOS. |
Less useful, but also less complex. Short of writing the frames by hand, I couldn't come up with a test that could even trigger this behavior. I'd feel uncomfortable having that code be untested and only run on what's already an edge case. So if this assert is ever triggered, we can come back and add the zero-copy and test it.
This reverts commit 39ce2b6.
A separate `copy_frames` function makes this more readable and easier to test. Also, I came up with a test for this case that's still contrived, but not ridiculously contrived. That said, we don't want this copy to happen. And I'm pretty confident it will never happen with reall comms, because either the whole message is one buffer (TCP), or memoryviews aren't used at all. This mix-and-match only even happens in tests; see 1869b18. So maybe we should stick with the assert as a warning to future developers, so nobody messes this up and it keeps working with a silent performance regression?
Can one of the admins verify this patch? |
As noted in ab6119a: I think this error is currently impossible to raise in real use, and we want to keep it that way. We'd like a future test to fail if something causes this case to happen, rather than a silent copy.
This should get another review. The failing tests were because the whole buffer underlying the frames may have a short prefix from This made me think of a strange (I think impossible actually) edge case: when
Nonetheless I didn't like leaving this case around since it could fail in very weird ways, so I added logic to copy these subframes that don't match up just in case: ab6119a. But I've changed my mind again and decided to just make this case an AssertionError, because:
I'll be out for the next week, so if consensus is to do the copy instead of the assert, just revert e2da610. This has become more complicated than I wanted given that we're hopefully redoing all this soon. I really wish there was some way to get the current index into the underlying buffer that a memoryview was pointing to. That would eliminate all this passing down offsets and be more reliable. Is there any way I'm missing to do this without writing a C extension to do pointer arithmetic on |
UCX uses a mix of NumPy arrays and RMM As to the pointer arithmetic point, yes one can use NumPy. In [1]: import numpy as np
In [2]: m = memoryview(bytearray(8))
In [3]: a = np.asarray(m)
In [4]: a.__array_interface__["data"][0]
Out[4]: 140691895000496 That said, NumPy has other utility functions like |
add to allowlist |
@jakirkham thanks for the pointer 😆 about NumPy! That is exactly the sort of thing I was looking for. Using that I was able to come up with a much more sensible implementation of this in a new PR. |
When merging sub-frames before deserializing, if all the sub-frames are memoryviews backed by the same underlying buffer, "merge" them without copying by making a new memoryview pointing to the same buffer, instead of always copying.
Running @crusaderky's reproducer from #5107 on
main
:With this PR:
Thinking more about my comment #5107 (comment) (tldr: is copying unavoidable because we have to convert non-contiguous memory into contiguous memory?), I realized that in many cases, all the frames already are in contiguous memory thanks to #4506, and they're also zero-copy memoryviews thanks to #3980 (go @jakirkham!). So merging frames in that case could become a bookkeeping operation, where we just make a view of the buffer spanning all the sub-frames we want to merge.
This appears to work but also feels hacky/brittle to me. It requires calculating and passing down offsets into the underlying buffer, and kinda makes some assumptions about how the chain of calling functions works. I might prefer:
unpack_frames
also handles merging frames, since we already have the original buffer and all of the offsets we need there. However, that would require deserializing headers withinunpack_frames
, which is probably just as much of a mess.memoryview
that tracks its index when sliced, and logic to merge those. I think we'd have to write this in C/Cython though, otherwise it wouldn't expose the buffer protocol.memoryview
object is pointing to within its buffer. But I believe there's no Python API for this.cc @madsbk
black distributed
/flake8 distributed
/isort distributed