[WIP] adding batch server/client #702
base: main
Conversation
This doesn't queue up the work for execution in the background, correct? It looks like the last item in the batch will call set_result, which would unblock the fut.result() calls that are waiting in the other items being processed. Also, this means that all callers would be blocked until all batch_size calls to the server have been made, correct? Would the main use case of this be the same client calling the batch method on the server concurrently, and then doing a single wait for the final list of results?
The main use case for this would be a bunch of threads/processes all
calling the batch method concurrently, and then waiting for a result. In
some cases, the clients might want to use an async version to queue up
multiple requests before waiting on them (e.g. if I have to evaluate 7
models for the 7 players, to take 1 game step).
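A minimal sketch of the pattern being described, using plain threading and concurrent.futures in place of the RPC machinery (the class and method names here are invented for illustration): many callers invoke batch_call() concurrently, the caller that completes the batch runs fn once on the whole batch, and every waiting future is resolved with its own result.

```python
# Hypothetical sketch: callers block on a Future until batch_size requests
# have accumulated; the last caller runs fn ONCE on the whole batch and
# unblocks everyone via set_result().
import threading
from concurrent.futures import Future

class Batcher:
    def __init__(self, fn, batch_size):
        self.fn = fn                  # processes an entire batch at once
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.inputs, self.futures = [], []

    def batch_call(self, x):
        fut = Future()
        with self.lock:
            self.inputs.append(x)
            self.futures.append(fut)
            full = len(self.inputs) == self.batch_size
            if full:
                inputs, futures = self.inputs, self.futures
                self.inputs, self.futures = [], []
        if full:
            # fn runs once on the whole batch, not per element
            results = self.fn(inputs)
            for f, r in zip(futures, results):
                f.set_result(r)
        return fut.result()           # blocks until the batch completes
```

Note this only makes progress when at least batch_size callers are in flight, which is exactly the deadlock risk discussed below.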
Looking closely, there are a number of implementation details that look
weird to me:
- It looks like you are calling fn on each element of the batch rather
than once on the whole batch. But that's something we can fix.
- Unfortunately we determined that this "batched callback" approach is not
sufficient because e.g. if my batch is 100 and I have 40 clients sending
requests in parallel, I hit a deadlock. We solved this in TB by having a
server-controlled queue that could request a batch with a given min/max
size and timeout, and/or ways to ask for the queue size. This gives a lot
of flexibility on the server side e.g. to decide which queue (i.e. which
RPC) to service in which order.
- There's a thread lock; does that mean that rpc will be calling the
callbacks from multiple threads? If so, I don't think the locking is
correct, although technically you don't need any lock, since the GIL
should cover it.
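A rough sketch of the server-controlled queue described in the second point above (names and signatures are invented here, not the TB implementation): the server pulls a batch with a min/max size and a timeout, so it can still make progress when fewer requests than the full batch size are pending.

```python
# Hypothetical sketch: a request queue the server drains on its own terms.
# get_batch() waits up to `timeout` seconds for min_size items, then
# returns whatever is available (capped at max_size), avoiding the
# deadlock of a fixed batch-size callback.
import threading
import time

class RequestQueue:
    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def size(self):
        # lets the server inspect pending work to decide which queue
        # (i.e. which RPC) to service next
        with self._cond:
            return len(self._items)

    def get_batch(self, min_size, max_size, timeout):
        deadline = time.monotonic() + timeout
        with self._cond:
            while len(self._items) < min_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                self._cond.wait(remaining)
            batch = self._items[:max_size]
            self._items = self._items[max_size:]
            return batch
```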
On Tue, Jan 28, 2020 at 7:39 PM Rohan Varma wrote:
This doesn't queue up the work for execution in the background, correct?
It looks like the last item in the batch will call set_result which would
unblock the fut.result() calls that are waiting in the other items being
processed.
Also, this means that all callers would be blocked until all batch_size
calls to the servers have been made, correct? Would the main use case of
this be the same client calling the batch method on the server
concurrently, and then doing a single wait for the final list of results?
Right, I feel the batching behavior should be application specific, e.g., the application might want to create a tensor as buffer for multiple requests or it might need to run some custom aggregate fn before processing each of the requests. How did you make it generic in your current implementation?
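One way to picture keeping the batching application-specific, as a sketch with invented names: the framework only collects requests, and the application supplies the aggregate function (e.g. stacking tensors into a single buffer) and the per-batch compute.

```python
# Hypothetical illustration: the application controls aggregation and
# compute; plain lists stand in for tensors to keep the sketch minimal.
def process_batch(requests, aggregate, compute):
    """Aggregate raw requests into one buffer, run compute once on it,
    and return per-request results in the original order."""
    buffer = aggregate(requests)   # e.g. torch.stack in a real server
    outputs = compute(buffer)      # one forward pass for the whole batch
    return list(outputs)

# usage: element-wise doubling with a trivial aggregate
doubled = process_batch([1, 2, 3],
                        aggregate=list,
                        compute=lambda b: [x * 2 for x in b])
```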
This can be done by using
This should be possible too, similar to this example, where the server launches nested RPCs.
Yes, each RPC launched from the client will be handled by a thread on the server side. So, they can happen concurrently.
True, a lock is not necessary in this example, but it would be necessary if we add the torchscript decorator. In terms of a C++ implementation, it can be done at different levels.
- Option 1: implement a custom
- Option 2: implement a custom rpc agent. This would also give you access to the thread management on the server side. You probably don't want to worry about the messaging layer. If so, we can split the current agent into a separate comm module and a callback/thread management module.
Yes, it is what is implemented in this example.
Right, this is not meant to be the optimal solution for this use case. Just trying to show:
If we want to make this efficient, the batching implementation will need to move to C++.
Regardless of the drawbacks of this specific implementation, it does have the benefit of being relatively simple, so it might be worth having it as a PyTorch example? :)
def __getattr__(self, name):
    def fn(*args, **kwargs):
        return rpc_sync(
Would this be clearer if we changed this to rpc_async? Then a single client could eventually trigger a batch, and we could increase the batch size. I suspect this example was created before rpc_async was available?
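To illustrate the async suggestion without a full RPC setup, here is a hedged sketch where ThreadPoolExecutor futures stand in for the futures torch.distributed.rpc.rpc_async would return (model_step is an invented placeholder for the remote batched call): a single client submits several requests up front and does one wait at the end, which lets it fill a larger batch on its own.

```python
# Sketch: async calls let one client queue multiple requests before
# waiting, e.g. evaluating 7 models for 7 players to take one game step.
from concurrent.futures import ThreadPoolExecutor

def model_step(x):
    # placeholder for the remote call; a real client would use rpc_async
    return x + 1

with ThreadPoolExecutor(max_workers=7) as pool:
    futs = [pool.submit(model_step, i) for i in range(7)]  # fire and forget
    results = [f.result() for f in futs]                   # single wait
```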