[WIP] adding batch server/client #702

Open · wants to merge 1 commit into main
Conversation

@mrshenli (Contributor)

No description provided.

@mrshenli (Contributor, Author)

cc @adamlerer @heiner

@rohan-varma (Member)

This doesn't queue up the work for execution in the background, correct? It looks like the last item in the batch will call set_result which would unblock the fut.result() calls that are waiting in the other items being processed.

Also, this means that all callers would be blocked until all batch_size calls to the servers have been made, correct? Would the main use case of this be the same client calling the batch method on the server concurrently, and then doing a single wait for the final list of results?

@adamlerer (Contributor) commented Jan 29, 2020 via email

@mrshenli (Contributor, Author)

  • It looks like you are calling fn on each element of the batch rather
    than once on the whole batch. But that's something we can fix.

Right, I feel the batching behavior should be application-specific, e.g., the application might want to create a tensor as a buffer for multiple requests, or it might need to run some custom aggregate fn before processing the requests. How did you make it generic in your current implementation?
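
For instance, a minimal sketch of such an application-specific aggregate step (the model and the per-request tensor layout here are just assumptions, not code from this PR):

    import torch

    def aggregate_and_run(batch, model):
        # batch: list of per-request 1-D tensors collected by the server
        inputs = torch.stack(batch)       # one buffer holding the whole batch
        with torch.no_grad():
            outputs = model(inputs)       # a single forward pass over the batch
        return list(outputs.unbind(0))    # split the result back per request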

  • Unfortunately we determined that this "batched callback" approach is not
    sufficient because e.g. if my batch is 100 and I have 40 clients sending
    requests in parallel, I hit a deadlock.

This can be done by using rpc_async and rate-limiting on the client side, I guess?
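
Something along these lines on the client (just a sketch; "server" and batch_fn are placeholder names for the server worker and its bound function, and it assumes rpc.init_rpc has already been set up):

    from torch.distributed import rpc

    MAX_IN_FLIGHT = 4   # per-client cap on outstanding requests

    def run_client(requests):
        pending, results = [], []
        for req in requests:
            pending.append(rpc.rpc_async("server", batch_fn, args=(req,)))
            if len(pending) >= MAX_IN_FLIGHT:
                # wait for the oldest outstanding call before issuing more;
                # with several requests in flight per client, the server can
                # fill a batch even when batch_size exceeds the client count
                results.append(pending.pop(0).wait())
        results.extend(f.wait() for f in pending)
        return results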

  • We solved this in TB by having a server-controlled queue that could request a batch with a given min/max size and timeout, and/or ways to ask for the queue size. This gives a lot of flexibility on the server side, e.g. to decide which queue (i.e. which RPC) to service in which order.

This should be possible too, similar to this example, where the server launches nested RPCs.
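
A rough sketch of what such a server-controlled queue could look like (the class and method names are made up for illustration, not an existing API):

    import threading
    import time

    class BatchQueue:
        def __init__(self):
            self._items = []
            self._cv = threading.Condition()

        def put(self, item):
            with self._cv:
                self._items.append(item)
                self._cv.notify_all()

        def qsize(self):
            with self._cv:
                return len(self._items)

        def get_batch(self, min_size, max_size, timeout):
            # The server decides when to service the queue: wait until at
            # least min_size items have arrived or the timeout expires, then
            # take up to max_size items.
            deadline = time.monotonic() + timeout
            with self._cv:
                while len(self._items) < min_size:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0 or not self._cv.wait(remaining):
                        break
                batch = self._items[:max_size]
                del self._items[:max_size]
                return batch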

  • There's a thread lock, does that mean that rpc will be calling the
    callbacks from multiple threads?

Yes, each RPC launched from the client will be handled by a thread on the server side. So, they can happen concurrently.

  • If so, I don't think the locking is correct, although technically you don't need any lock since the GIL should cover it.

True, the lock is not necessary in this example, but it would be necessary if we add the TorchScript decorator.

In terms of a C++ implementation, it can be done at different levels:

Option 1: implement a custom RequestCallbackImpl to handle request batching and plug it into the RPC agent. The existing ComputationQueue can be added to it as well, I think. This only requires minimal changes and keeps the critical path intact for other applications (because not every application requires the batching feature).

Option 2: implement a custom RPC agent. This would also give you access to the thread management on the server side. You probably don't want to worry about the messaging layer; if so, we can split the current agent into a separate comm module and a callback/thread-management module.

@mrshenli (Contributor, Author)

This doesn't queue up the work for execution in the background, correct? It looks like the last item in the batch will call set_result which would unblock the fut.result() calls that are waiting in the other items being processed.

Yes, that is what is implemented in this example.

Also, this means that all callers would be blocked until all batch_size calls to the servers have been made, correct? Would the main use case of this be the same client calling the batch method on the server concurrently, and then doing a single wait for the final list of results?

Right, this is not meant to be the optimal solution for this use case. It is just trying to show:

  1. how to expose conventional RPC APIs, where the server binds functions and clients call them.
  2. how batching can be implemented.

If we want to make this efficient, the batching implementation will need to move to C++.
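
For reference, here is a condensed sketch of the batching mechanism in point 2: the request that completes the batch runs the work once and unblocks the other callers through a shared future. Names like batch_process, fn, and BATCH_SIZE are illustrative, not the exact code in this PR:

    import threading
    from concurrent.futures import Future

    BATCH_SIZE = 4
    _lock = threading.Lock()
    _pending = []        # requests collected for the current batch
    _fut = Future()      # shared future for the current batch

    def fn(batch):
        # stand-in for the real per-batch work
        return [r for r in batch]

    def batch_process(req):
        # called once per RPC; each call runs on its own server-side thread
        global _pending, _fut
        with _lock:
            fut = _fut
            idx = len(_pending)
            _pending.append(req)
            if len(_pending) == BATCH_SIZE:
                # the request that completes the batch runs fn once on the
                # whole batch and unblocks everyone waiting on fut.result()
                results = fn(_pending)
                _pending, _fut = [], Future()
                fut.set_result(results)
        return fut.result()[idx]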

@heiner commented Feb 24, 2020

Regardless of the drawbacks of this specific implementation, it does have the benefit of being relatively simple so it might be worth having it as a PyTorch example? :)


A reviewer (Contributor) commented on this snippet from the diff:

    def __getattr__(self, name):
        def fn(*args, **kwargs):
            return rpc_sync(
Would this be clearer if we changed this to rpc_async? Then a single client could eventually trigger a batch, and we could increase the batch size. I suspect this example was created before async was available?
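
For illustration, the rpc_async version could look roughly like this (a sketch, not the PR's actual code; _call_on_server and _SERVER are hypothetical helpers):

    from torch.distributed import rpc

    _SERVER = None  # set on the server to the object whose methods are exposed

    def _call_on_server(name, args, kwargs):
        # runs on the server: dispatch to the bound function by name
        return getattr(_SERVER, name)(*args, **kwargs)

    class AsyncClient:
        def __init__(self, server_worker):
            self.server_worker = server_worker

        def __getattr__(self, name):
            def fn(*args, **kwargs):
                # returns a Future instead of blocking, so one client can
                # keep several calls in flight and help fill a larger batch
                return rpc.rpc_async(
                    self.server_worker, _call_on_server, args=(name, args, kwargs)
                )
            return fn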
