[WIP] adding batch server/client #702

Open · wants to merge 1 commit into main
Conversation

@mrshenli (Contributor)

No description provided.

@mrshenli (Contributor, Author)

cc @adamlerer @heiner

@rohan-varma (Member)

This doesn't queue up the work for execution in the background, correct? It looks like the last item in the batch will call set_result which would unblock the fut.result() calls that are waiting in the other items being processed.

Also, this means that all callers would be blocked until all batch_size calls to the servers have been made, correct? Would the main use case of this be the same client calling the batch method on the server concurrently, and then doing a single wait for the final list of results?

@adamlerer (Contributor) commented Jan 29, 2020 via email

@mrshenli (Contributor, Author)

  • It looks like you are calling fn on each element of the batch rather
    than once on the whole batch. But that's something we can fix.

Right, I feel the batching behavior should be application-specific, e.g., the application might want to create a tensor as a buffer for multiple requests, or it might need to run some custom aggregate fn before processing the requests. How did you make it generic in your current implementation?
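
For instance, a minimal sketch of such an application-specific aggregate step (the model and the per-request tensor layout here are just assumptions, not code from this PR):

    import torch

    def aggregate_and_run(batch, model):
        # batch: list of per-request 1-D tensors collected by the server
        inputs = torch.stack(batch)       # one buffer holding the whole batch
        with torch.no_grad():
            outputs = model(inputs)       # a single forward pass over the batch
        return list(outputs.unbind(0))    # split the result back per request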

  • Unfortunately we determined that this "batched callback" approach is not
    sufficient because e.g. if my batch is 100 and I have 40 clients sending
    requests in parallel, I hit a deadlock.

This can be done by using rpc_async and rate-limiting on the client side, I guess?
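
Something along these lines on the client (just a sketch; "server" and batch_fn are placeholder names for the server worker and its bound function, and it assumes rpc.init_rpc has already been set up):

    from torch.distributed import rpc

    MAX_IN_FLIGHT = 4   # per-client cap on outstanding requests

    def run_client(requests):
        pending, results = [], []
        for req in requests:
            pending.append(rpc.rpc_async("server", batch_fn, args=(req,)))
            if len(pending) >= MAX_IN_FLIGHT:
                # wait for the oldest outstanding call before issuing more;
                # with several requests in flight per client, the server can
                # fill a batch even when batch_size exceeds the client count
                results.append(pending.pop(0).wait())
        results.extend(f.wait() for f in pending)
        return results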

  • We solved this in TB by having a server-controlled queue that could request a batch with a given min/max size and timeout, and/or ways to ask for the queue size. This gives a lot of flexibility on the server side, e.g. to decide which queue (i.e. which RPC) to service in which order.

This should be possible too, similar to this example, where the server launches nested RPCs.
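
A rough sketch of what such a server-controlled queue could look like (the class and method names are made up for illustration, not an existing API):

    import threading
    import time

    class BatchQueue:
        def __init__(self):
            self._items = []
            self._cv = threading.Condition()

        def put(self, item):
            with self._cv:
                self._items.append(item)
                self._cv.notify_all()

        def qsize(self):
            with self._cv:
                return len(self._items)

        def get_batch(self, min_size, max_size, timeout):
            # The server decides when to service the queue: wait until at
            # least min_size items have arrived or the timeout expires, then
            # take up to max_size items.
            deadline = time.monotonic() + timeout
            with self._cv:
                while len(self._items) < min_size:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0 or not self._cv.wait(remaining):
                        break
                batch = self._items[:max_size]
                del self._items[:max_size]
                return batch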

  • There's a thread lock, does that mean that rpc will be calling the
    callbacks from multiple threads?

Yes, each RPC launched from the client will be handled by a thread on the server side. So, they can happen concurrently.

  • If so, I don't think the locking is correct, although technically you don't need any lock since the GIL should cover it.

True, the lock is not necessary in this example, but it would be necessary if we add the TorchScript decorator.

In terms of a C++ implementation, it can be done at different levels:

Option 1: implement a custom RequestCallbackImpl to handle request batching and plug it into the RPC agent. The existing ComputationQueue can be added to it as well, I think. This only requires minimal changes and keeps the critical path intact for other applications (because not every application requires the batching feature).

Option 2: implement a custom RPC agent. This would also give you access to the thread management on the server side. You probably don't want to worry about the messaging layer; if so, we can split the current agent into a separate comm module and a callback/thread-management module.

@mrshenli (Contributor, Author)

This doesn't queue up the work for execution in the background, correct? It looks like the last item in the batch will call set_result which would unblock the fut.result() calls that are waiting in the other items being processed.

Yes, that is what is implemented in this example.

Also, this means that all callers would be blocked until all batch_size calls to the servers have been made, correct? Would the main use case of this be the same client calling the batch method on the server concurrently, and then doing a single wait for the final list of results?

Right, this is not meant to be the optimal solution for this use case. It is just trying to show:

  1. how to expose conventional RPC APIs, where the server binds functions and clients call them.
  2. how batching can be implemented.

If we want to make this efficient, the batching implementation will need to move to C++.
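
For reference, here is a condensed sketch of the batching mechanism in point 2: the request that completes the batch runs the work once and unblocks the other callers through a shared future. Names like batch_process, fn, and BATCH_SIZE are illustrative, not the exact code in this PR:

    import threading
    from concurrent.futures import Future

    BATCH_SIZE = 4
    _lock = threading.Lock()
    _pending = []        # requests collected for the current batch
    _fut = Future()      # shared future for the current batch

    def fn(batch):
        # stand-in for the real per-batch work
        return [r for r in batch]

    def batch_process(req):
        # called once per RPC; each call runs on its own server-side thread
        global _pending, _fut
        with _lock:
            fut = _fut
            idx = len(_pending)
            _pending.append(req)
            if len(_pending) == BATCH_SIZE:
                # the request that completes the batch runs fn once on the
                # whole batch and unblocks everyone waiting on fut.result()
                results = fn(_pending)
                _pending, _fut = [], Future()
                fut.set_result(results)
        return fut.result()[idx]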

@heiner commented Feb 24, 2020

Regardless of the drawbacks of this specific implementation, it does have the benefit of being relatively simple so it might be worth having it as a PyTorch example? :)


A reviewer (Contributor) commented on this snippet from the diff:

    def __getattr__(self, name):
        def fn(*args, **kwargs):
            return rpc_sync(
Would this be clearer if we changed this to rpc_async? Then a single client could eventually trigger a batch, and we could increase the batch size. I suspect this example was created before async was available?
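
For illustration, the rpc_async version could look roughly like this (a sketch, not the PR's actual code; _call_on_server and _SERVER are hypothetical helpers):

    from torch.distributed import rpc

    _SERVER = None  # set on the server to the object whose methods are exposed

    def _call_on_server(name, args, kwargs):
        # runs on the server: dispatch to the bound function by name
        return getattr(_SERVER, name)(*args, **kwargs)

    class AsyncClient:
        def __init__(self, server_worker):
            self.server_worker = server_worker

        def __getattr__(self, name):
            def fn(*args, **kwargs):
                # returns a Future instead of blocking, so one client can
                # keep several calls in flight and help fill a larger batch
                return rpc.rpc_async(
                    self.server_worker, _call_on_server, args=(name, args, kwargs)
                )
            return fn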
