Added a simple inference webserver #353
base: main
Conversation
Is it possible to allow submitting inference requests during training? Just like running a manual eval.
Could always run a separate server while training.
The idea is to integrate it fully into training. The server accepts requests and puts them into a queue; the model being trained is registered with a callback that dequeues each request and runs inference on it, just as evals run during training. That way, we can reuse the model in training to do live inference at will. I admit that this might be a separate PR.
I agree, that's a separate PR. :) This is a very simple lightweight server, just a modification of the existing inference code to get instructions from HTTP requests rather than the terminal, and return them to the requesting client rather than the screen. If you want to take the time to configure it to run training and inference through a queue, go ahead :) I implemented it because, at present, to run test inferences you either have to do it completely manually (typing / pasting into a terminal for each one), or pay the long overhead of waiting for the inference server to start up for every single test. By starting up an HTTP server and accepting requests on it, you can automatically run many different tests without a ton of overhead. Or use it for non-testing / production purposes, for that matter.
Yes, currently it's not able to reuse the loaded model to run inference on a batch of inputs, which is very inconvenient.
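The queue-based integration discussed above could be sketched roughly like this. Everything here is hypothetical, not part of the axolotl codebase: `InferenceQueueCallback`, `submit`, `on_step_end`, and `generate_fn` are illustrative names, and the lambda stands in for real model inference.

```python
import queue

class InferenceQueueCallback:
    """Drains pending HTTP requests between training steps (sketch)."""

    def __init__(self, generate_fn):
        self.requests = queue.Queue()
        self.generate_fn = generate_fn

    def submit(self, instruction):
        # Called from the HTTP handler thread; returns a one-slot queue
        # the handler can block on for the result.
        result = queue.Queue(maxsize=1)
        self.requests.put((instruction, result))
        return result

    def on_step_end(self):
        # Called from the training loop; reuses the in-training model
        # (stood in for here by generate_fn).
        while not self.requests.empty():
            instruction, result = self.requests.get()
            result.put(self.generate_fn(instruction))

# Demo with a trivial stand-in for real model inference.
cb = InferenceQueueCallback(generate_fn=lambda text: text.upper())
pending = cb.submit("hello")
cb.on_step_end()
print(pending.get())  # prints "HELLO"
```

The handler blocks on the per-request result queue, so HTTP clients simply wait until the next training step services them.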
```python
if not instruction:
    response = ""
else:
    default_tokens = {"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>"}
```
we could probably grab these from the tokenizer, as any special tokens defined in the config are added to the tokenizer when it is instantiated.
Patches welcome. :) Again, this is just redoing a copy of the preexisting inference section as a webserver. I didn't change the logic.
```python
server_address = (cfg.server_addr, cfg.server_port)
httpd = socketserver.TCPServer(
    server_address,
    lambda *args, **kwargs: HttpHandler(
        *args, cfg=cfg, prompter=prompter, tokenizer=tokenizer, model=model, **kwargs
    ),
)
print(f"Server running on port {cfg.server_port}")
httpd.serve_forever()
```
should this be explicitly killed at the end of training?
Can you train and run inference at the same time? This only runs if you're in --inference mode.
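If an explicit teardown were ever wanted, one option (a sketch, not what the PR does: the PR blocks on `serve_forever()` in the main thread) is to serve from a background thread and shut down cleanly:

```python
import http.server
import socketserver
import threading

# Bind an ephemeral port for the sketch; the PR uses cfg.server_addr/server_port.
httpd = socketserver.TCPServer(("127.0.0.1", 0), http.server.BaseHTTPRequestHandler)
thread = threading.Thread(target=httpd.serve_forever, daemon=True)
thread.start()
# ... training / inference would run here ...
httpd.shutdown()      # stops the serve_forever() loop
httpd.server_close()  # releases the listening socket
thread.join()
print("server stopped cleanly")
```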
Might be able to use Gradio as a web server #812
Usage example:
accelerate launch scripts/finetune.py summarize.yaml --inference --base_model=path/to/my/model --load_in_8bit=True --server --server_port 1567 --server_addr 127.0.0.1
Then in another terminal:
curl -X POST -d "$(cat test_text.txt)" http://localhost:1567/
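Beyond curl, any HTTP client works. Below is a hypothetical Python equivalent; `run_inference` is an illustrative name, and the echo handler is only a stand-in for the PR's `HttpHandler` (it uppercases the body instead of running a model) so the example is self-contained:

```python
import http.server
import threading
import urllib.request

def run_inference(instruction, url):
    # POST the raw instruction text, mirroring the curl command above.
    req = urllib.request.Request(url, data=instruction.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

class _EchoHandler(http.server.BaseHTTPRequestHandler):
    # Stand-in for the real inference handler: uppercases the request body
    # so the example runs without a model checkpoint.
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body.upper())

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), _EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = run_inference("hello", url=f"http://127.0.0.1:{server.server_port}/")
print(result)  # prints "HELLO"
server.shutdown()
```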