Fix/llm launcher disable token (#3230)
* Fix disable_token_auth api

* Move vllm dep to right place

* Fix llm deployment docs
mreso committed Jul 5, 2024
1 parent 4573482 commit cbe9340
Showing 5 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -66,7 +66,7 @@ Refer to [torchserve docker](docker/README.md) for details.
#export token=<HUGGINGFACE_HUB_TOKEN>
docker build . -f docker/Dockerfile.llm -t ts/llm

-docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
+docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth

curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
```
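The same request the `curl` line above sends can be reproduced from Python's standard library. This is a minimal sketch, assuming a server started with the docker command is listening on localhost:8080 (the actual network call is left commented out):

```python
import json
import urllib.request

# Same payload and headers as the curl command above.
payload = json.dumps({"prompt": "Hello, my name is", "max_new_tokens": 50}).encode()
req = urllib.request.Request(
    "http://localhost:8080/predictions/model",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment against a running TorchServe instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())

print(req.get_method())  # POST (urllib infers POST when data is set)
```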
4 changes: 2 additions & 2 deletions docs/llm_deployment.md
@@ -22,7 +22,7 @@ export token=<HUGGINGFACE_HUB_TOKEN>

You can then go ahead and launch a TorchServe instance serving your selected model:
```bash
-docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
+docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
```

To change the model you just need to exchange the identifier given to the `--model_id` parameter.
@@ -42,7 +42,7 @@ To rename the model endpoint from `predictions/model` to something else you can

The launcher script can also be used outside a docker container by calling this after installing TorchServe following the [installation instructions](https://github.com/pytorch/serve/blob/feature/single_cmd_llm_deployment/README.md#-quick-start-with-torchserve).
```bash
-python -m ts.llm_launcher --disable_token
+python -m ts.llm_launcher --disable_token_auth
```

Please note that the launcher script as well as the docker command will automatically run on all available GPUs, so make sure to restrict the number of visible devices by setting CUDA_VISIBLE_DEVICES.
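The device restriction mentioned above can be sketched as follows; the launcher invocation is shown commented out since it needs TorchServe installed and a GPU present:

```shell
# Expose only the first GPU (index 0) to the process; every other device is
# hidden from CUDA. Use a comma-separated list (e.g. "0,1") for several GPUs.
export CUDA_VISIBLE_DEVICES=0

# python -m ts.llm_launcher --disable_token_auth   # now restricted to GPU 0
```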
1 change: 1 addition & 0 deletions requirements/common.txt
@@ -6,3 +6,4 @@ pynvml==11.5.0
pyyaml==6.0.1
ninja==1.11.1.1
setuptools
+vllm==0.5.0; sys_platform == 'linux'
1 change: 0 additions & 1 deletion requirements/torch_linux.txt
@@ -5,4 +5,3 @@ torch==2.3.0+cpu; sys_platform == 'linux'
torchvision==0.18.0+cpu; sys_platform == 'linux'
torchtext==0.18.0; sys_platform == 'linux'
torchaudio==2.3.0+cpu; sys_platform == 'linux'
-vllm==0.5.0; sys_platform == 'linux'
4 changes: 2 additions & 2 deletions ts/llm_launcher.py
@@ -99,7 +99,7 @@ def main(args):
model_store=args.model_store,
no_config_snapshots=True,
models=args.model_name,
-disable_token=args.disable_token,
+disable_token=args.disable_token_auth,
)

pause()
@@ -134,7 +134,7 @@ def main(args):
)

parser.add_argument(
-"--disable_token-auth",
+"--disable_token_auth",
action="store_true",
help="Disable token authentication",
)
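The hunk above fixes a hyphen/underscore mismatch in the flag name. As a toy reproduction (not the actual launcher module) of why `store_true` flags behave this way:

```python
import argparse

# Minimal stand-in for the launcher's argument parser; only the fixed flag
# is reproduced here.
parser = argparse.ArgumentParser(description="toy sketch of ts.llm_launcher flag parsing")
parser.add_argument(
    "--disable_token_auth",
    action="store_true",
    help="Disable token authentication",
)

# store_true flags default to False and flip to True when the flag is present.
args = parser.parse_args(["--disable_token_auth"])
print(args.disable_token_auth)  # True
```

Note that argparse exposes the flag as the attribute `args.disable_token_auth`; had the flag stayed `--disable_token-auth`, the attribute would have been `disable_token-auth`, which is not a valid Python identifier for attribute access, hence the rename in the hunk.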
