Update docker ENTRYPOINT to ensure proper argument handling (outlines-dev#962)

## Summary

This PR changes the `ENTRYPOINT` instruction in the Dockerfile from shell
form to exec form so that additional arguments passed to the container via
`docker run` are appended to the entrypoint command.
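
The difference can be reproduced without Docker. The sketch below assumes Docker's documented behavior of wrapping a shell-form `ENTRYPOINT` in `/bin/sh -c` and placing the `docker run` arguments after the command string. `echo server` is a hypothetical stand-in for `python3 -m outlines.serve.serve`, used only so the example runs anywhere.

```shell
# Stand-in for the real server command (assumption: `echo server` replaces
# `python3 -m outlines.serve.serve` for illustration).

# Shell form: the command string is run as /bin/sh -c "<command string>".
# Arguments placed after the string become $0, $1, ... of the shell itself
# and never reach the command inside the string.
shell_form=$(sh -c 'echo server' --model=microsoft/phi-2)

# Exec form: extra arguments are appended directly to the argument list.
exec_form=$(echo server --model=microsoft/phi-2)

echo "shell form: $shell_form"
echo "exec form:  $exec_form"
```

In the shell-form case the `--model` flag is absorbed by `sh`, which is consistent with the default model being loaded before this change.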

### Before the change:

The `--model` argument is not passed to the entrypoint command, so the
default model `facebook/opt-125m` is loaded instead.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:45:46 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-125m)
```

### After the change:

The `--model` argument is correctly passed to the entrypoint command.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:59:17 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='microsoft/phi-2', speculative_config=None, tokenizer='microsoft/phi-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/phi-2)
```
shashankmangla authored and fpgmaas committed Jun 14, 2024
1 parent 7656cab commit 0f9f854
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -14,4 +14,4 @@ RUN --mount=source=.git,target=.git,type=bind \
 pip install --no-cache-dir .[serve]

 # https://outlines-dev.github.io/outlines/reference/vllm/
-ENTRYPOINT python3 -m outlines.serve.serve
+ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]
