Commit 99b939a

Add note on hf-token for llama3 model (#386)

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
sivanantha321 committed Aug 18, 2024
1 parent f991e85
Showing 1 changed file with 48 additions and 0 deletions.
@@ -6,6 +6,23 @@ In this example, we demonstrate how to deploy `Llama3 model` for text generation
The KServe Hugging Face runtime by default uses vLLM to serve LLM models, for faster time to first token (TTFT) and higher token generation throughput than the Hugging Face API. vLLM is implemented with common inference optimization techniques, such as paged attention, continuous batching, and an optimized CUDA kernel.
If the model is not supported by vLLM, KServe falls back to the Hugging Face backend as a failsafe.

!!! note
    The Llama3 model requires a Hugging Face Hub token to download the model. You can set the token using the `HF_TOKEN`
    environment variable.

Create a secret with the Hugging Face token.

=== "Yaml"

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: hf-secret
    type: Opaque
    stringData:
      HF_TOKEN: <token>
    ```
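Equivalently, the secret can be created imperatively with `kubectl` (a sketch that mirrors the YAML above; substitute your actual Hugging Face Hub token for the placeholder):

```shell
# Create the hf-secret Secret with the HF_TOKEN key from a literal value.
# Replace <token> with your Hugging Face Hub token before running.
kubectl create secret generic hf-secret --from-literal=HF_TOKEN=<token>
```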

=== "Yaml"

```yaml
@@ -22,6 +39,13 @@ If the model is not supported by vLLM, KServe falls back to HuggingFace backend
      args:
        - --model_name=llama3
        - --model_id=meta-llama/meta-llama-3-8b-instruct
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: HF_TOKEN
              optional: false
      resources:
        limits:
          cpu: "6"
@@ -150,6 +174,23 @@ curl -H "content-type:application/json" -H "Host: ${SERVICE_HOSTNAME}" \
You can use the `--backend=huggingface` argument to perform inference using the Hugging Face API. The KServe Hugging Face backend runtime also
supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for inference.
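As a sketch, a chat completion request against the OpenAI-compatible endpoint can look like the following (this assumes `INGRESS_HOST`, `INGRESS_PORT`, and `SERVICE_HOSTNAME` are set as in the earlier curl example; the endpoint may be served under an `/openai` prefix depending on the KServe version):

```shell
# Send a chat completion request to the deployed llama3 InferenceService.
# Adjust the path if your KServe version exposes the endpoint under /openai.
curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 30}' \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/chat/completions
```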

!!! note
    The Llama3 model requires a Hugging Face Hub token to download the model. You can set the token using the `HF_TOKEN`
    environment variable.

Create a secret with the Hugging Face token.

=== "Yaml"

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: hf-secret
    type: Opaque
    stringData:
      HF_TOKEN: <token>
    ```

=== "Yaml"

```yaml
@@ -167,6 +208,13 @@ supports the OpenAI `/v1/completions` and `/v1/chat/completions` endpoints for i
        - --model_name=llama3
        - --model_id=meta-llama/meta-llama-3-8b-instruct
        - --backend=huggingface
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: HF_TOKEN
              optional: false
      resources:
        limits:
          cpu: "6"
