RuntimeError with Tensors on Different Devices When Using Outlines #909

ManilShrestha opened this issue May 20, 2024 · 1 comment
@ManilShrestha

Describe the issue as clearly as possible:

I hit a runtime error when using the outlines library with a model downloaded from the Hugging Face Hub (turboderp/Llama-3-8B-Instruct-exl2) on a specific CUDA device. Although the model is loaded with device=1, some operations still use tensors on device 0, so PyTorch raises a device-mismatch RuntimeError.

Steps/code to reproduce the bug:

import outlines
from huggingface_hub import snapshot_download

model_name = "turboderp/Llama-3-8B-Instruct-exl2"
revision = "3.0bpw"
model_directory = snapshot_download(repo_id=model_name, revision=revision, local_dir="llama3")

model = outlines.models.exl2(model_directory, device=1)

react_prompt = """
Question: How do you cook a sunny side egg?
FORMAT:
Strictly use the following format:
Thought: [insert thought]
Action: [Steps to follow]"""
generator = outlines.generate.text(model)
output = generator(react_prompt, stop_at="Action: ")
print(output)

Expected result:

Generation stops at "Action: " without raising a RuntimeError.

Error message:

RuntimeError                              Traceback (most recent call last)
Cell In[6], line 17
     10 react_prompt = """
     11 Question: How do you cook a sunny side egg?
     12 FORMAT:
     13 Strictly use the following format:
     14 Thought: [insert thought]
     15 Action: [Steps to follow]"""
     16 generator = outlines.generate.text(model)
---> 17 output = generator(react_prompt, stop_at="Action: ")
     18 print(output)

File ~/anaconda3/envs/ve-m/lib/python3.10/site-packages/outlines/generate/api.py:207, in SequenceGenerator.__call__(self, prompts, max_tokens, stop_at, rng)
    205 while True:
    206     try:
--> 207         last_state = next(states)
    208         if max_tokens or stop_sequences:
    209             token_ids = last_state.token_ids

File ~/anaconda3/envs/ve-m/lib/python3.10/site-packages/outlines/generate/generator.py:82, in sequence_generator(model, sampler, fsms, token_ids, sequence_weights, attention_masks, fsm_states, rng)
     80 allowed_tokens = get_allowed_tokens(fsms, fsm_states)
     81 biased_logits = bias_logits(logits, allowed_tokens)
---> 82 next_token_ids, ancestors, sequence_weights = sampler(
     83     biased_logits, sequence_weights, rng
     84 )
     86 token_ids = update_token_ids(token_ids, next_token_ids, ancestors)
     87 attention_masks = update_attention_masks(attention_masks, ancestors)

File ~/anaconda3/envs/ve-m/lib/python3.10/site-packages/outlines/samplers.py:160, in MultinomialSampler.__call__(self, next_token_logits, sequence_weights, rng)
    156 logprobs = torch.nn.functional.log_softmax(altered_next_token_logits, dim=-1)
    157 ancestors = torch.arange(
    158     altered_next_token_logits.shape[0], device=next_token_logits.device
    159 )
--> 160 weights = sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()
    162 return next_token_ids, ancestors, weights

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

Outlines/Python version information:

python -c "from outlines import _version; print(_version.version)"
0.0.43.dev11+g78852b0

python -c "import sys; print('Python', sys.version)"
Python 3.10.13 (main, Sep 11 2023, 13:21:10) [GCC 11.2.0]

Context for the issue:

I want to run on GPUs other than device 0 on the server, so I need to be able to specify which GPU to use.
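A possible workaround, not confirmed anywhere in this thread: CUDA_VISIBLE_DEVICES limits which GPUs the process can see and renumbers them from 0, so physical GPU 1 shows up as cuda:0 and tensors created on the default device land on the same card as the model. The helper name `expose_only_gpu` is hypothetical, and the commented-out model call assumes `device=0` is accepted the same way `device=1` is in the reproduction above.

```python
import os


def expose_only_gpu(physical_index: int) -> None:
    """Hide every GPU except one from CUDA (hypothetical helper).

    Must run before torch or outlines initializes CUDA, otherwise the
    setting has no effect on the current process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)


expose_only_gpu(1)

# Import outlines only after the variable is set. Inside this process the
# selected card is now index 0 (assumption: exl2 accepts device=0):
# import outlines
# model = outlines.models.exl2(model_directory, device=0)
```

This sidesteps the mismatch rather than fixing it, so it only helps when a single GPU per process is enough.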

@ManilShrestha changed the title from "RuntimeError with Tensors on Different Devices When Using Outlines with LLaMA Model" to "RuntimeError with Tensors on Different Devices When Using Outlines" on May 20, 2024
ManilShrestha added a commit to ManilShrestha/outlines that referenced this issue May 20, 2024
@rlouf
Member

rlouf commented May 22, 2024

Looks like we need to make sure sequence_weights and logprobs are on the same device somewhere upstream.
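For illustration only, a minimal sketch of what aligning the devices could look like at the line that fails in the traceback. It is not the fix adopted in Outlines: the function name `accumulate_weights` is hypothetical, and the comment above suggests the real change belongs further upstream, where sequence_weights is created. The example runs on CPU so it works without a GPU.

```python
import torch


def accumulate_weights(
    sequence_weights: torch.Tensor,
    logprobs: torch.Tensor,
    next_token_ids: torch.Tensor,
) -> torch.Tensor:
    """Add each sampled token's log-probability to its sequence weight.

    Hypothetical helper: moves the other operands to the device of
    `logprobs` before combining them, so a tensor left on cuda:0 cannot
    clash with logits produced on cuda:1.
    """
    device = logprobs.device
    sequence_weights = sequence_weights.to(device)
    next_token_ids = next_token_ids.to(device)
    return sequence_weights + torch.gather(logprobs, 1, next_token_ids).squeeze()


# Two sequences, vocabulary of four tokens, uniform logits.
logprobs = torch.nn.functional.log_softmax(torch.zeros(2, 4), dim=-1)
next_token_ids = torch.tensor([[0], [1]])
sequence_weights = torch.zeros(2)

weights = accumulate_weights(sequence_weights, logprobs, next_token_ids)
```

Calling `.to(device)` on a tensor that is already on that device returns the tensor unchanged, so the extra calls cost nothing in the common single-GPU case.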
