
Accelerate not installed in Docker Image #1106

Open
aowen14 opened this issue Aug 20, 2024 · 4 comments · May be fixed by lapp0/outlines#88 or #1154
aowen14 commented Aug 20, 2024

Describe the issue as clearly as possible:

I'm not sure whether this is a bug or a feature request, but accelerate apparently isn't installed in the Docker image. That means one can use either transformers with no GPU acceleration, or vLLM; and from what I can tell, vLLM doesn't currently have feature parity with transformers (e.g. generate.json()).

Running the code outside of the image with the library plus accelerate works, and running pip install accelerate inside the container also solves the issue; the marginal download was very small.
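For the image itself, the fix is presumably a one-line Dockerfile change; a sketch below (the base-image line is a placeholder, and the repo's actual Dockerfile layout may differ):

```shell
# Sketch of the Dockerfile change (base image below is a placeholder):
#   FROM <existing outlines base image>
#   RUN pip install accelerate
#
# Ad-hoc workaround inside an already-running container until the image is fixed:
pip install accelerate
```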

Steps/code to reproduce the bug:

# within an Outlines Docker container
from outlines import models

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")

Expected result:

The model should load and then be usable through the Outlines SDK with transformers, with the model placed on a GPU such as `"cuda:0"`.

Error message:

Traceback (most recent call last):
  File "/outlines/Performance-Benchmarking/outlines_local_examples.py", line 81, in <module>
    model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")
  File "/usr/local/lib/python3.10/site-packages/outlines/models/transformers.py", line 430, in transformers
    model = model_class.from_pretrained(model_name, **model_kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3296, in from_pretrained
    raise ImportError(
ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
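The guard raising here lives in transformers' modeling_utils: when a `device`/`device_map` is requested, it checks that accelerate is importable. A stdlib-only sketch of an equivalent preflight check (the helper name is mine, not an outlines or transformers API), useful for failing fast in a container entrypoint:

```python
import importlib.util

def require_module(name: str) -> None:
    """Fail fast if a dependency is missing, mirroring the guard that
    transformers applies when `device`/`device_map` needs accelerate."""
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"`{name}` is required here: `pip install {name}`")

# e.g. call require_module("accelerate") before models.transformers(...)
```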

Outlines/Python version information:

Docker Image Version Hash: 98c8512bd46f

Version information

``` 0.1.dev1+g8e94488.d20240816 Python 3.10.14 (main, Aug 13 2024, 02:10:16) [GCC 12.2.0] aiohappyeyeballs==2.3.6 aiohttp==3.10.3 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 async-timeout==4.0.3 attrs==24.2.0 certifi==2024.7.4 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==3.0.0 cmake==3.30.2 datasets==2.21.0 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 exceptiongroup==1.2.2 fastapi==0.112.1 filelock==3.15.4 frozenlist==1.4.1 fsspec==2024.6.1 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.0 huggingface-hub==0.24.5 idna==3.7 interegular==0.3.3 Jinja2==3.1.4 jiter==0.5.0 jsonschema==4.23.0 jsonschema-specifications==2023.12.1 lark==1.2.2 llvmlite==0.43.0 lm-format-enforcer==0.10.1 MarkupSafe==2.1.5 mpmath==1.3.0 msgpack==1.0.8 multidict==6.0.5 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.3 ninja==1.11.1.1 numba==0.60.0 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==12.560.30 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.20 nvidia-nvtx-cu12==12.1.105 openai==1.41.0 outlines @ file:///outlines packaging==24.1 pandas==2.2.2 pillow==10.4.0 prometheus-fastapi-instrumentator==7.0.0 prometheus_client==0.20.0 protobuf==5.27.3 psutil==6.0.0 py-cpuinfo==9.0.0 pyairports==2.1.1 pyarrow==17.0.0 pycountry==24.6.1 pydantic==2.8.2 pydantic_core==2.20.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 pytz==2024.1 PyYAML==6.0.2 ray==2.34.0 referencing==0.35.1 regex==2024.7.24 requests==2.32.3 rpds-py==0.20.0 safetensors==0.4.4 sentencepiece==0.2.0 six==1.16.0 sniffio==1.3.1 starlette==0.38.2 sympy==1.13.2 tiktoken==0.7.0 tokenizers==0.19.1 torch==2.3.0 torchvision==0.18.0 tqdm==4.66.5 transformers==4.44.0 triton==2.3.0 typing_extensions==4.12.2 tzdata==2024.1 urllib3==2.2.2 
uvicorn==0.30.6 uvloop==0.20.0 vllm==0.5.1 vllm-flash-attn==2.5.9 watchfiles==0.23.0 websockets==12.0 xformers==0.0.26.post1 xxhash==3.4.1 yarl==1.9.4 ```

Context for the issue:

I'm trying to write a post on using Outlines with Vast, and Vast requires everything to run inside a Docker container. It would be great if users could start their workloads in the container without needing to install accelerate first.

aowen14 added the bug label Aug 20, 2024

rlouf commented Aug 20, 2024

Thank you, happy to review a PR!


lapp0 commented Aug 20, 2024

outlines.serve should support JSON: https://outlines-dev.github.io/outlines/reference/serve/vllm/#querying-endpoint

Additionally, outlines.models.vllm supports json as well. Could you please clarify the issue you ran into when trying this?


aowen14 commented Aug 21, 2024

@rlouf I would be happy to create a PR for the Docker setup, but first I want to fully answer @lapp0's question, since it bears on why I would like to use accelerate. I would prefer to use vLLM.

I created a simple pydantic use case for vllm, transformers, and serve; the code and output for each are below. Since running these, I added params to the vLLM example and it started returning valid JSON, but I expected it to work out of the box, since vLLM has default parameters and Outlines should be restricting output to the correct schema(?)

Server Call Code:

import requests
from pydantic import BaseModel

# Define the Book model
class Book(BaseModel):
    title: str
    author: str
    year: int

# Define the request parameters
ip_address = "localhost"
port = "8000"
prompt = "Create a book entry with the fields title, author, and year"
schema = Book.model_json_schema()

# Create the request body
outlines_request = {
    "prompt": prompt,
    "schema": schema
}

print("Prompt: ", prompt)
# Make the API call
response = requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)

# Check if the request was successful
if response.status_code == 200:
    result = response.json()
    print("Result:", result["text"])
else:
    print(f"Error: {response.status_code}, {response.text}")

Server command:
python -m outlines.serve.serve --model="microsoft/Phi-3-mini-128k-instruct" --max-model-len 5000

Output:

Prompt:  Create a book entry with the fields title, author, and year
Result: ['Create a book entry with the fields title, author, and year{ "title": "The Great Gatsby", "author": "F']
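The result above stops mid-string, which is consistent with vLLM's SamplingParams defaulting max_tokens to 16. If the serve endpoint forwards leftover request-body fields to SamplingParams (an assumption about outlines.serve worth verifying), raising max_tokens in the request may avoid the cutoff; a sketch with a plain JSON Schema dict in place of the pydantic model:

```python
# Schema equivalent to Book.model_json_schema(), written out by hand
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "author", "year"],
}

outlines_request = {
    "prompt": "Create a book entry with the fields title, author, and year",
    "schema": schema,
    "max_tokens": 200,  # assumption: forwarded to vLLM's SamplingParams
}
# requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)
```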

VLLM Code:

from outlines import models, generate
from pydantic import BaseModel
from vllm import SamplingParams


class Book(BaseModel):
    title: str
    author: str
    year: int


print("\n\npydantic_vllm_example\n\n")

model = models.vllm("microsoft/Phi-3-mini-128k-instruct", max_model_len=25000)
params = SamplingParams(temperature=0, top_k=-1)

generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
# pass the sampling params (the original paste omitted this argument,
# but the traceback below shows the run that used them)
result = generator(prompt, sampling_params=params)
print("Prompt:", prompt)
print("Result:", result)

Output:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1160, in parse_raw
[rank0]:     obj = parse.load_str_bytes(
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/deprecated/parse.py", line 49, in load_str_bytes
[rank0]:     return json_loads(b)  # type: ignore
[rank0]:            ^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/__init__.py", line 346, in loads
[rank0]:     return _default_decoder.decode(s)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 337, in decode
[rank0]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 353, in raw_decode
[rank0]:     obj, end = self.scan_once(s, idx)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 42 (char 41)

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/home/lambda1/AlexCode/Performance-Benchmarking/outlines_local_vllm.py", line 27, in <module>
[rank0]:     result = generator(prompt, sampling_params=params)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 511, in __call__
[rank0]:     return format(completions)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 497, in format
[rank0]:     return self.format_sequence(sequences)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/json.py", line 50, in <lambda>
[rank0]:     generator.format_sequence = lambda x: schema_object.parse_raw(x)
[rank0]:                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1187, in parse_raw
[rank0]:     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
[rank0]: pydantic_core._pydantic_core.ValidationError: 1 validation error for Book
[rank0]: __root__
[rank0]:   Unterminated string starting at: line 1 column 42 (char 41) [type=value_error.jsondecode, input_value='{ "title": "The Great Gatsby", "author": "F', input_type=str]
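The `input_value` in the error above is a JSON prefix cut off mid-string, again consistent with the default `max_tokens=16` in the `SamplingParams` being used. A small stdlib-only helper (hypothetical, not part of outlines) that separates this kind of truncation from other parse failures before pydantic sees the text:

```python
import json

def parse_if_complete(text: str):
    """Parse JSON, returning None when the text was cut off inside a
    string (the signature of hitting the token limit mid-generation)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        if e.msg.startswith("Unterminated string"):
            return None  # likely truncated by the token limit
        raise

# The truncated output from the traceback above:
parse_if_complete('{ "title": "The Great Gatsby", "author": "F')  # returns None
```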

Transformers Code:

from outlines import models, generate
from pydantic import BaseModel
from outlines.samplers import greedy


class Book(BaseModel):
    title: str
    author: str
    year: int

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")
print("\n\npydantic_transformers_example\n\n")
generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt)
print("Prompt:",prompt)
print("Result:",result)

Output:

Prompt: Create a book entry with the fields title, author, and year
Result: title='Invisible Cities' author='Italo Calvino' year=1974


lapp0 commented Aug 27, 2024

I'll look into the bug with json handling in vLLM.
