
Accelerate not installed in Docker Image #1106

Open
aowen14 opened this issue Aug 20, 2024 · 4 comments · May be fixed by lapp0/outlines#88 or #1154
aowen14 commented Aug 20, 2024

Describe the issue as clearly as possible:

I'm not sure whether this is a bug or a feature request, but accelerate apparently isn't installed in the Docker image. That means one can use either transformers with no GPU acceleration, or vLLM; and from what I can tell, vLLM doesn't currently have feature parity with transformers (e.g. generate.json()).

Running the code outside of the image with the library plus accelerate works, and running pip install accelerate inside the container also solves the issue; the marginal download was very small.
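For the image itself, the fix is presumably a one-line Dockerfile change; a sketch below (the base-image line is a placeholder, and the repo's actual Dockerfile layout may differ):

```shell
# Sketch of the Dockerfile change (base image below is a placeholder):
#   FROM <existing outlines base image>
#   RUN pip install accelerate
#
# Ad-hoc workaround inside an already-running container until the image is fixed:
pip install accelerate
```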

Steps/code to reproduce the bug:

# within an Outlines Docker container
from outlines import models

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")

Expected result:

The model should load and then be usable through the Outlines SDK with transformers, with the model placed on a GPU such as `"cuda:0"`.

Error message:

Traceback (most recent call last):
  File "/outlines/Performance-Benchmarking/outlines_local_examples.py", line 81, in <module>
    model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")
  File "/usr/local/lib/python3.10/site-packages/outlines/models/transformers.py", line 430, in transformers
    model = model_class.from_pretrained(model_name, **model_kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3296, in from_pretrained
    raise ImportError(
ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
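The guard raising here lives in transformers' modeling_utils: when a `device`/`device_map` is requested, it checks that accelerate is importable. A stdlib-only sketch of an equivalent preflight check (the helper name is mine, not an outlines or transformers API), useful for failing fast in a container entrypoint:

```python
import importlib.util

def require_module(name: str) -> None:
    """Fail fast if a dependency is missing, mirroring the guard that
    transformers applies when `device`/`device_map` needs accelerate."""
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"`{name}` is required here: `pip install {name}`")

# e.g. call require_module("accelerate") before models.transformers(...)
```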

Outlines/Python version information:

Docker Image Version Hash: 98c8512bd46f

Version information

``` 0.1.dev1+g8e94488.d20240816 Python 3.10.14 (main, Aug 13 2024, 02:10:16) [GCC 12.2.0] aiohappyeyeballs==2.3.6 aiohttp==3.10.3 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 async-timeout==4.0.3 attrs==24.2.0 certifi==2024.7.4 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==3.0.0 cmake==3.30.2 datasets==2.21.0 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 exceptiongroup==1.2.2 fastapi==0.112.1 filelock==3.15.4 frozenlist==1.4.1 fsspec==2024.6.1 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.0 huggingface-hub==0.24.5 idna==3.7 interegular==0.3.3 Jinja2==3.1.4 jiter==0.5.0 jsonschema==4.23.0 jsonschema-specifications==2023.12.1 lark==1.2.2 llvmlite==0.43.0 lm-format-enforcer==0.10.1 MarkupSafe==2.1.5 mpmath==1.3.0 msgpack==1.0.8 multidict==6.0.5 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.3 ninja==1.11.1.1 numba==0.60.0 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==12.560.30 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.20 nvidia-nvtx-cu12==12.1.105 openai==1.41.0 outlines @ file:///outlines packaging==24.1 pandas==2.2.2 pillow==10.4.0 prometheus-fastapi-instrumentator==7.0.0 prometheus_client==0.20.0 protobuf==5.27.3 psutil==6.0.0 py-cpuinfo==9.0.0 pyairports==2.1.1 pyarrow==17.0.0 pycountry==24.6.1 pydantic==2.8.2 pydantic_core==2.20.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 pytz==2024.1 PyYAML==6.0.2 ray==2.34.0 referencing==0.35.1 regex==2024.7.24 requests==2.32.3 rpds-py==0.20.0 safetensors==0.4.4 sentencepiece==0.2.0 six==1.16.0 sniffio==1.3.1 starlette==0.38.2 sympy==1.13.2 tiktoken==0.7.0 tokenizers==0.19.1 torch==2.3.0 torchvision==0.18.0 tqdm==4.66.5 transformers==4.44.0 triton==2.3.0 typing_extensions==4.12.2 tzdata==2024.1 urllib3==2.2.2 
uvicorn==0.30.6 uvloop==0.20.0 vllm==0.5.1 vllm-flash-attn==2.5.9 watchfiles==0.23.0 websockets==12.0 xformers==0.0.26.post1 xxhash==3.4.1 yarl==1.9.4 ```

Context for the issue:

I'm trying to write a post on using Outlines with Vast, and Vast requires everything to run inside a Docker container. It would be great if users could start their workloads in the container without needing to install accelerate first.

aowen14 added the bug label Aug 20, 2024

rlouf commented Aug 20, 2024

Thank you, happy to review a PR!


lapp0 commented Aug 20, 2024

outlines.serve should support JSON: https://outlines-dev.github.io/outlines/reference/serve/vllm/#querying-endpoint

Additionally, outlines.models.vllm supports json as well. Could you please clarify the issue you ran into when trying this?


aowen14 commented Aug 21, 2024

@rlouf I would be happy to create a PR for the Docker setup, but first I want to fully answer @lapp0's question, since it bears on why I would like to use accelerate. I would prefer to use vLLM.

I created a simple pydantic use case for vllm, transformers, and serve; the code and output for each are below. Since running these, I added params to the vLLM example and it started returning valid JSON, but I expected it to work out of the box, since vLLM has default parameters and Outlines should be restricting output to the correct schema(?)

Server Call Code:

import requests
from pydantic import BaseModel

# Define the Book model
class Book(BaseModel):
    title: str
    author: str
    year: int

# Define the request parameters
ip_address = "localhost"
port = "8000"
prompt = "Create a book entry with the fields title, author, and year"
schema = Book.model_json_schema()

# Create the request body
outlines_request = {
    "prompt": prompt,
    "schema": schema
}

print("Prompt: ", prompt)
# Make the API call
response = requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)

# Check if the request was successful
if response.status_code == 200:
    result = response.json()
    print("Result:", result["text"])
else:
    print(f"Error: {response.status_code}, {response.text}")

Server command:
python -m outlines.serve.serve --model="microsoft/Phi-3-mini-128k-instruct" --max-model-len 5000

Output:

Prompt:  Create a book entry with the fields title, author, and year
Result: ['Create a book entry with the fields title, author, and year{ "title": "The Great Gatsby", "author": "F']
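The result above stops mid-string, which is consistent with vLLM's SamplingParams defaulting max_tokens to 16. If the serve endpoint forwards leftover request-body fields to SamplingParams (an assumption about outlines.serve worth verifying), raising max_tokens in the request may avoid the cutoff; a sketch with a plain JSON Schema dict in place of the pydantic model:

```python
# Schema equivalent to Book.model_json_schema(), written out by hand
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "author", "year"],
}

outlines_request = {
    "prompt": "Create a book entry with the fields title, author, and year",
    "schema": schema,
    "max_tokens": 200,  # assumption: forwarded to vLLM's SamplingParams
}
# requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)
```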

VLLM Code:

from outlines import models, generate
from pydantic import BaseModel
from vllm import SamplingParams


class Book(BaseModel):
    title: str
    author: str
    year: int


print("\n\npydantic_vllm_example\n\n")

model = models.vllm("microsoft/Phi-3-mini-128k-instruct", max_model_len=25000)
params = SamplingParams(temperature=0, top_k=-1)

generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
# pass the sampling params (the original paste omitted this argument,
# but the traceback below shows the run that used them)
result = generator(prompt, sampling_params=params)
print("Prompt:", prompt)
print("Result:", result)

Output:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1160, in parse_raw
[rank0]:     obj = parse.load_str_bytes(
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/deprecated/parse.py", line 49, in load_str_bytes
[rank0]:     return json_loads(b)  # type: ignore
[rank0]:            ^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/__init__.py", line 346, in loads
[rank0]:     return _default_decoder.decode(s)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 337, in decode
[rank0]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 353, in raw_decode
[rank0]:     obj, end = self.scan_once(s, idx)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 42 (char 41)

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/home/lambda1/AlexCode/Performance-Benchmarking/outlines_local_vllm.py", line 27, in <module>
[rank0]:     result = generator(prompt, sampling_params=params)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 511, in __call__
[rank0]:     return format(completions)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 497, in format
[rank0]:     return self.format_sequence(sequences)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/json.py", line 50, in <lambda>
[rank0]:     generator.format_sequence = lambda x: schema_object.parse_raw(x)
[rank0]:                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1187, in parse_raw
[rank0]:     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
[rank0]: pydantic_core._pydantic_core.ValidationError: 1 validation error for Book
[rank0]: __root__
[rank0]:   Unterminated string starting at: line 1 column 42 (char 41) [type=value_error.jsondecode, input_value='{ "title": "The Great Gatsby", "author": "F', input_type=str]
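The `input_value` in the error above is a JSON prefix cut off mid-string, again consistent with the default `max_tokens=16` in the `SamplingParams` being used. A small stdlib-only helper (hypothetical, not part of outlines) that separates this kind of truncation from other parse failures before pydantic sees the text:

```python
import json

def parse_if_complete(text: str):
    """Parse JSON, returning None when the text was cut off inside a
    string (the signature of hitting the token limit mid-generation)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        if e.msg.startswith("Unterminated string"):
            return None  # likely truncated by the token limit
        raise

# The truncated output from the traceback above:
parse_if_complete('{ "title": "The Great Gatsby", "author": "F')  # returns None
```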

Transformers Code:

from outlines import models, generate
from pydantic import BaseModel
from outlines.samplers import greedy


class Book(BaseModel):
    title: str
    author: str
    year: int

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")
print("\n\npydantic_transformers_example\n\n")
generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt)
print("Prompt:",prompt)
print("Result:",result)

Output:

Prompt: Create a book entry with the fields title, author, and year
Result: title='Invisible Cities' author='Italo Calvino' year=1974


lapp0 commented Aug 27, 2024

I'll look into the bug with json handling in vLLM.
