(nmmac) ➜ deepsparse git:(server_update) deepsparse.server --task text-generation --integration openai --model_path hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds
2023-12-06 16:35:52 deepsparse.server.server INFO Using config: ServerConfig(num_cores=None, num_workers=None, integration='openai', engine_thread_pinning='core', pytorch_num_threads=1, endpoints=[EndpointConfig(name='text_generation', route=None, task='text_generation', model='hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds', batch_size=1, logging_config=PipelineSystemLoggingConfig(enable=True, inference_details=SystemLoggingGroup(enable=False, target_loggers=[]), prediction_latency=SystemLoggingGroup(enable=True, target_loggers=[])), data_logging=None, bucketing=None, kwargs={})], loggers={}, system_logging=ServerSystemLoggingConfig(enable=True, request_details=SystemLoggingGroup(enable=False, target_loggers=[]), resource_utilization=SystemLoggingGroup(enable=False, target_loggers=[])))
2023-12-06 16:35:52 deepsparse.loggers.build_logger INFO Created default logger: PythonLogger
2023-12-06 16:35:52 deepsparse.loggers.build_logger INFO System Logging: enabled for groups: ['text_generation/prediction_latency']
2023-12-06 16:35:52 deepsparse.server.server INFO torch.set_num_threads(1)
2023-12-06 16:35:52 deepsparse.server.server INFO NM_BIND_THREADS_TO_CORES=1
2023-12-06 16:35:52 deepsparse.server.server INFO NM_BIND_THREADS_TO_SOCKETS=0
2023-12-06 16:35:52 deepsparse.server.openai_server INFO Initializing pipeline for 'text_generation'
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 114598.47it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2023-12-06 16:35:53 deepsparse.transformers.pipelines.text_generation INFO Compiling an auxiliary engine to process a prompt with a larger processing length. This improves performance, but may result in additional memory consumption.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231127 COMMUNITY | (b1e8d811) (release) (optimized) (system=neon, binary=neon)
INFO:     Started server process [19098]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5543 (Press CTRL+C to quit)
INFO:     127.0.0.1:50941 - "GET /v1/models HTTP/1.1" 200 OK
2023-12-06 16:37:16 deepsparse.server.openai_server INFO Received chat completion request: model='hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds' messages={'role': 'user', 'content': 'Talk about the Toronto Raptors.'} temperature=0.7 top_p=1.0 n=1 max_tokens=100 stop=[] stream=True presence_penalty=0.0 frequency_penalty=0.0 logit_bias=None user=None best_of=None top_k=-1 ignore_eos=False use_beam_search=False
2023-12-06 16:37:16 deepsparse.server.openai_server WARNING A dictionary message was found. This dictionary must be fastchat compliant.
INFO:     127.0.0.1:50941 - "POST /v1/chat/completions HTTP/1.1" 200 OK
/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:377: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `-1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
06/12/2023 16:37:16:945182 Identifier: text_generation/prediction_latency/total_inference_seconds | Category: system | Logged Data: 0.0027168330270797014 | Additional Info: {'pipeline_name': 'text_generation'}
06/12/2023 16:37:16:945245 Identifier: text_generation/prediction_latency/pre_process_seconds | Category: system | Logged Data: 0.0025280409900005907 | Additional Info: {'pipeline_name': 'text_generation'}
06/12/2023 16:37:16:945289 Identifier: text_generation/prediction_latency/engine_forward_seconds | Category: system | Logged Data: 0.0001368340163026005 | Additional Info: {'pipeline_name': 'text_generation'}
06/12/2023 16:37:16:945330 Identifier: text_generation/prediction_latency/post_process_seconds | Category: system | Logged Data: 7.292022928595543e-06 | Additional Info: {'pipeline_name': 'text_generation'}
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/fastapi/applications.py", line 289, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/base.py", line 109, in __call__
    await response(scope, receive, send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/responses.py", line 270, in __call__
    async with anyio.create_task_group() as task_group:
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
    await func()
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/base.py", line 134, in stream_response
    return await super().stream_response(send)
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/responses.py", line 262, in stream_response
    async for chunk in self.body_iterator:
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/base.py", line 98, in body_stream
    raise app_exc
  File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
"/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__ raise exc File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__ await self.app(scope, receive, sender) File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__ raise e File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__ await self.app(scope, receive, send) File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__ await route.handle(scope, receive, send) File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle await self.app(scope, receive, send) File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/routing.py", line 69, in app await response(scope, receive, send) File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/responses.py", line 280, in __call__ await self.background() File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/background.py", line 43, in __call__ await task() File "/Users/mgoin/venvs/nmmac/lib/python3.10/site-packages/starlette/background.py", line 26, in __call__ await self.func(*self.args, **self.kwargs) File "/Users/mgoin/code/deepsparse/src/deepsparse/server/openai_server.py", line 159, in abort_request await pipeline.abort(request_id) AttributeError: 'TextGenerationPipeline' object has no attribute 'abort'