[NeuralChat] Support Gaudi model parallelism serving (#802)
lvliang-intel authored Nov 29, 2023
1 parent fd74a9a commit 7f0090e
Showing 18 changed files with 534 additions and 15 deletions.
1 change: 1 addition & 0 deletions intel_extension_for_transformers/neural_chat/config.py
@@ -402,6 +402,7 @@ class LoadingModelConfig:
use_hpu_graphs: bool = False
use_cache: bool = True
use_deepspeed: bool = False
world_size: int = 1
ipex_int8: bool = False
use_llm_runtime: bool = False
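
The new `world_size` field controls how many Gaudi cards DeepSpeed shards the model across when `use_deepspeed` is enabled. A minimal sketch of building a loading config with it, assuming the dataclass fields shown in the diff above (the surrounding usage is illustrative, not from this commit):

```python
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig

# Illustrative: request DeepSpeed model parallelism across 8 Gaudi cards.
loading_config = LoadingModelConfig(
    use_deepspeed=True,
    world_size=8,  # new field added in this commit; defaults to 1 (no sharding)
)
```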

@@ -0,0 +1,47 @@
This README is intended to guide you through setting up the backend for a text chatbot using the NeuralChat framework. You can deploy this text chatbot on various platforms, including Intel Xeon Scalable Processors, Habana's Gaudi processors (HPU), Intel Data Center and Client GPUs, and Nvidia Data Center and Client GPUs.

This README shows how to deploy the chatbot backend on Habana's Gaudi processors (HPU).

# Setup Conda

First, you need to install and configure the Conda environment:

```shell
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
```
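
Optionally, create and activate a dedicated environment for NeuralChat; the environment name and Python version below are illustrative choices, not requirements stated by this commit:

```shell
# Create and activate an isolated environment (name/version are examples)
conda create -n neuralchat python=3.9 -y
conda activate neuralchat
```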

# Install Python dependencies

Install the Python dependencies using pip:

>**Note**: Please make sure the transformers version is 4.34.1.
```bash
pip install -r ../../../../../requirements_hpu.txt
pip install transformers==4.34.1
```
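
A quick sanity check after installation; the `habana_frameworks` import assumes the Habana PyTorch bridge (SynapseAI stack) is already installed on the Gaudi machine:

```bash
# Confirm the pinned transformers version
python -c "import transformers; print(transformers.__version__)"  # expect 4.34.1
# Confirm the Habana PyTorch bridge is importable
python -c "import habana_frameworks.torch.core"
```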

# Configure the textbot.yaml

You can customize the configuration file `textbot.yaml` to match your environment setup. Here's a table to help you understand the configurable options, followed by a sample configuration:

| Item | Value |
| ------------------- | --------------------------------------- |
| host | 127.0.0.1 |
| port | 8000 |
| model_name_or_path | "meta-llama/Llama-2-7b-chat-hf" |
| device | "hpu" |
| use_deepspeed | true |
| world_size | 8 |
| tasks_list | ['textchat'] |
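
Put together, a `textbot.yaml` for an 8-card Gaudi deployment would look like this (values taken from the table above):

```yaml
host: 127.0.0.1
port: 8000

model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "hpu"
use_deepspeed: true
world_size: 8

tasks_list: ['textchat']
```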



# Run the TextChat server
To start the TextChat server, use the following command:

```shell
nohup python run_text_chat.py &
```
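
Once the server is up, you can smoke-test it over HTTP. The request below is illustrative: it assumes an OpenAI-style chat completion route, so adjust the path and payload to whatever the deployed NeuralChat RESTful API actually exposes:

```bash
# Illustrative request; the endpoint path is an assumption, not taken from this commit
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-2-7b-chat-hf", "messages": [{"role": "user", "content": "Tell me about Intel Gaudi."}]}'
```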
@@ -0,0 +1,32 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8000

model_name_or_path: "Phind/Phind-CodeLlama-34B-v2"
device: "hpu"
use_deepspeed: true
world_size: 8

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat']
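
As a rough sizing check (an estimate, not part of this commit): a 34B-parameter model in bf16 needs about 34e9 × 2 bytes ≈ 68 GB for weights alone, more than the 32 GB of HBM on a first-generation Gaudi card, which is why `use_deepspeed: true` with `world_size: 8` shards the model across 8 cards (roughly 8.5 GB of weights per card).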
@@ -1,5 +1,6 @@
This README is intended to guide you through setting up the backend for a text chatbot using the NeuralChat framework. You can deploy this text chatbot on various platforms, including Intel Xeon Scalable Processors, Habana's Gaudi processors (HPU), Intel Data Center and Client GPUs, and Nvidia Data Center and Client GPUs.

This README shows how to deploy the chatbot backend on Intel Xeon Scalable Processors.

# Setup Conda

@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

def main():
    # Start the NeuralChat server with the settings in textbot.yaml,
    # writing server logs to textbot.log.
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./textbot.yaml", log_file="./textbot.log")

if __name__ == "__main__":
    main()
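
Assuming this is the `run_text_chat.py` that the Gaudi README above launches, running it reads `./textbot.yaml`, starts the configured `textchat` task on the given host and port, and writes server logs to `./textbot.log`.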