BabyLlama with CPP backend #2544

Closed
Changes from all commits (25 commits)
641a708
Baby Llama - Porting run.c for integration and fixed clang type conve…
shrinath-suresh Aug 25, 2023
016e4f1
Custom preprocess implementation
shrinath-suresh Aug 25, 2023
38d3e93
Free memory only after the inference is done
shrinath-suresh Aug 28, 2023
52a7927
Implement Postprocess
shrinath-suresh Aug 28, 2023
c675664
Setting Fast compiler option
shrinath-suresh Aug 31, 2023
374a2e8
Reading checkpoint path and tokenizer path from config file using folly
shrinath-suresh Sep 4, 2023
48f522c
Removing run.c from cmake
shrinath-suresh Sep 4, 2023
49a3015
Replace auto with appropriate data type
shrinath-suresh Sep 4, 2023
aeb1bb0
Using smartpointers and initializing the vector with appropriate size…
shrinath-suresh Sep 5, 2023
ee20424
Using smartpointers
shrinath-suresh Sep 5, 2023
f5d9799
Directly converting the tensor values to prompt token ids
shrinath-suresh Sep 5, 2023
9b3de26
Moving run.c and common variables to .cc file
shrinath-suresh Sep 5, 2023
3e0e2c3
Moving run.c to a separate folder
shrinath-suresh Sep 5, 2023
5c0495e
Uncommenting the original run.c main method
shrinath-suresh Sep 5, 2023
e75a5ae
Implemented destructor to free up resources
shrinath-suresh Sep 5, 2023
9afce52
Supporting files for unit test
shrinath-suresh Sep 5, 2023
0d12619
Processing all the batch inputs
shrinath-suresh Sep 6, 2023
bd03fd8
Setting InferenceMode guard
shrinath-suresh Sep 6, 2023
d2dc632
Updating InferenceMode to use torch::InferenceMode
shrinath-suresh Sep 12, 2023
67b46aa
Updating class name to BabyLlamaHandler
shrinath-suresh Sep 12, 2023
f30aab2
Renaming llm_handler target to babyllama_handler
shrinath-suresh Sep 12, 2023
7174cde
Adding dummy pt file
shrinath-suresh Sep 12, 2023
6dc025b
Typo Fix
shrinath-suresh Sep 13, 2023
450b85d
Calculate tokens/per second for batch input
shrinath-suresh Sep 14, 2023
8d279be
Adding README.md for babyllama example
shrinath-suresh Sep 14, 2023
cpp/build.sh (4 additions, 0 deletions)
@@ -299,6 +299,10 @@ function build() {
mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
fi

if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
fi

cd $DEPS_DIR/../..
if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
$DEPS_DIR/../test/torchserve_cpp_test
cpp/src/examples/CMakeLists.txt (9 additions, 0 deletions)
@@ -5,3 +5,12 @@ list(APPEND MNIST_SOURCE_FILES ${MNIST_SRC_DIR}/mnist_handler.cc)
add_library(mnist_handler SHARED ${MNIST_SOURCE_FILES})
target_include_directories(mnist_handler PUBLIC ${MNIST_SRC_DIR})
target_link_libraries(mnist_handler PRIVATE ts_backends_torch_scripted ts_utils ${TORCH_LIBRARIES})


set(BABYLLAMA_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/babyllama")
set(BABYLLAMA_SOURCE_FILES "")
list(APPEND BABYLLAMA_SOURCE_FILES ${BABYLLAMA_SRC_DIR}/baby_llama_handler.cc)
add_library(babyllama_handler SHARED ${BABYLLAMA_SOURCE_FILES})
target_include_directories(babyllama_handler PUBLIC ${BABYLLAMA_SRC_DIR})
target_link_libraries(babyllama_handler PRIVATE ts_backends_torch_scripted ts_utils ${TORCH_LIBRARIES})
target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)
cpp/src/examples/babyllama/README.md (84 additions, 0 deletions)
@@ -0,0 +1,84 @@
This example is adapted from https://github.com/karpathy/llama2.c.

### Setup

1. Follow the instructions from [README.md](../../../README.md) to build the cpp backend
2. Download the model checkpoint using the following command

```
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
```
Then download the [tokenizer.bin](https://github.com/karpathy/llama2.c/blob/master/tokenizer.bin) file from the [llama2.c](https://github.com/karpathy/llama2.c) repo.
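
For convenience, the tokenizer can also be fetched from the command line. A sketch, assuming the file is still hosted at this path on the `master` branch:

```
# fetch tokenizer.bin from the llama2.c repo (assumed path)
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
```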

3. Update [config.json](config.json) with the paths of the downloaded model and tokenizer.

For example

```
{
"checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
"tokenizer_path" : "/home/ubuntu/serve/cpp/src/examples/babyllama/tokenizer.bin"
}
```

4. Run the build

```
cd serve/cpp
./build.sh
```

Once the build completes, the `libbabyllama_handler.so` file is generated in the [babyllama_handler](../../../test/resources/torchscript_model/babyllama/babyllama_handler) folder
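
To sanity-check the artifact (path relative to the `serve/cpp` directory; assumes the default build layout):

```
# the shared object should appear here after a successful build
ls test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
```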

### Generate MAR file

Move to the [babyllama_handler](../../../test/resources/torchscript_model/babyllama/babyllama_handler) folder and run the following command to generate the MAR file. Note that `dummy.pt` is only a placeholder serialized file; the handler loads its weights from the checkpoint path set in config.json.

```
torch-model-archiver --model-name llm --version 1.0 --serialized-file dummy.pt --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
```

Create a model store directory and move the MAR file into it

```
mkdir model_store
mv llm.mar model_store/llm.mar
```
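
The inference requests below read prompts from plain text files. Sample prompt files can be created like this (the prompt contents are assumptions; any short seed text works):

```
# create sample prompt files (contents are placeholders)
echo "Hello my name is" > prompt.txt
echo "Once upon a time" > prompt1.txt
```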

### Inference

Start TorchServe using the following command

```
torchserve --start --ncs --ts-config config.properties --model-store model_store/
```
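
The start command reads settings from a `config.properties` file in the current directory. A minimal sketch for local testing (the values here are assumptions; adjust addresses and ports as needed):

```
# write a minimal TorchServe config file (assumed values)
cat > config.properties <<'EOF'
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
EOF
```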

Register the model using the following command

```
curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar&batch_size=2&max_batch_delay=5000"
```
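
To confirm the model is registered and its workers are up, query the standard management API describe endpoint:

```
# describe the registered model and its workers
curl http://localhost:8081/models/llm
```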

Run inference using the following command

```
curl http://localhost:8080/predictions/llm -T prompt.txt
```

This example supports batching. To run a batch prediction, issue concurrent requests using the following commands

```
curl http://localhost:8080/predictions/llm -T prompt.txt & curl http://localhost:8080/predictions/llm -T prompt1.txt &
```
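
Each `&` sends its request in the background, so the two curls run concurrently; with `batch_size=2` and `max_batch_delay=5000` from the registration call, the server should collect them into a single batch. In a script, `wait` blocks until both responses return:

```
# fire both requests concurrently, then wait for both to finish
curl http://localhost:8080/predictions/llm -T prompt.txt &
curl http://localhost:8080/predictions/llm -T prompt1.txt &
wait
```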

Sample Response

```
Hello my name is Daisy. Daisy is three years old. She loves to play with her toys.
One day, Daisy's mommy said, "Daisy, it's time to go to the store." Daisy was so excited! She ran to the store with her mommy.
At the store, Daisy saw a big, red balloon. She wanted it so badly! She asked her mommy, "Can I have the balloon, please?"
Mommy said, "No, Daisy. We don't have enough money for that balloon."
Daisy was sad. She wanted the balloon so much. She started to cry.
Mommy said, "Daisy, don't cry. We can get the balloon. We can buy it and take it home."
Daisy smiled. She was so happy. She hugged her mommy and said, "Thank you, mommy!"
```