Llamacpp with cpp backend #2527

shrinath-suresh · 2023-08-16T18:43:30Z

Description

Benchmarking LLM deployment with CPP Backend

Setup and Test

Follow the instructions from README.md to set up the environment
Download the TheBloke/Llama-2-7B-Chat-GGML model.

cd serve/cpp/./test/resources/torchscript_model/llm/llm_handler
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin

and update the path of the model in the script here.

To control the number of parameters to be generated, update the max_context_size variable in the script to the desired number.

Note: In the next version, this step will be changed to read the llm path from config.

Run the build

cd serve/cpp
./builld.sh

Once the build is successful libllm_handler.so shared object file would be generated in serve/cpp/./test/resources/torchscript_model/llm/llm_handler folder.

Copy the dummy.pt file to the llm_handler folder.
Move to llm_handler folder and run the following command to generate mar file

torch-model-archiver --model-name llm --version 1.0 --serialized-file dummy.pt --handler libllm_handler:LlmHandler --runtime LSP

Move the llm.mar to model_store

mkdir model_store
mv llm.mar model_store/llm.mar

Create a new config.properties file and past the content.

default_response_timeout=300000

The default timeout is 120000. When the context size is 512, LLM generation takes more time to complete the request in the single gpu machine.

Start the torchserve

torchserve --start --ncs --ts-config config.properties --model-store model_store/

Register the model using curl command

curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar"

Update the input in prompt.txt if needed and run

curl http://localhost:8080/predictions/llm -T prompt.txt

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Test A
Logs for Test A
Test B
Logs for Test B

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>

…ml version Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>

mreso

Thanks for your contribution! Left a couple of comments.

mreso · 2023-09-14T10:05:40Z

cpp/src/examples/CMakeLists.txt

Would be good to create a CMakeLists.txt in the llamacpp directory and use add_subdirectory() in the main file to avoid the main one to getting too crowded.

mreso · 2023-09-14T10:08:50Z

cpp/src/examples/CMakeLists.txt

@@ -5,3 +5,26 @@ list(APPEND MNIST_SOURCE_FILES ${MNIST_SRC_DIR}/mnist_handler.cc)
 add_library(mnist_handler SHARED ${MNIST_SOURCE_FILES})
 target_include_directories(mnist_handler PUBLIC ${MNIST_SRC_DIR})
 target_link_libraries(mnist_handler PRIVATE ts_backends_torch_scripted ts_utils ${TORCH_LIBRARIES}) 
+
+set(LLM_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/llamacpp")
+set(LLAMACPP_SRC_DIR "/home/ubuntu/llama.cpp")


Good to avoid absolute paths. Is the file included in the PR? What is the license of llama.cpp? Do we need to include the license file?

mreso · 2023-09-14T10:09:29Z

cpp/src/examples/CMakeLists.txt

+target_link_libraries(llamacpp_handler PRIVATE ts_backends_torch_scripted ts_utils ${TORCH_LIBRARIES})
+
+
+set(MY_OBJECT_FILES


Where are the src files to these obj files?

mreso · 2023-09-14T10:10:50Z

cpp/src/examples/llamacpp/config.json

@@ -0,0 +1,5 @@
+{
+"checkpoint_path" : "/home/ubuntu/llama-2-7b-chat.Q4_0.gguf"


ditto also: How big is this file?

mreso · 2023-09-14T10:11:55Z