A tool for sending concurrent requests to a server and observing streamed outputs.
```
Usage: main [OPTIONS] <URL> <PAYLOAD_FILE>

Arguments:
  <URL>           URL of the server
  <PAYLOAD_FILE>  Path to the payload file

Options:
  -u, --vus <VUS>                Number of virtual users [default: 1]
  -d, --duration <DURATION>      Duration of the test [default: 30]
  -s, --spawn-rate <SPAWN_RATE>  Spawn rate of virtual users [default: 1]
  -h, --help                     Print help
  -V, --version                  Print version
```
Example: send requests to a llama.cpp server with 5 virtual users:

```shell
cargo run -r -- http://localhost:8080/completion examples/llamacpp.jsonl -u 5
```
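The `.jsonl` extension suggests one JSON request body per line, each sent as a request to the target URL. A minimal sketch of what such a payload file could look like for llama.cpp's `/completion` endpoint (the `prompt`, `n_predict`, and `stream` fields follow that endpoint's API; the actual contents of the bundled `examples/llamacpp.jsonl` are an assumption here):

```jsonl
{"prompt": "Hello, my name is", "n_predict": 64, "stream": true}
{"prompt": "The capital of France is", "n_predict": 64, "stream": true}
```

With `"stream": true`, the server returns tokens incrementally, which is what lets the tool observe streamed outputs under load.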
- llama.cpp server configuration 1:

  ```shell
  ./server -m tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
  ```

  a1.mov
- llama.cpp server configuration 2 (`-np 10` allows 10 parallel slots):

  ```shell
  ./server -m tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -np 10
  ```

  a2.mov
- llama.cpp server configuration 3 (`-cb` additionally enables continuous batching):

  ```shell
  ./server -m tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -np 10 -cb
  ```

  a3.mov