2023-08-31T08:04:53,920 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-08-31T08:04:54,015 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /opt/conda/lib/python3.10/site-packages
Current directory: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler
Temp directory: /tmp
Metrics config path: /opt/conda/lib/python3.10/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3924 M
Python executable: /opt/conda/bin/python3.10
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/model_store
Initial Models: N/A
Log dir: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/logs
Metrics dir: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/model_store
CPP log config: N/A
Model config: N/A
2023-08-31T08:04:54,022 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-08-31T08:04:54,065 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-08-31T08:04:54,126 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-08-31T08:04:54,127 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-08-31T08:04:54,128 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-08-31T08:04:54,128 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-08-31T08:04:54,129 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2023-08-31T08:04:54,580 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - Parse metrics failed: NumExpr defaulting to 4 threads.
2023-08-31T08:04:54,853 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:50.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,854 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:6.642848968505859|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:138.67653274536133|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:95.4|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:13814.33203125|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1534.94921875|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:12.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:05:01,243 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model llm
2023-08-31T08:05:01,244 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model llm
2023-08-31T08:05:01,244 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model llm loaded.
2023-08-31T08:05:01,245 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: llm, count: 1
2023-08-31T08:05:01,251 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - LSP startWorker
2023-08-31T08:05:01,255 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/opt/conda/lib/python3.10/site-packages/ts/cpp/bin/model_worker_socket, --sock_type, unix, --sock_name, /tmp/.ts.sock.9000, --runtime_type, LSP, --model_dir, /tmp/models/c9739d86c166413a92c14933ec10ff74, --logger_config_path, /opt/conda/lib/python3.10/site-packages/ts/cpp/resources/logging.config, --metrics_config_path, /opt/conda/lib/python3.10/site-packages/ts/configs/metrics.yaml]
2023-08-31T08:05:01,680 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.679240 15342 model_worker.cc:43] Listening on port: /tmp/.ts.sock.9000
2023-08-31T08:05:01,681 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680782 15342 model_worker.cc:65] Binding to unix socket
2023-08-31T08:05:01,681 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680859 15342 model_worker.cc:90] Socket bind successful
2023-08-31T08:05:01,682 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680874 15342 model_worker.cc:91] [PID]15342
2023-08-31T08:05:01,682 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680879 15342 model_worker.cc:92] Torch worker started.
2023-08-31T08:05:01,683 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change null -> WORKER_STARTED
2023-08-31T08:05:01,687 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-08-31T08:05:01,692 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.691966 15342 model_worker.cc:101] Connection accepted: /tmp/.ts.sock.9000
2023-08-31T08:05:01,693 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.692874 15342 model_worker.cc:119] Handle connection
2023-08-31T08:05:01,695 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1693469101695
2023-08-31T08:05:01,726 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.726074 15342 model_worker.cc:139] LOAD request received
2023-08-31T08:05:01,952 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 226
2023-08-31T08:05:01,952 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-08-31T08:05:01,953 [INFO ] W-9000-llm_1.0 TS_METRICS - W-9000-llm_1.0.ms:704|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469101
2023-08-31T08:05:01,953 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.ms:32|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469101
2023-08-31T08:05:01,960 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:59786 "POST /models?initial_workers=1&url=llm.mar HTTP/1.1" 200 1535
2023-08-31T08:05:01,960 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469100
2023-08-31T08:05:07,015 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1693469107015
2023-08-31T08:05:07,016 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:07.016162 15342 model_worker.cc:124] INFER request received
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Achieved tok per sec: 172.305
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Generated String: Hello my name is Daisy. Daisy is three years old. She loves to play with her toys.
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - One day, Daisy's mommy said, "Daisy, it's time to go to the store." Daisy was so excited! She ran to the store with her mommy.
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - At the store, Daisy saw a big, red balloon. She wanted it so badly! She asked her mommy, "Can I have the balloon, please?"
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Mommy said, "No, Daisy. We don't have enough money for that balloon."
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Daisy was sad. She wanted the balloon so much. She started to cry.
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Mommy said, "Daisy, don't cry. We can get the balloon. We can buy it and take it home."
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Daisy smiled. She was so happy. She hugged her mommy and said, "Thank you, mommy!"
2023-08-31T08:05:08,324 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG -
2023-08-31T08:05:08,325 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG -
2023-08-31T08:05:08,325 [INFO ] W-9000-llm_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:1308.160384|#ModelName:llm,Level:Model|#hostname:ip-172-31-12-16,requestID:acbec9f6-9866-45a5-84b0-ecd8bad95e35,timestamp:1693469108
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:1308.160384|#ModelName:llm,Level:Model|#hostname:ip-172-31-12-16,requestID:acbec9f6-9866-45a5-84b0-ecd8bad95e35,timestamp:1693469108
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1310
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0 ACCESS_LOG - /127.0.0.1:52884 "PUT /predictions/llm HTTP/1.1" 200 1316
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469100
2023-08-31T08:05:08,327 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.job.Job - Waiting time ns: 138008, Backend time ns: 1311950621
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469108
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.ms:2|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469108
2023-08-31T08:05:54,565 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - Parse metrics failed: NumExpr defaulting to 4 threads.
2023-08-31T08:05:54,829 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:25.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:6.597530364990234|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:138.72185134887695|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:95.5|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:1.0872395833333333|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:13479.09375|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,832 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1860.19921875|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,832 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:14.1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
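The TS_METRICS entries in this log use TorchServe's StatsD-like line layout: `MetricName.Unit:value|#dimensions|#hostname:...,timestamp:...`. A minimal parsing sketch for host-level lines of exactly that shape (the function name and returned field names are my own, not part of TorchServe; MODEL_METRICS lines carry extra tags such as `requestID` and would need more handling):

```python
# Parse a host-level TS_METRICS line as seen in this log, e.g.
#   "GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-...,timestamp:1693469154"
# Assumption: the line has exactly three "|#"-separated parts, as in the lines above.

def parse_ts_metric(line: str) -> dict:
    name_value, dims, tail = line.split("|#")
    name, value = name_value.split(":", 1)
    metric, unit = name.rsplit(".", 1)           # "GPUMemoryUsed.Megabytes" -> name + unit
    dimensions = dict(d.split(":", 1) for d in dims.split(","))
    hostname, _, timestamp = tail.partition(",timestamp:")
    return {
        "metric": metric,
        "unit": unit,
        "value": float(value),
        "dimensions": dimensions,
        "hostname": hostname.removeprefix("hostname:"),
        "timestamp": int(timestamp),
    }

sample = "GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154"
parsed = parse_ts_metric(sample)
print(parsed["metric"], parsed["value"], parsed["dimensions"]["device_id"])  # GPUMemoryUsed 167.0 0
```

This kind of one-off parser is handy for turning a captured log like the one above into tabular data; for production scraping, the Prometheus-format metrics endpoint on port 8082 (`Metrics report format: prometheus` in the startup banner) is the intended interface.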