2023-08-31T08:04:53,920 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-08-31T08:04:54,015 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /opt/conda/lib/python3.10/site-packages
Current directory: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler
Temp directory: /tmp
Metrics config path: /opt/conda/lib/python3.10/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3924 M
Python executable: /opt/conda/bin/python3.10
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/model_store
Initial Models: N/A
Log dir: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/logs
Metrics dir: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/ubuntu/serve/cpp/test/resources/torchscript_model/babyllama/llm_handler/model_store
CPP log config: N/A
Model config: N/A
2023-08-31T08:04:54,022 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-08-31T08:04:54,065 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-08-31T08:04:54,126 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-08-31T08:04:54,127 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-08-31T08:04:54,128 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-08-31T08:04:54,128 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-08-31T08:04:54,129 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2023-08-31T08:04:54,580 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - Parse metrics failed: NumExpr defaulting to 4 threads.
2023-08-31T08:04:54,853 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:50.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,854 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:6.642848968505859|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:138.67653274536133|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:95.4|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,855 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:13814.33203125|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1534.94921875|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:04:54,856 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:12.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469094
2023-08-31T08:05:01,243 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model llm
2023-08-31T08:05:01,244 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model llm
2023-08-31T08:05:01,244 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model llm loaded.
2023-08-31T08:05:01,245 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: llm, count: 1
2023-08-31T08:05:01,251 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - LSP startWorker
2023-08-31T08:05:01,255 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/opt/conda/lib/python3.10/site-packages/ts/cpp/bin/model_worker_socket, --sock_type, unix, --sock_name, /tmp/.ts.sock.9000, --runtime_type, LSP, --model_dir, /tmp/models/c9739d86c166413a92c14933ec10ff74, --logger_config_path, /opt/conda/lib/python3.10/site-packages/ts/cpp/resources/logging.config, --metrics_config_path, /opt/conda/lib/python3.10/site-packages/ts/configs/metrics.yaml]
2023-08-31T08:05:01,680 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.679240 15342 model_worker.cc:43] Listening on port: /tmp/.ts.sock.9000
2023-08-31T08:05:01,681 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680782 15342 model_worker.cc:65] Binding to unix socket
2023-08-31T08:05:01,681 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680859 15342 model_worker.cc:90] Socket bind successful
2023-08-31T08:05:01,682 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680874 15342 model_worker.cc:91] [PID]15342
2023-08-31T08:05:01,682 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.680879 15342 model_worker.cc:92] Torch worker started.
2023-08-31T08:05:01,683 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change null -> WORKER_STARTED
2023-08-31T08:05:01,687 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-08-31T08:05:01,692 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.691966 15342 model_worker.cc:101] Connection accepted: /tmp/.ts.sock.9000
2023-08-31T08:05:01,693 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.692874 15342 model_worker.cc:119] Handle connection
2023-08-31T08:05:01,695 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1693469101695
2023-08-31T08:05:01,726 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:01.726074 15342 model_worker.cc:139] LOAD request received
2023-08-31T08:05:01,952 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 226
2023-08-31T08:05:01,952 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-08-31T08:05:01,953 [INFO ] W-9000-llm_1.0 TS_METRICS - W-9000-llm_1.0.ms:704|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469101
2023-08-31T08:05:01,953 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.ms:32|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469101
2023-08-31T08:05:01,960 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:59786 "POST /models?initial_workers=1&url=llm.mar HTTP/1.1" 200 1535
2023-08-31T08:05:01,960 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469100
2023-08-31T08:05:07,015 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1693469107015
2023-08-31T08:05:07,016 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0831 08:05:07.016162 15342 model_worker.cc:124] INFER request received
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Achieved tok per sec: 172.305
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Generated String: Hello my name is Daisy. Daisy is three years old. She loves to play with her toys.
2023-08-31T08:05:08,322 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - One day, Daisy's mommy said, "Daisy, it's time to go to the store." Daisy was so excited! She ran to the store with her mommy.
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - At the store, Daisy saw a big, red balloon. She wanted it so badly! She asked her mommy, "Can I have the balloon, please?"
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Mommy said, "No, Daisy. We don't have enough money for that balloon."
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Daisy was sad. She wanted the balloon so much. She started to cry.
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Mommy said, "Daisy, don't cry. We can get the balloon. We can buy it and take it home."
2023-08-31T08:05:08,323 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - Daisy smiled. She was so happy. She hugged her mommy and said, "Thank you, mommy!"
2023-08-31T08:05:08,324 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG -
2023-08-31T08:05:08,325 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG -
2023-08-31T08:05:08,325 [INFO ] W-9000-llm_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:1308.160384|#ModelName:llm,Level:Model|#hostname:ip-172-31-12-16,requestID:acbec9f6-9866-45a5-84b0-ecd8bad95e35,timestamp:1693469108
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:1308.160384|#ModelName:llm,Level:Model|#hostname:ip-172-31-12-16,requestID:acbec9f6-9866-45a5-84b0-ecd8bad95e35,timestamp:1693469108
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1310
2023-08-31T08:05:08,326 [INFO ] W-9000-llm_1.0 ACCESS_LOG - /127.0.0.1:52884 "PUT /predictions/llm HTTP/1.1" 200 1316
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469100
2023-08-31T08:05:08,327 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.job.Job - Waiting time ns: 138008, Backend time ns: 1311950621
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469108
2023-08-31T08:05:08,327 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.ms:2|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469108
2023-08-31T08:05:54,565 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - Parse metrics failed: NumExpr defaulting to 4 threads.
2023-08-31T08:05:54,829 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:25.0|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:6.597530364990234|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:138.72185134887695|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,830 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:95.5|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:1.0872395833333333|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,831 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:13479.09375|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,832 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1860.19921875|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
2023-08-31T08:05:54,832 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:14.1|#Level:Host|#hostname:ip-172-31-12-16,timestamp:1693469154
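The TS_METRICS entries in this log use TorchServe's StatsD-like line layout: `MetricName.Unit:value|#dimensions|#hostname:...,timestamp:...`. A minimal parsing sketch for host-level lines of exactly that shape (the function name and returned field names are my own, not part of TorchServe; MODEL_METRICS lines carry extra tags such as `requestID` and would need more handling):

```python
# Parse a host-level TS_METRICS line as seen in this log, e.g.
#   "GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-...,timestamp:1693469154"
# Assumption: the line has exactly three "|#"-separated parts, as in the lines above.

def parse_ts_metric(line: str) -> dict:
    name_value, dims, tail = line.split("|#")
    name, value = name_value.split(":", 1)
    metric, unit = name.rsplit(".", 1)           # "GPUMemoryUsed.Megabytes" -> name + unit
    dimensions = dict(d.split(":", 1) for d in dims.split(","))
    hostname, _, timestamp = tail.partition(",timestamp:")
    return {
        "metric": metric,
        "unit": unit,
        "value": float(value),
        "dimensions": dimensions,
        "hostname": hostname.removeprefix("hostname:"),
        "timestamp": int(timestamp),
    }

sample = "GPUMemoryUsed.Megabytes:167|#Level:Host,device_id:0|#hostname:ip-172-31-12-16,timestamp:1693469154"
parsed = parse_ts_metric(sample)
print(parsed["metric"], parsed["value"], parsed["dimensions"]["device_id"])  # GPUMemoryUsed 167.0 0
```

This kind of one-off parser is handy for turning a captured log like the one above into tabular data; for production scraping, the Prometheus-format metrics endpoint on port 8082 (`Metrics report format: prometheus` in the startup banner) is the intended interface.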