
Model missing error - KServe - PyTorch #3215

Open
Csehpi opened this issue Jul 1, 2024 · 2 comments
Csehpi commented Jul 1, 2024

🐛 Describe the bug

Hello,

I would like to ask for your help.
I am using KServe and would like to deploy a PyTorch model with it.

My problem is that I get a model missing error even though the model store is defined as a command-line argument:

args:
          - torchserve
          - '--start'
          - '--model-store=/mnt/models/pytorch/model-store'
          - '--ts-config=/mnt/models/pytorch/config/config.properties'

The log says:

Model Store: /mnt/models/pytorch/model-store

BUT later it uses a different path (pytorch is missing from the path; the default one is used instead):

INFO:root:Copying contents of /mnt/models/model-store to local

The docs say: --model-store overrides the model_store property in the config.properties file.

Thanks,
Peter
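For anyone debugging the same mismatch: a quick way to see which path ends up in play is to parse the config.properties the wrapper is pointed at and inspect its model_store entry. The helper below is a hypothetical sketch (plain Python, not part of KServe or TorchServe); the file it writes stands in for the real /mnt/models/pytorch/config/config.properties.

```python
import os
import tempfile

def read_properties(path):
    """Parse a Java-style .properties file into a dict (minimal sketch)."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Stand-in for the real config.properties referenced by --ts-config.
with tempfile.NamedTemporaryFile("w", suffix=".properties", delete=False) as f:
    f.write("inference_address=http://0.0.0.0:8085\n")
    f.write("model_store=/mnt/models/pytorch/model-store\n")
    cfg = f.name

props = read_properties(cfg)
print(props["model_store"])  # the path the config file itself declares
os.unlink(cfg)
```

Comparing this value against the "model store" line in the wrapper's INFO log would show whether the wrapper is honoring the config file at all or falling back to a built-in default.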

Error logs

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-07-01T07:56:21,363 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-07-01T07:56:21,365 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-07-01T07:56:21,404 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-07-01T07:56:21,456 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.11.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 3
Max heap size: 1536 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/pytorch/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/pytorch/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 3
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /mnt/models/pytorch/model-store
CPP log config: N/A
Model config: N/A
System metrics command: default
2024-07-01T07:56:21,462 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-07-01T07:56:21,475 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":1,"models":{"fashionmnist":{"1.0":{"defaultVersion":true,"marName":"fashionmnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"responseTimeout":120}}}}
2024-07-01T07:56:21,481 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2024-07-01T07:56:21,482 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
INFO:root:Wrapper: loading configuration from /mnt/models/pytorch/config/config.properties
INFO:root:Wrapper : Model names dict_keys(['fashionmnist']), inference address http://0.0.0.0:8085, management address http://0.0.0.0:8085, grpc_inference_address, 0.0.0.0:7070, model store /mnt/models/model-store
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v1
INFO:root:Copying contents of /mnt/models/model-store to local
Traceback (most recent call last):
  File "/home/model-server/kserve_wrapper/__main__.py", line 117, in <module>
    model.load()
  File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 159, in load
    raise ModelMissingError(model_path)
kserve.errors.ModelMissingError: <exception str() failed>
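The traceback suggests load() raises ModelMissingError when the directory it copied contains no model archives. The guard below is only an assumption about what the wrapper checks (the class and function here are hypothetical stand-ins, not KServe's actual code), but it reproduces the failure mode: an empty /mnt/models/model-store would trip it even though /mnt/models/pytorch/model-store is populated.

```python
import pathlib
import tempfile

class ModelMissingError(Exception):
    """Stand-in for kserve.errors.ModelMissingError."""

def check_model_store(model_dir):
    """Raise if the directory holds no .mar archives (assumed wrapper behavior)."""
    path = pathlib.Path(model_dir)
    if not any(path.glob("*.mar")):
        raise ModelMissingError(str(model_dir))

# An empty directory (like the wrongly-resolved /mnt/models/model-store) fails:
with tempfile.TemporaryDirectory() as empty:
    try:
        check_model_store(empty)
    except ModelMissingError as e:
        print("missing:", e)

# A directory containing a .mar (like /mnt/models/pytorch/model-store) passes:
with tempfile.TemporaryDirectory() as store:
    (pathlib.Path(store) / "fashionmnist.mar").touch()
    check_model_store(store)
    print("ok")
```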

Installation instructions

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  ...
spec:
  predictor:
    containers:
      - args:
          - torchserve
          - '--start'
          - '--model-store=/mnt/models/pytorch/model-store'
          - '--ts-config=/mnt/models/pytorch/config/config.properties'
        env:
          - name: CONFIG_PATH
            value: /mnt/models/pytorch/config/config.properties
        image: pytorch/torchserve-kfs:0.11.0
        imagePullPolicy: Always
        name: kserve-container
        resources:
          limits:
            cpu: '3'
            memory: '6442450944'
          requests:
            cpu: '3'
            memory: '6442450944'
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
          - mountPath: /mnt/models
            name: kserve-provision-location
            readOnly: true
    imagePullSecrets:
      - name: ...
    initContainers:
      ...
    volumes:
      - emptyDir: {}
        name: kserve-provision-location

Model Packaging

pytorch (main folder)
|--config
|----config.properties
|--model-store
|----xyz.mar

config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=...
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"fashionmnist":{"1.0":{"defaultVersion":true,"marName":"fashionmnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"responseTimeout":120}}}}
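The model_snapshot entry names the archive the server expects to restore. A small sanity check (plain Python, not KServe code) can extract the expected .mar names from that JSON so they can be compared against the files actually present in the model store:

```python
import json

# The model_snapshot value from config.properties, verbatim.
snapshot = json.loads(
    '{"name":"startup.cfg","modelCount":1,"models":{"fashionmnist":'
    '{"1.0":{"defaultVersion":true,"marName":"fashionmnist.mar",'
    '"minWorkers":1,"maxWorkers":5,"batchSize":1,"responseTimeout":120}}}}'
)

# Collect every marName across all models and versions.
mar_names = [
    version["marName"]
    for model in snapshot["models"].values()
    for version in model.values()
]
print(mar_names)  # ['fashionmnist.mar']
```

If the wrapper is copying from /mnt/models/model-store instead of /mnt/models/pytorch/model-store, none of these archives will be found in the copied directory, which matches the ModelMissingError above.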

Versions

PyTorch image: pytorch/torchserve-kfs:0.11.0
Kserve: 0.12.1

Repro instructions

Deployed as a KServe InferenceService

Possible Solution

No response

glovass commented Jul 10, 2024

We are facing the same issue. Have you found a solution for it?

Csehpi commented Jul 10, 2024

Unfortunately no, I am still waiting for some help / guidance here.
