Adding mps support to base handler and regression test (#3048)
* adding mps support to base handler and regression test

* fixed method

* mps support

* fix format

* changes to detection

* testing x86

* adding m1 check

* adding test cases

* adding test workflow

* modifying tests

* removing python tests

* remove workflow

* removing test config file

* adding docs

* fixing spell check

* lint fix

---------

Co-authored-by: Ankith Gunapal <agunapal@ischool.Berkeley.edu>
udaij12 and agunapal authored Apr 9, 2024
1 parent 8450a2e commit 89c5389
Showing 6 changed files with 349 additions and 0 deletions.
129 changes: 129 additions & 0 deletions docs/apple_silicon_support.md
@@ -0,0 +1,129 @@
# Apple Silicon Support

## What is supported
* TorchServe CI jobs now run on M1 hardware to ensure support; see the GitHub [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on M1 hosted runners.
- [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
- [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
* For [Docker](https://docs.docker.com/desktop/install/mac-install/), ensure Docker for Apple silicon is installed, then follow the [setup steps](https://github.com/pytorch/serve/tree/master/docker)

## Experimental Support

* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto-detected and enabled. To prevent TorchServe from using MPS, set `deviceType: "cpu"` in model-config.yaml (see the sketch after this list).
* This is an experimental feature and NOT ALL models are guaranteed to work.
* The `Number of GPUs` reported at startup now reflects the GPU cores detected on Apple Silicon
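
Below is a minimal sketch (not part of this change) of the device selection the base handler now performs on Apple Silicon, assuming a recent PyTorch build with the MPS backend; the `deviceType: "cpu"` override in model-config.yaml has the same effect as the CPU fallback shown here.

```
import torch

# Hedged sketch of the MPS auto-detection: a worker only lands on MPS when the
# backend is both built and available at runtime; otherwise it stays on CPU
# (the same outcome as setting deviceType: "cpu" in model-config.yaml).
if torch.backends.mps.is_built() and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Selected device: {device}")
tensor = torch.ones(2, 2, device=device)  # tensors can be allocated directly on the device
print(tensor.device)
```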

### Testing
* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that check for MPS on macOS M1 devices
* Models that have been tested and work: Resnet-18, Densenet161, Alexnet
* Models that have been tested and DO NOT work: MNIST


#### Example Resnet-18 Using MPS On Mac M1 Pro
```
serve % torchserve --start --model-store model_store_gen --models resnet-18=resnet-18.mar --ncs
Torchserve version: 0.10.0
Number of GPUs: 16
Number of CPUs: 10
Max heap size: 8192 M
Python executable: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store:
Initial Models: resnet-18=resnet-18.mar
Log dir:
Metrics dir:
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store:
CPP log config: N/A
Model config: N/A
2024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2024-04-08T14:18:02,391 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: resnet-18.mar
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet-18
2024-04-08T14:18:02,699 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet-18, count: 16
...
...
serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
...
{
"tabby": 0.40966302156448364,
"tiger_cat": 0.3467046618461609,
"Egyptian_cat": 0.1300288736820221,
"lynx": 0.02391958422958851,
"bucket": 0.011532187461853027
}
...
```
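
For convenience, here is a hedged Python equivalent of the `curl` call above, assuming the resnet-18 server started earlier is still listening on 127.0.0.1:8080 and that the repository's `examples/image_classifier/kitten.jpg` is available locally.

```
import requests

# Send the image bytes as the request body, mirroring `curl -T kitten.jpg`.
with open("examples/image_classifier/kitten.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/resnet-18", data=f.read()
    )

print(response.json())  # e.g. {"tabby": 0.41, "tiger_cat": 0.35, ...}
```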
#### Conda Example

```
(myenv) serve % pip list | grep torch
torch 2.2.1
torchaudio 2.2.1
torchdata 0.7.1
torchtext 0.17.1
torchvision 0.17.1
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
(myenv3) serve % pip list | grep torch
torch 2.2.1
torch-model-archiver 0.10.0b20240312
torch-workflow-archiver 0.2.12b20240312
torchaudio 2.2.1
torchdata 0.7.1
torchserve 0.10.0b20240312
torchtext 0.17.1
torchvision 0.17.1
(myenv3) serve % torchserve --start --ncs --models densenet161.mar --model-store ./model_store_gen/
Torchserve version: 0.10.0
Number of GPUs: 0
Number of CPUs: 10
Max heap size: 8192 M
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Initial Models: densenet161.mar
Netty threads: 0
Netty client threads: 0
Default workers per model: 10
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
CPP log config: N/A
Model config: N/A
System metrics command: default
...
2024-03-12T15:58:54,702 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model densenet161 loaded.
2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
Model server started.
...
(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
{
"tabby": 0.46661922335624695,
"tiger_cat": 0.46449029445648193,
"Egyptian_cat": 0.0661405548453331,
"lynx": 0.001292439759708941,
"plastic_bag": 0.00022909720428287983
}
@@ -5,9 +5,11 @@
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.util.SelfSignedCertificate;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.net.InetAddress;
@@ -835,6 +837,28 @@ private static int getAvailableGpu() {
for (String id : ids) {
gpuIds.add(Integer.parseInt(id));
}
} else if (System.getProperty("os.name").startsWith("Mac")) {
Process process = Runtime.getRuntime().exec("system_profiler SPDisplaysDataType");
int ret = process.waitFor();
if (ret != 0) {
return 0;
}

BufferedReader reader =
new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
if (line.contains("Chipset Model:") && !line.contains("Apple M1")) {
return 0;
}
if (line.contains("Total Number of Cores:")) {
String[] parts = line.split(":");
if (parts.length >= 2) {
return (Integer.parseInt(parts[1].trim()));
}
}
}
throw new AssertionError("Unexpected response.");
} else {
Process process =
Runtime.getRuntime().exec("nvidia-smi --query-gpu=index --format=csv");
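
For illustration only, a hedged Python sketch of the macOS GPU-core detection added in the Java frontend above: parse `system_profiler SPDisplaysDataType`, return zero for non-Apple-M1 chipsets or on command failure, and report the `Total Number of Cores` value. The function name is hypothetical.

```
import subprocess

def macos_gpu_cores() -> int:
    """Hypothetical helper mirroring the macOS branch of getAvailableGpu():
    returns the GPU core count reported by system_profiler, or 0 when the
    chipset is not an Apple M1 family part or the command fails."""
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return 0
    for line in result.stdout.splitlines():
        if "Chipset Model:" in line and "Apple M1" not in line:
            return 0
        if "Total Number of Cores:" in line:
            return int(line.split(":")[1].strip())
    return 0  # the Java version raises an AssertionError here; simplified in this sketch
```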
@@ -105,4 +105,18 @@ public void testNoWorkflowState() throws ReflectiveOperationException, IOException
workingDir + "/frontend/archive/src/test/resources/models",
configManager.getWorkflowStore());
}

@Test
public void testNumGpuM1() throws ReflectiveOperationException, IOException {
System.setProperty("tsConfigFile", "src/test/resources/config_test_env.properties");
ConfigManager.Arguments args = new ConfigManager.Arguments();
args.setModels(new String[] {"noop_v0.1"});
args.setSnapshotDisabled(true);
ConfigManager.init(args);
ConfigManager configManager = ConfigManager.getInstance();
String arch = System.getProperty("os.arch");
if (arch.equals("aarch64")) {
Assert.assertTrue(configManager.getNumberOfGpu() > 0);
}
}
}
168 changes: 168 additions & 0 deletions test/pytest/test_device_config.py
@@ -0,0 +1,168 @@
import os
import platform
import shutil
import tempfile
from pathlib import Path
from unittest.mock import patch

import pytest
import requests
import test_utils
from model_archiver import ModelArchiverConfig

CURR_FILE_PATH = Path(__file__).parent
REPO_ROOT_DIR = CURR_FILE_PATH.parent.parent
ROOT_DIR = os.path.join(tempfile.gettempdir(), "workspace")
REPO_ROOT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../")
data_file_zero = os.path.join(REPO_ROOT, "test/pytest/test_data/0.png")
config_file = os.path.join(REPO_ROOT, "test/resources/config_token.properties")
mnist_scriptes_py = os.path.join(REPO_ROOT, "examples/image_classifier/mnist/mnist.py")

HANDLER_PY = """
from ts.torch_handler.base_handler import BaseHandler
class deviceHandler(BaseHandler):
def initialize(self, context):
super().initialize(context)
assert self.get_device().type == "mps"
"""

MODEL_CONFIG_YAML = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
"""

MODEL_CONFIG_YAML_GPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "gpu"
"""

MODEL_CONFIG_YAML_CPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "cpu"
"""


@pytest.fixture(scope="module")
def model_name():
yield "mnist"


@pytest.fixture(scope="module")
def work_dir(tmp_path_factory, model_name):
return Path(tmp_path_factory.mktemp(model_name))


@pytest.fixture(scope="module")
def model_config_name(request):
def get_config(param):
if param == "cpu":
return MODEL_CONFIG_YAML_CPU
elif param == "gpu":
return MODEL_CONFIG_YAML_GPU
else:
return MODEL_CONFIG_YAML

return get_config(request.param)


@pytest.fixture(scope="module", name="mar_file_path")
def create_mar_file(work_dir, model_archiver, model_name, model_config_name):
mar_file_path = work_dir.joinpath(model_name + ".mar")

model_config_yaml_file = work_dir / "model_config.yaml"
model_config_yaml_file.write_text(model_config_name)

model_py_file = work_dir / "model.py"

model_py_file.write_text(mnist_scriptes_py)

handler_py_file = work_dir / "handler.py"
handler_py_file.write_text(HANDLER_PY)

config = ModelArchiverConfig(
model_name=model_name,
version="1.0",
serialized_file=None,
model_file=mnist_scriptes_py, # model_py_file.as_posix(),
handler=handler_py_file.as_posix(),
extra_files=None,
export_path=work_dir,
requirements_file=None,
runtime="python",
force=False,
archive_format="default",
config_file=model_config_yaml_file.as_posix(),
)

with patch("archiver.ArgParser.export_model_args_parser", return_value=config):
model_archiver.generate_model_archive()

assert mar_file_path.exists()

yield mar_file_path.as_posix()

# Clean up files

mar_file_path.unlink(missing_ok=True)

# Clean up files


@pytest.fixture(scope="module", name="model_name")
def register_model(mar_file_path, model_store, torchserve):
"""
Register the model in torchserve
"""
shutil.copy(mar_file_path, model_store)

file_name = Path(mar_file_path).name

model_name = Path(file_name).stem

params = (
("model_name", model_name),
("url", file_name),
("initial_workers", "1"),
("synchronous", "true"),
("batch_size", "1"),
)

test_utils.reg_resp = test_utils.register_model_with_params(params)

yield model_name

test_utils.unregister_model(model_name)


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["gpu"], indirect=True)
def test_m1_device(model_name, model_config_name):
response = requests.get(f"http://localhost:8081/models/{model_name}")
assert response.status_code == 200, "Describe Failed"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["cpu"], indirect=True)
def test_m1_device_cpu(model_name, model_config_name):
response = requests.get(f"http://localhost:8081/models/{model_name}")
assert response.status_code == 404, "Describe Worked"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["default"], indirect=True)
def test_m1_device_default(model_name, model_config_name):
response = requests.get(f"http://localhost:8081/models/{model_name}")
assert response.status_code == 200, "Describe Failed"
12 changes: 12 additions & 0 deletions ts/torch_handler/base_handler.py
@@ -144,11 +144,15 @@ def initialize(self, context):
self.model_yaml_config = context.model_yaml_config

properties = context.system_properties

if torch.cuda.is_available() and properties.get("gpu_id") is not None:
self.map_location = "cuda"
self.device = torch.device(
self.map_location + ":" + str(properties.get("gpu_id"))
)
elif torch.backends.mps.is_available() and properties.get("gpu_id") is not None:
self.map_location = "mps"
self.device = torch.device("mps")
elif XLA_AVAILABLE:
self.device = xm.xla_device()
else:
@@ -524,3 +528,11 @@ def describe_handle(self):
# pylint: disable=unnecessary-pass
pass
# pylint: enable=unnecessary-pass

def get_device(self):
"""Get device
Returns:
string : self device
"""
return self.device
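
A minimal custom-handler sketch showing how the new `get_device()` accessor can be used; the class name is hypothetical, and on an M1 Mac with MPS available and a `gpu_id` assigned by the frontend, `initialize()` is expected to land on an `mps` device.

```
from ts.torch_handler.base_handler import BaseHandler

class DeviceLoggingHandler(BaseHandler):
    # Hypothetical handler: relies only on BaseHandler.initialize() and the
    # get_device() accessor introduced in this change.
    def initialize(self, context):
        super().initialize(context)
        # On Apple Silicon with MPS available this prints "mps"; with
        # deviceType: "cpu" in model-config.yaml it prints "cpu".
        print(f"Worker initialized on device: {self.get_device().type}")
```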
2 changes: 2 additions & 0 deletions ts_scripts/spellcheck_conf/wordlist.txt
@@ -1216,3 +1216,5 @@ libomp
rpath
venv
TorchInductor
Pytests
deviceType
