Added tutorial for using torchserve on aws sagemaker #2671

Open · wants to merge 11 commits into base: main
238 changes: 238 additions & 0 deletions intermediate_source/torchserve_aws_sagemaker_tutorial.rst
TorchServe on AWS SageMaker
============================

In this tutorial, you will learn how to efficiently serve PyTorch models using TorchServe and AWS SageMaker.

Why TorchServe?
^^^^^^^^^^^^^^^^

| TorchServe is the recommended model server for PyTorch, preinstalled in the AWS PyTorch Deep Learning Container (DLC). This powerful tool offers customers a consistent and user-friendly experience, delivering high performance in deploying multiple PyTorch models across various AWS instances, including CPU, GPU, Neuron, and Graviton, regardless of the model size or distribution.
| TorchServe is easy to use. It comes with a convenient CLI to deploy locally and is easy to package into a container and scale out with Amazon SageMaker or Amazon EKS. With default handlers for common problems such as image classification, object detection, image segmentation, and text classification, you can deploy with just a few lines of code—no more writing lengthy service handlers for initialization, preprocessing, and post-processing. TorchServe is open-source, which means it's fully open and extensible to fit your deployment needs.

To get started with TorchServe, you can refer to the `TorchServe QuickStart <https://pytorch.org/serve/getting_started.html>`_ tutorial.
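
If you want to try TorchServe locally before moving to SageMaker, a minimal session looks roughly like the following. This is only a sketch: it assumes ``torchserve`` is installed, that a local ``model_store`` directory contains the ``densenet161.mar`` archive created later in this tutorial, and that ``kitten.jpg`` is available locally.

.. code:: shell

# Start TorchServe and register the DenseNet archive from the local model store
torchserve --start --ncs --model-store model_store --models densenet161=densenet161.mar

# Query the default inference API (port 8080) with a test image
curl http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg

# Stop the server when done
torchserve --stop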

Reviewer comment: SageMaker has two different endpoint types, and their deployment differs slightly. Please include this information:

  • single model
  • multi-model


Reviewer comment (lxning): Please add the user manual links for using TorchServe on SageMaker:

* https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-torchserve.html

* https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html

Author reply: Hello lxning, thanks for the review. I have already added these links to the references at the end of the tutorial.

The following table lists the AWS PyTorch DLCs supported by TorchServe
````````````````````````````````````````````````````````````````````````

.. list-table::
   :header-rows: 1

   * - Instance type
     - SageMaker PyTorch DLC link
   * - CPU and GPU
     - `SageMaker PyTorch containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-containers-sm-support-only>`_
   * - Neuron
     - `PyTorch Neuron containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers>`_
   * - Graviton
     - `SageMaker PyTorch Graviton containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-graviton-containers-sm-support-only>`_

Using AWS SageMaker, we can host a single model or even multiple models on a single endpoint. The procedures vary slightly; in this tutorial we focus on a single-model endpoint. The essence remains the same, with `some variations in the code <https://docs.aws.amazon.com/sagemaker/latest/dg/create-multi-model-endpoint.html>`_; a rough sketch of the multi-model variant follows.
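
For reference only, the following sketch shows the multi-model variant using the SageMaker Python SDK's ``MultiDataModel`` class. It is not part of this tutorial's single-model walkthrough; the ``bucket_name``, ``prefix``, ``image``, ``role``, and ``payload`` variables mirror those defined later in this tutorial, and the exact arguments may differ between SDK versions.

.. code:: python3

from sagemaker.multidatamodel import MultiDataModel

# Hypothetical multi-model endpoint: every model archive uploaded under
# model_data_prefix can be served from the same endpoint.
mme = MultiDataModel(name='torchserve-multi-model',
                     model_data_prefix=f's3://{bucket_name}/{prefix}/models/',
                     image_uri=image,   # TorchServe container image built later in this tutorial
                     role=role)

predictor = mme.deploy(instance_type='ml.m4.xlarge', initial_instance_count=1)

# Route a request to a specific archive stored under the prefix
response = predictor.predict(data=payload, target_model='densenet161.tar.gz')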

You can follow along with this tutorial through an Amazon EC2 instance, or your laptop or desktop. If you're using a local laptop or desktop, make sure you download and install the `AWS CLI <https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html>`_ and configure it, `AWS SDK for Python (boto3) <https://aws.amazon.com/sdk-for-python/>`_, and `Amazon SageMaker Python SDK <https://github.com/aws/sagemaker-python-sdk#installing-the-sagemaker-python-sdk>`_. After you deploy, the models are hosted on Amazon SageMaker fully managed deployment instances.

The code, configuration files, Jupyter notebooks, and Dockerfiles used in this tutorial are available on `GitHub <https://github.com/shashankprasanna/torchserve-examples.git>`_. The steps in the following example are from the ``deploy_torchserve.ipynb`` Jupyter notebook.

Cloning the example repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To clone the example repository, enter the following code:

.. code:: shell

git clone https://github.com/shashankprasanna/torchserve-examples.git
cd torchserve-examples

Clone the TorchServe repository and install torch-model-archiver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the ``torch-model-archiver`` tool to create a model archive file. The ``.mar`` model archive file contains the model checkpoint along with its ``state_dict`` (the dictionary object that maps each layer to its parameter tensor).

.. code:: shell

git clone https://github.com/pytorch/serve.git
pip install serve/model-archiver

To download a PyTorch model and create a TorchServe archive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can customize the handler by passing a `custom handler script <https://github.com/pytorch/serve/blob/master/docs/custom_service.md>`_ (for example, ``custom_handler.py``) instead of ``image_classifier``; a rough sketch follows the code block below.

.. code:: shell

wget -q https://download.pytorch.org/models/densenet161-8d451a50.pth

export model_file_name='densenet161'

torch-model-archiver --model-name $model_file_name \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier \
--archive-format tgz
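
As a rough illustration of the custom handler mentioned above, a handler module can subclass TorchServe's ``BaseHandler`` and override only the steps you need. The file name ``custom_handler.py`` and the top-5 output format below are illustrative choices, not part of the example repository.

.. code:: python3

# custom_handler.py (hypothetical example)
import io

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class CustomImageHandler(BaseHandler):
    """Decode image bytes, run the model, and return top-5 class indices."""

    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor()])

    def preprocess(self, data):
        # TorchServe passes a list of requests; the raw bytes are under "data" or "body"
        images = []
        for row in data:
            image_bytes = row.get("data") or row.get("body")
            images.append(self.transform(Image.open(io.BytesIO(image_bytes)).convert("RGB")))
        return torch.stack(images)

    def postprocess(self, output):
        # Return the top-5 predicted class indices per image as plain Python lists
        return output.topk(5, dim=1).indices.tolist()

You would then pass ``--handler custom_handler.py`` to ``torch-model-archiver`` instead of ``image_classifier``.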

Uploading the model to Amazon S3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To upload the model to Amazon S3, complete the following steps:

#. Create a boto3 session and get the Region and account information

.. code:: python3

import boto3, time, json
sess = boto3.Session()
sm = sess.client('sagemaker')
region = sess.region_name
account = boto3.client('sts').get_caller_identity().get('Account')

import sagemaker
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)

# Get the default Amazon SageMaker S3 bucket name

bucket_name = sagemaker_session.default_bucket()
prefix = 'torchserve'

``[Optional]`` The methods mentioned below are to be used only when you pass ``--archive-format no-archive`` to ``torch-model-archiver`` in the step of creating the TorchServe archive. For large models, use SageMaker's `uncompressed model artifact feature <https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html>`_, which as of now is only available for single-model endpoint use cases.

#. Create a compressed ``tar.gz`` file out of the ``densenet161.mar`` file, because Amazon SageMaker expects models to be packaged in a ``tar.gz`` file.

.. code:: shell

tar cvfz $model_file_name.tar.gz densenet161.mar

#. Upload the model to your S3 bucket under the models directory.

.. code:: shell

aws s3 cp $model_file_name.tar.gz s3://{bucket_name}/{prefix}/models/


We perform the following two Docker-related steps because we want Amazon SageMaker to interact with our `custom Docker containers <https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-main.html>`_, but you can also use `other AWS SageMaker deployment options <https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html>`_ that satisfy the use case of `bringing your own model <https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html#deploy-model-steps-byom>`_.

Reviewer comment: This step is needed ONLY if you are going to BYOD or BYOC; otherwise it is not needed.

Creating an Amazon ECR registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a new Docker container registry for your TorchServe container images. Amazon SageMaker pulls the TorchServe container from this registry. See the following code:

.. code:: python3

registry_name = 'torchserve'

.. code:: shell

aws ecr create-repository --repository-name torchserve

Building a TorchServe Docker container and pushing it to Amazon ECR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The repository for this tutorial already contains a Dockerfile for building a TorchServe container. Build the Docker container image locally and push it to the Amazon ECR repository you created in the previous step. See the following code:

.. code:: python3

image_label = 'v1'
image = f'{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_label}'

.. code:: shell

docker build -t {registry_name}:{image_label} .
$(aws ecr get-login --no-include-email --region {region})
docker tag {registry_name}:{image_label} {image}
docker push {image}

You get the following output confirming that the container was built and pushed to Amazon ECR successfully:

.. image:: static/torchserve_container_amazonECR.png
:alt: output when docker container was successfully built and pushed to Amazon ECR

Hosting an inference endpoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are multiple ways to host an inference endpoint and make predictions. The quickest approach is to use the Amazon SageMaker Python SDK. However, if you're going to invoke the endpoint from a client application, you should use `Amazon SDK <https://aws.amazon.com/tools/>`_ for the language of your choice.

Hosting an inference endpoint and making predictions with Amazon SageMaker Python SDK

To host an inference endpoint and make predictions using Amazon SageMaker Python SDK, complete the following steps:

#. Create a model. The model function expects the name of the TorchServe container image and the location of your trained models. See the following code:

.. code:: python3

import sagemaker
from sagemaker.model import Model
from sagemaker.predictor import RealTimePredictor
role = sagemaker.get_execution_role()

model_file_name = 'densenet161'   # must match the name of the archive uploaded to S3
# S3 location where the model archive was uploaded in the earlier step
model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161'

torchserve_model = Model(model_data = model_data,
image = image,
role = role,
predictor_cls=RealTimePredictor,
name = sm_model_name)

For more information about the model function, see `Model <https://sagemaker.readthedocs.io/en/stable/model.html>`_

#. On the Amazon SageMaker console, to see the model details, choose Models.

.. image:: static/torchserve_model_hosting_aws_sagemaker.png
:alt: image of aws sagemaker console showing model details

#. Deploy the model endpoint. Specify the instance type and number of instances you want Amazon SageMaker to run the container on. See the following code:

.. code:: python3

endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
predictor = torchserve_model.deploy(instance_type='ml.m4.xlarge',
initial_instance_count=1,
endpoint_name = endpoint_name)

You can also set it up to automatically scale based on metrics, such as the total number of invocations. For more information, see `Automatically Scale Amazon SageMaker Models <https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html>`_
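
As a rough sketch of target-tracking auto scaling with boto3 (the ``AllTraffic`` variant name and the target of 100 invocations per instance are assumptions, not values from this tutorial):

.. code:: python3

import boto3

autoscaling = boto3.client('application-autoscaling')
resource_id = f'endpoint/{endpoint_name}/variant/AllTraffic'  # default variant name assumed

# Allow SageMaker to scale this endpoint variant between 1 and 4 instances
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=4)

# Track roughly 100 invocations per instance per minute
autoscaling.put_scaling_policy(
    PolicyName='torchserve-invocations-scaling',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 100.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'}})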

#. On the Amazon SageMaker console, to see the hosted endpoint, choose Endpoints.

.. image:: static/torchserve_endpoint_aws_sagemaker.png
:alt: detail about endpoint on aws sagemaker console

#. Test the model with the following code:

.. code:: shell

wget -q https://s3.amazonaws.com/model-server/inputs/kitten.jpg

.. code:: python3

file_name = 'kitten.jpg'
with open(file_name, 'rb') as f:
    payload = f.read()

response = predictor.predict(data=payload)
print(*json.loads(response), sep = '\n')

The following screenshot shows the output of invoking the model hosted by TorchServe. The model thinks the kitten in the image is either a tiger cat or a tabby cat.

.. image:: static/torchserve_model_output_aws_sagemaker.png
:alt: model's response corresponding to the payload image


If you're building applications such as mobile apps or webpages that need to invoke the TorchServe endpoint for predictions on new data, you can use the AWS SDKs rather than the Amazon SageMaker Python SDK. For example, if you're using Python on the client side, use the AWS SDK for Python (boto3). For an example of how to use boto3 to create a model, configure an endpoint, create an endpoint, and finally run inferences on the inference endpoint, refer to this example `Jupyter notebook on GitHub <https://github.com/shashankprasanna/torchserve-examples/blob/master/deploy_torchserve.ipynb>`_.
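
For example, a minimal client-side call with boto3 might look like the following sketch; the ``application/octet-stream`` content type is an assumption that matches the raw image payload used above, and ``endpoint_name`` is the endpoint created earlier in this tutorial.

.. code:: python3

import boto3

runtime = boto3.client('sagemaker-runtime')

with open('kitten.jpg', 'rb') as f:
    payload = f.read()

# Send the raw image bytes to the hosted TorchServe endpoint
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='application/octet-stream',
                                   Body=payload)

print(response['Body'].read().decode('utf-8'))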


Metrics
~~~~~~~~

TorchServe supports both system level and model level metrics. You can use the TorchServe central metrics config file metrics.yaml to specify the types of metrics to be tracked, such as request counts, latency, memory usage, GPU utilization, and more. By referring to this file, you can gain insights into the performance and health of the deployed models and effectively monitor the TorchServe server's behavior in real-time. For more detailed information, see the `TorchServe metrics documentation <https://github.com/pytorch/serve/blob/master/docs/metrics.md#torchserve-metrics>`_. You can access TorchServe metrics logs that are similar to the StatsD format through the Amazon CloudWatch log filter. The following is an example of a TorchServe metrics log:

.. code:: shell

CPUUtilization.Percent:0.0|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
DiskAvailable.Gigabytes:318.0416717529297|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
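
For example, you can pull these metric lines from CloudWatch Logs with boto3; the log group name below follows SageMaker's usual ``/aws/sagemaker/Endpoints/<endpoint-name>`` pattern and is an assumption for this sketch.

.. code:: python3

import boto3

logs = boto3.client('logs')

# Fetch recent TorchServe metric lines emitted by the endpoint's containers
events = logs.filter_log_events(
    logGroupName=f'/aws/sagemaker/Endpoints/{endpoint_name}',
    filterPattern='CPUUtilization',
    limit=10)

for event in events['events']:
    print(event['message'])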

References
~~~~~~~~~~~

- `Deploying PyTorch models for inference at scale using TorchServe <https://aws.amazon.com/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/>`_
- `Deploy models with TorchServe <https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-torchserve.html>`_
- `Running TorchServe <https://pytorch.org/serve/server.html>`_