
Added tutorial for using torchserve on aws sagemaker #2671

Open
wants to merge 11 commits into main
Conversation

Viditagarwal7479
Contributor

@Viditagarwal7479 Viditagarwal7479 commented Nov 10, 2023

Fixes #2345

Added a tutorial on how to use TorchServe on AWS SageMaker.
The tutorial focuses on the features of AWS SageMaker and other AWS services that can be used to serve a PyTorch model, rather than emphasizing the various features provided by TorchServe itself. I have, however, provided external links to tutorials wherever more can be done with TorchServe features, such as handler customization in torch-model-archiver.

Checklist

  • The issue being fixed is referenced in the description (see "Fixes #ISSUE_NUMBER" above)
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request

cc @msaroufim @agunapal @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen


pytorch-bot bot commented Nov 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2671

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Viditagarwal7479
Contributor Author

@pytorchbot label "docathon-h2-2023"

| TorchServe is easy to use. It comes with a convenient CLI to deploy locally and is easy to package into a container and scale out with Amazon SageMaker or Amazon EKS. With default handlers for common problems such as image classification, object detection, image segmentation, and text classification, you can deploy with just a few lines of code—no more writing lengthy service handlers for initialization, preprocessing, and post-processing. TorchServe is open-source, which means it's fully open and extensible to fit your deployment needs.

To get started on how to use TorchServe you can refer to this tutorial: `TorchServe QuickStart <https://pytorch.org/serve/getting_started.html>`_
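As a hedged local sketch of that quickstart flow (the file names model.py and densenet161-8d451a50.pth are assumptions for illustration, not values from this tutorial):

```shell
# Package a trained DenseNet-161 checkpoint into a .mar archive using the
# built-in image_classifier handler; file names here are placeholders.
mkdir -p model_store
torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file densenet161-8d451a50.pth \
    --handler image_classifier \
    --export-path model_store

# Serve the archive locally; the inference API listens on port 8080 by default.
torchserve --start --ncs --model-store model_store --models densenet161.mar
```

The same .mar artifact produced here is what later gets packaged for SageMaker.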


SageMaker has two different endpoint types, and their deployment differs slightly. Please include this information:

  • single-model endpoint
  • multi-model endpoint
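A hedged sketch of how that difference shows up when registering the model with the AWS CLI (the model names, ECR image URI, role ARN, bucket, and prefix are all placeholders):

```shell
# Single-model endpoint: ModelDataUrl points at one model archive.
aws sagemaker create-model \
    --model-name my-torchserve-model \
    --primary-container Image=<ecr-image-uri>,ModelDataUrl=s3://<bucket>/<prefix>/model.tar.gz \
    --execution-role-arn <role-arn>

# Multi-model endpoint: Mode=MultiModel, and ModelDataUrl points at an S3
# prefix holding many archives; models are loaded on demand per request.
aws sagemaker create-model \
    --model-name my-torchserve-mme \
    --primary-container Image=<ecr-image-uri>,Mode=MultiModel,ModelDataUrl=s3://<bucket>/<prefix>/ \
    --execution-role-arn <role-arn>
```

Either model is then attached to an endpoint via the usual create-endpoint-config / create-endpoint steps.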


Please add the user manual links of using TorchServe on SM:

* https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-torchserve.html

* https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html

Hello lxning,
Thanks for the review. I have already added these links in the references at the end of the tutorial.

Comment on lines 95 to 99
#. Create a compressed tar.gz file out of the densenet161.mar file, because Amazon SageMaker expects models to be in a tar.gz file.

.. code:: shell

    tar cvfz $model_file_name.tar.gz densenet161.mar
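The packaging step above can be exercised locally with a stand-in archive (the .mar here is an empty placeholder, not a real model archive):

```shell
# Stand-in for the real densenet161.mar produced by torch-model-archiver.
model_file_name=densenet161
touch densenet161.mar

# Package it the way SageMaker expects, then list the archive contents.
tar cvfz "$model_file_name.tar.gz" densenet161.mar
tar tzf "$model_file_name.tar.gz"
```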

  • You can skip this extra step by using torch-model-archiver --archive-format tgz.
  • For large models, we recommend using torch-model-archiver --archive-format no-archive, leveraging the SageMaker uncompressed model artifact feature (currently only available on SageMaker single-model endpoints). For details, see https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html
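A sketch of the two suggested invocations (the model file names and handler are carried over from the tutorial's DenseNet-161 example and may differ in practice):

```shell
# (a) Emit a tar.gz archive directly, skipping the separate tar step.
torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file densenet161-8d451a50.pth \
    --handler image_classifier \
    --archive-format tgz

# (b) For large models: emit an uncompressed directory instead, so SageMaker's
# uncompressed-model-artifact feature can serve it (single-model endpoints only).
torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file densenet161-8d451a50.pth \
    --handler image_classifier \
    --archive-format no-archive
```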


.. code:: shell

    aws s3 cp $model_file_name.tar.gz s3://{bucket_name}/{prefix}/model

Creating an Amazon ECR registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This step is needed only if you are going to BYOD or BYOC; otherwise it can be skipped.

Comment on lines 217 to 220
Metrics
~~~~~~~~

TorchServe supports both system-level and model-level metrics. You can enable metrics in either log format mode or Prometheus mode through the environment variable TS_METRICS_MODE. You can use the TorchServe central metrics config file metrics.yaml to specify the types of metrics to be tracked, such as request counts, latency, memory usage, GPU utilization, and more. By referring to this file, you can gain insight into the performance and health of the deployed models and effectively monitor the TorchServe server's behavior in real time. For more detailed information, see the `TorchServe metrics documentation <https://github.com/pytorch/serve/blob/master/docs/metrics.md#torchserve-metrics>`_. You can access TorchServe metrics logs, which are similar to the StatsD format, through the Amazon CloudWatch log filter. The following is an example of a TorchServe metrics log:
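An illustrative sketch of working with those logs (the log group name, endpoint name, and metric name below are placeholders, not values from the tutorial):

```shell
# Metrics mode is controlled by an environment variable on the serving
# container; "log" (the default) writes StatsD-like lines to the metrics log.
export TS_METRICS_MODE=log

# Search the endpoint's CloudWatch logs for a specific TorchServe metric.
# SageMaker endpoint logs live under /aws/sagemaker/Endpoints/<endpoint-name>.
aws logs filter-log-events \
    --log-group-name /aws/sagemaker/Endpoints/<endpoint-name> \
    --filter-pattern GPUMemoryUtilization
```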

SageMaker does not support the Prometheus format. Users can only use regex search over the TorchServe metrics logs.

See boto/boto3#3437: Prometheus format is not yet supported for viewing TorchServe metric logs.

Viditagarwal7479 commented Nov 13, 2023

Hello @svekars @sekyondaMeta, just a gentle reminder to review my PR. Should I remove the docathon label from this PR so that it can be merged?


Successfully merging this pull request may close these issues.

💡 [REQUEST] - How to use TorchServe on AWS SageMaker
3 participants