Azure Arc-enabled Machine Learning - Training Public Preview

As part of Azure Machine Learning (AML) service capabilities, Azure Arc-enabled Machine Learning (ML) brings AML to any infrastructure across multi-cloud, on-premises, and edge environments, using Kubernetes on the hardware of your choice. The design of Azure Arc-enabled ML helps IT operators leverage native Kubernetes concepts such as namespaces, node selectors, and resource requests/limits to utilize and optimize ML compute. Because the IT operator manages the ML compute setup, Azure Arc-enabled ML gives data scientists a seamless AML experience without requiring them to learn or use Kubernetes directly.
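The sketch below may help readers less familiar with the Kubernetes concepts named above. It shows where a namespace, a node selector, and resource requests/limits appear in a standard Kubernetes pod manifest, written here as a Python dict. It covers only the generic Kubernetes fields. The namespace `aml-training`, the node label `accelerator: nvidia-gpu`, the image `example.azurecr.io/trainer:latest`, and all resource values are hypothetical placeholders. This README does not describe how the ML extension maps its own settings onto these fields.

```python
# Generic Kubernetes pod manifest expressed as a Python dict.
# All names, labels, and values below are hypothetical placeholders.
def training_pod_manifest(namespace: str, cpu: str, memory: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "example-training-pod",  # hypothetical pod name
            "namespace": namespace,  # keeps ML workloads in their own namespace
        },
        "spec": {
            # Schedule only onto nodes that carry this (hypothetical) label.
            "nodeSelector": {"accelerator": "nvidia-gpu"},
            "containers": [
                {
                    "name": "trainer",
                    "image": "example.azurecr.io/trainer:latest",  # hypothetical image
                    "resources": {
                        # Requests: what the scheduler reserves for the container.
                        "requests": {"cpu": cpu, "memory": memory},
                        # Limits: the ceiling the container may not exceed.
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ],
        },
    }


manifest = training_pod_manifest("aml-training", cpu="2", memory="4Gi")
print(manifest["metadata"]["namespace"])  # aml-training
```

Setting requests equal to limits for both CPU and memory is one possible choice: it places the pod in Kubernetes' Guaranteed QoS class. Lower requests allow more packing at the cost of possible contention.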

This repository is intended to serve as an information hub for customers and partners who are interested in the Azure Arc-enabled AML training public preview. Use it for onboarding and testing instructions, to provide feedback, report issues, and request enhancements, and to stay up to date as the preview progresses. To deploy a trained model using Azure Arc-enabled Machine Learning, please sign up for the Inference Preview. Please note that the preview release is subject to the Supplemental Terms of Use for Microsoft Azure Previews.

Prerequisites

  1. An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
  2. You have a Kubernetes cluster up and running. The cluster must have a minimum of 4 vCPU cores and 8 GB of memory, of which around 2 vCPU cores and 3 GB of memory will be used by the Arc and ML extension components.
  3. Your Kubernetes cluster is connected to Azure Arc (not required for AKS clusters in the Azure cloud).
  4. You've met the prerequisites listed in the generic cluster extensions documentation:
    • Azure CLI version >=2.24.0
    • Azure CLI extension k8s-extension version >=0.4.3.
  5. Create an AML workspace if you don't have one already.
    • AML Python SDK version >= 1.30.
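The version and capacity requirements above can be checked mechanically. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical. It assumes that the version strings you pass in are purely numeric and dotted, as reported by your installed tools. Because the README says "around", the extension overhead is treated as approximate.

```python
from typing import Tuple

# Minimums taken from the prerequisites list above.
MIN_AZURE_CLI = "2.24.0"
MIN_K8S_EXTENSION = "0.4.3"
MIN_AML_SDK = "1.30"
MIN_CLUSTER_VCPU = 4
MIN_CLUSTER_MEMORY_GB = 8
# Approximate resources consumed by the Arc and ML extension components.
EXTENSION_VCPU = 2
EXTENSION_MEMORY_GB = 3


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.24.0' into (2, 24, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, so '1.30' correctly ranks above '1.4'."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so '1.30' compares equal to '1.30.0'.
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


def training_headroom(vcpu: float, memory_gb: float) -> Tuple[float, float]:
    """Estimate the vCPU and memory (GB) left for training after the extensions."""
    if vcpu < MIN_CLUSTER_VCPU or memory_gb < MIN_CLUSTER_MEMORY_GB:
        raise ValueError("cluster is below the 4 vCPU / 8 GB minimum")
    return vcpu - EXTENSION_VCPU, memory_gb - EXTENSION_MEMORY_GB


print(meets_minimum("2.29.0", MIN_AZURE_CLI))  # True
print(meets_minimum("0.4.2", MIN_K8S_EXTENSION))  # False
print(training_headroom(4, 8))  # (2, 5)
```

On a cluster at exactly the minimum size, this leaves roughly 2 vCPU cores and 5 GB of memory for training workloads.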

Getting started

Getting started with the Training Public Preview is easy with the following steps:

Training Public Preview supported features

As just another compute target in AML, the Azure Arc-enabled ML preview seamlessly supports the following built-in AML training features:

In addition to the above built-in AML training features, the public preview also supports the following on-premises training scenarios:

Supported features in private preview

Region availability

Azure Arc-enabled Machine Learning is currently supported in these regions where Azure Arc is available:

  • East US
  • East US 2
  • South Central US
  • West US 2
  • Australia East
  • Southeast Asia
  • North Europe
  • UK South
  • West Europe
  • West Central US
  • Central US
  • North Central US
  • West US
  • Korea Central
  • France Central

Supported Kubernetes distributions and versions

Release notes

New features are released on a biweekly cadence.

July 2, 2021 Release

  • Support for new Kubernetes distributions: OpenShift Kubernetes and GKE (Google Kubernetes Engine).
  • Autoscale support. If autoscaling is enabled on the user-managed Kubernetes cluster, the cluster is automatically scaled out or in according to the volume of active runs and deployments.
  • Performance improvement in the job launcher, which significantly shortens job execution time.

August 10, 2021 Release

August 24, 2021 Release

Sept 16, 2021 Release

  • New regions available: West US, Central US, North Central US, and Korea Central.
  • Job queue explainability: see job queue details in AML Workspace Studio.
  • Auto-killing policy: support for max_run_duration_seconds in ScriptRunConfig. The system will attempt to automatically cancel a run if it takes longer than the specified value.
  • Performance improvement on cluster autoscale support.
  • Arc agent and ML extension deployment from an on-premises container registry.
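The auto-killing policy is set per run through the `max_run_duration_seconds` argument of `ScriptRunConfig` in the AML Python SDK (v1). The following is a minimal sketch. It assumes that `azureml-core` is installed and that an Arc-enabled compute target has already been attached to the workspace. The compute target name `arc-compute`, the experiment name `arc-training-demo`, the `./src` folder, `train.py`, and the helper function names are hypothetical placeholders. The SDK import is deferred inside the builder function so that the duration helper also works on its own.

```python
def duration_seconds(hours: float = 0, minutes: float = 0) -> int:
    """Convert a human-friendly limit into the integer seconds the SDK expects."""
    total = int(hours * 3600 + minutes * 60)
    if total <= 0:
        raise ValueError("the maximum run duration must be positive")
    return total


def build_script_run_config(compute_target_name: str, max_hours: float):
    """Build a ScriptRunConfig whose run is auto-cancelled after max_hours."""
    # Deferred import: requires the azureml-core package (AML Python SDK v1).
    from azureml.core import ScriptRunConfig

    return ScriptRunConfig(
        source_directory="./src",  # hypothetical project folder
        script="train.py",  # hypothetical training script
        compute_target=compute_target_name,  # name of the attached Arc compute
        # The service attempts to cancel the run once it exceeds this limit.
        max_run_duration_seconds=duration_seconds(hours=max_hours),
    )


print(duration_seconds(hours=2))  # 7200

# Submitting needs a workspace config.json; the names here are placeholders:
# from azureml.core import Experiment, Workspace
# ws = Workspace.from_config()
# config = build_script_run_config("arc-compute", max_hours=2)
# run = Experiment(ws, "arc-training-demo").submit(config)
```

Cancellation is best effort ("will attempt to"), so treat the limit as a safeguard against runaway jobs rather than an exact deadline.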

Oct 14, 2021 Release

Support

We are always looking for feedback on our current experiences and on what we should work on next. If there is anything you would like us to prioritize, please let us know via our GitHub Issue Tracker, where you can submit a bug report or a feature suggestion, or participate in discussions.

You can also reach us at amlarc-pm@microsoft.com with any questions or feedback.

Activities

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


Disclaimer

The lifecycle management (health, Kubernetes version upgrades, security updates to nodes, scaling, etc.) of the AKS or Arc Kubernetes cluster is the responsibility of the customer.

For AKS, read about what is managed by Azure and what is a shared responsibility here.

All preview features are available on a self-service, opt-in basis and are subject to breaking design and API changes. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. As such, these features aren't meant for production use.

Azure Arc-enabled ML supports targeting ML training at both Azure Kubernetes Service (AKS) clusters and any cluster that is registered in Azure using Arc.

Kubernetes version support follows the versions that AKS supports; see here for details.