The current project uses machine learning to predict patients’ survival based on their medical data.
In this project, I created two models in Azure Machine Learning Studio: one using AutoML and one custom model whose hyperparameters were tuned using HyperDrive. I then compared the performance of both models and deployed the best-performing one as a web service using Azure Container Instances (ACI).
I used the workspace and environment provided by the Udacity course, so everything was pre-installed. The following files were used in this project:

- `automl.ipynb`: the notebook for the AutoML experiment
- `hyperparameter_tuning.ipynb`: the notebook for the HyperDrive experiment
- `heart_failure_clinical_records_dataset.csv`: the dataset file, taken from Kaggle
- `train.py`: a basic script for manipulating the data, used in the HyperDrive experiment (a modified version of the script given in the first project)
- `scoring_file_v_1_0_0.py`: the script used to deploy the model, downloaded from within Azure Machine Learning Studio
- `env.yml`: the environment file, also downloaded from within Azure Machine Learning Studio
The dataset used is taken from Kaggle and, as we can read in the original research article, the data comes from 299 patients with heart failure collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan), during April–December 2015. The patients consisted of 105 women and 194 men, and their ages ranged between 40 and 95 years.
The dataset contains 13 columns: 12 clinical features plus the target column, DEATH_EVENT:
Feature | Explanation | Measurement |
---|---|---|
age | Age of patient | Years (40-95) |
anaemia | Decrease of red blood cells or hemoglobin | Boolean (0=No, 1=Yes) |
creatinine_phosphokinase | Level of the CPK enzyme in the blood | mcg/L |
diabetes | Whether the patient has diabetes or not | Boolean (0=No, 1=Yes) |
ejection_fraction | Percentage of blood leaving the heart at each contraction | Percentage |
high_blood_pressure | Whether the patient has hypertension or not | Boolean (0=No, 1=Yes) |
platelets | Platelets in the blood | kiloplatelets/mL |
serum_creatinine | Level of creatinine in the blood | mg/dL |
serum_sodium | Level of sodium in the blood | mEq/L |
sex | Female (F) or Male (M) | Binary (0=F, 1=M) |
smoking | Whether the patient smokes or not | Boolean (0=No, 1=Yes) |
time | Follow-up period | Days |
DEATH_EVENT | Whether the patient died during the follow-up period | Boolean (0=No, 1=Yes) |
The task was to classify patients based on their odds of survival; the prediction is based on the features included in the table above.
I uploaded the data to Azure ML Studio; it is also available in my GitHub repository, and I provided the link in the notebook.
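A minimal sketch of how a CSV hosted in a public repository can be loaded and registered as a TabularDataset (the URL here is a placeholder, not the exact one from my notebook):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

# Placeholder URL: in the notebook the CSV is loaded from my GitHub repository
data_url = "https://raw.githubusercontent.com/<user>/<repo>/master/heart_failure_clinical_records_dataset.csv"

# Read the CSV into a TabularDataset and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws, name="heart-failure-records")
```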
Below you can see an overview of the AutoML settings and configuration I used for the AutoML run:
"n_cross_validations": 2
This parameter sets how many cross-validations to perform, using the same number of folds (subsets). As a single cross-validation split could result in overfitting, I chose 2 folds; the reported metrics are therefore the average of the 2 validation metrics.
"primary_metric": 'accuracy'
I chose accuracy as the primary metric as it is the default metric used for classification tasks.
"enable_early_stopping": True
This enables early termination if the score does not improve in the short term. In this experiment it could also have been omitted, because experiment_timeout_minutes is already defined below.
"max_concurrent_iterations": 4
It represents the maximum number of iterations that would be executed in parallel.
"experiment_timeout_minutes": 20
This is an exit criterion used to define how long, in minutes, the experiment should continue to run. To help avoid experiment timeout failures, I used a value of 20 minutes.
"verbosity": logging.INFO
The verbosity level for writing to the log file.
compute_target = compute_target
The Azure Machine Learning compute target to run the Automated Machine Learning experiment on.
task = 'classification'
This defines the experiment type which in this case is classification. Other options are regression and forecasting.
training_data = dataset
The training data to be used within the experiment. It should contain both training features and a label column - see next parameter.
label_column_name = 'DEATH_EVENT'
The name of the label column i.e. the target column based on which the prediction is done.
path = project_folder
The full path to the Azure Machine Learning project folder.
featurization = 'auto'
This parameter defines whether the featurization step should be done automatically, as in this case ('auto'), or not ('off').
debug_log = 'automl_errors.log'
The log file to write debug information to.
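Putting the settings above together, the configuration looked roughly like this (a sketch assembled from the parameters listed; the variables compute_target, dataset, and project_folder are defined earlier in the notebook):

```python
import logging
from azureml.train.automl import AutoMLConfig

# Settings described above, passed as keyword arguments
automl_settings = {
    "n_cross_validations": 2,
    "primary_metric": "accuracy",
    "enable_early_stopping": True,
    "max_concurrent_iterations": 4,
    "experiment_timeout_minutes": 20,
    "verbosity": logging.INFO,
}

automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=dataset,
    label_column_name="DEATH_EVENT",
    path=project_folder,
    featurization="auto",
    debug_log="automl_errors.log",
    **automl_settings,
)
```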
- Model Run Widget
- Metrics
- Best Performance Model
Parameter sampler
I specified the parameter sampler as such:
```python
ps = RandomParameterSampling(
    {
        '--C': choice(0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000),
        '--max_iter': choice(50, 100, 200, 300)
    }
)
```
I chose discrete values with choice for both parameters, C and max_iter.
C is the inverse of the regularization strength, while max_iter is the maximum number of iterations for the solver to converge.
RandomParameterSampling is one of the samplers available, and I chose it because it is the fastest and supports early termination of low-performance runs. If budget were not an issue, we could use GridParameterSampling to exhaustively search over the search space, or BayesianParameterSampling to explore the hyperparameter space.
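To illustrate what random sampling over these discrete choices does, here is a small stdlib-only sketch (this is not the Azure ML API itself; the function name is mine):

```python
import random

# The same discrete search space as the HyperDrive sampler above
search_space = {
    "--C": [0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000],
    "--max_iter": [50, 100, 200, 300],
}

def sample_configuration(space, rng=random):
    """Draw one hyperparameter configuration uniformly at random,
    as RandomParameterSampling does for `choice` parameters."""
    return {name: rng.choice(values) for name, values in space.items()}

# Each HyperDrive child run would receive one such configuration
configs = [sample_configuration(search_space) for _ in range(5)]
```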
Early stopping policy
An early stopping policy is used to automatically terminate poorly performing runs thus improving computational efficiency. I chose the BanditPolicy which I specified as follows:
```python
policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)
```
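With a slack factor, the Bandit policy terminates any run whose primary metric falls below best_metric / (1 + slack_factor) at each evaluation interval. A minimal stdlib sketch of that cutoff check (the function name is mine, not part of the SDK):

```python
def should_terminate(run_metric: float, best_metric: float, slack_factor: float = 0.1) -> bool:
    """Bandit-policy check: terminate when the run's metric (higher is better)
    falls below best_metric / (1 + slack_factor)."""
    return run_metric < best_metric / (1 + slack_factor)

# With slack_factor=0.1 and a best accuracy of 0.88, the cutoff is 0.8
print(should_terminate(0.79, 0.88))  # True: below the cutoff, run is stopped
print(should_terminate(0.81, 0.88))  # False: within the slack, run continues
```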
- Two hyperparameters tuned in this model
- Run Widget
- Model with different hyperparameter tuning and metrics
- Register Model with RunID
The deployment is done following the steps below:
- Preparation of an inference configuration
- Preparation of an entry script
- Choosing a compute target
- Deployment of the model
- Testing the resulting web service
The inference configuration defines the environment used to run the deployed model. It includes two entities, the entry script and the environment definition, which are used to run the model when it is deployed.
The entry script is the `scoring_file_v_1_0_0.py` file. It loads the model when the deployed service starts, and it is also responsible for receiving data, passing it to the model, and returning the model's response.
As compute target, I chose the Azure Container Instances (ACI) service, which is used for low-scale CPU-based workloads that require less than 48 GB of RAM.
The ACI Webservice Class represents a machine learning model deployed as a web service endpoint on Azure Container Instances. The deployed service is created from the model, script, and associated files, as I explain above. The resulting web service is a load-balanced, HTTP endpoint with a REST API. We can send data to this API and receive the prediction returned by the model.
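A sketch of the deployment step, assuming the scoring script and environment file listed earlier (the service name and resource sizes are illustrative; `ws` and `model` come from earlier notebook cells):

```python
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Environment and entry script downloaded from Azure Machine Learning Studio
env = Environment.from_conda_specification(name="deploy-env", file_path="env.yml")
inference_config = InferenceConfig(entry_script="scoring_file_v_1_0_0.py", environment=env)

# Low-scale, CPU-based ACI deployment configuration
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(
    workspace=ws,
    name="heart-failure-service",
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
)
service.wait_for_deployment(show_output=True)
```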
- Service State of Deployed Model
- Testing the resulting web service
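Testing the service amounts to sending a JSON payload to the scoring URI; a stdlib-only sketch of such a request (the helper names are mine, and the payload shape matches what the entry script expects):

```python
import json
import urllib.request

def build_payload(rows):
    """Wrap feature rows in the {"data": [...]} shape the entry script expects."""
    return json.dumps({"data": rows}).encode("utf-8")

def score(scoring_uri, rows, key=None):
    """POST feature rows to the deployed ACI endpoint and return the parsed response."""
    headers = {"Content-Type": "application/json"}
    if key:  # only needed if authentication is enabled on the service
        headers["Authorization"] = f"Bearer {key}"
    req = urllib.request.Request(scoring_uri, data=build_payload(rows), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```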
The screen recording can be found here; it demonstrates the project, including:
- A working model
- Demo of the deployed model
- Demo of a sample request sent to the endpoint and its response