Petopp/Udacity_Project_2
Udacity Machine Learning Project 2 for Machine Learning Engineer with Microsoft Azure.

In this project we focused more on the endpoints and the SDK in Azure Machine Learning. We used Azure Machine Learning Studio to run an AutoML algorithm on the Bank Marketing dataset already used in Project 1. We then made the best model ready for production and deployed it using ACI (Azure Container Instance). The deployed model is consumed via its REST API (HTTP requests). A pipeline was also built and consumed.

Summary of the procedure

  1. Downloaded the CSV file of the Bank Marketing data. (You can find the file in this project folder.)

  2. Uploaded this file to Azure Machine Learning as a dataset.

     - In the AutoML functionality, we defined the problem we want to tackle as a classification task. We also defined the variable y (which refers to the decision whether a customer is eligible or not) as our binary target variable.

  3. Then a compute cluster ("Standard_DS12_v2") was created.

  4. Through the Azure AutoML functionality, we found the best model and deployed it with ACI (the "VotingEnsemble" algorithm performed best in this test).

  5. Logging and Application Insights were enabled to provide information about the requests to and the performance of the deployed model.

  6. The REST endpoint was tested for connectivity (via Swagger).

  7. As a final step, we used the Python SDK to generate a pipeline and publish it.

Key Steps

1. Loading the CSV to the Dataset

The download path was provided at the beginning of the project. The file was downloaded and imported into a dataset. The steps are identical to those carried out in Project 1.
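The same import can be scripted with the Azure ML Python SDK. A minimal sketch, assuming the azureml-sdk package is installed and a workspace config.json exists; the function and dataset names are illustrative, and the CSV URL is passed in rather than hard-coded:

```python
# Sketch: registering the Bank Marketing CSV as an Azure ML TabularDataset.
# The azureml imports are done lazily inside the function so this sketch
# can be read and loaded without an Azure workspace.

def register_bankmarketing_dataset(csv_url, name="bankmarketing-dataset"):
    """Load a CSV from a URL and register it as a versioned TabularDataset."""
    from azureml.core import Workspace, Dataset

    ws = Workspace.from_config()                            # reads config.json
    ds = Dataset.Tabular.from_delimited_files(path=csv_url)  # parse the CSV
    return ds.register(workspace=ws, name=name, create_new_version=True)
```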

Here you can see the result of loading the file into the dataset:

image

and here you can see the confirmation from Azure:

image

2. AutoML - setup:

In the next steps we define the compute cluster and start the experiment.

Starting and configuring the experiment:
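The same experiment setup can be expressed in code with the SDK's AutoMLConfig class. A minimal sketch, assuming the azureml-train-automl package; the timeout and concurrency values are illustrative assumptions, not the exact studio settings:

```python
# Sketch of the AutoML configuration used here, expressed via the SDK.
# Lazy import so the sketch loads without the azureml packages installed.

def make_automl_config(training_data, compute_target):
    """Build an AutoML classification config for the Bank Marketing dataset."""
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        task="classification",          # binary classification problem
        primary_metric="accuracy",      # metric AutoML optimizes for
        training_data=training_data,    # the registered TabularDataset
        label_column_name="y",          # target column, as defined in the studio
        compute_target=compute_target,  # the "Standard_DS12_v2" cluster
        experiment_timeout_minutes=30,  # assumption, not the studio value
        max_concurrent_iterations=4,    # assumption, not the studio value
    )
```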

image

image

The AutoML experiment is now completed:

image

image

The best model is VotingEnsemble, with an accuracy of 0.92018 in this experiment.

image

Here you can see the calibration curve of this model. The calibration curve compares the confidence of a model in its predictions with the proportion of positive samples at the respective confidence levels. Here you can find more information about this.

image

As well as an overview of the results of the other algorithms:

image

3. Deployment of the best model

Here you can see that the best model from AutoML has been selected and authentication has been activated. The compute type was also set to ACI as required.

image

Here is the status of the deployed model:

image
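In code, the same deployment can be sketched with the SDK's Model.deploy and an ACI configuration. The service name and resource sizes below are assumptions, not the project's actual values:

```python
# Sketch: deploying a registered model to Azure Container Instance (ACI)
# with key authentication enabled. Lazy imports keep the sketch loadable
# without azureml-sdk installed.

def deploy_best_model(ws, model, inference_config, service_name="bankmarketing-aci"):
    """Deploy a registered model to ACI with authentication enabled."""
    from azureml.core.model import Model
    from azureml.core.webservice import AciWebservice

    aci_config = AciWebservice.deploy_configuration(
        cpu_cores=1, memory_gb=1,  # assumed resource sizes
        auth_enabled=True,         # "Authentication", as enabled in the studio
    )
    service = Model.deploy(ws, service_name, [model], inference_config, aci_config)
    service.wait_for_deployment(show_output=True)
    return service
```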

4. Enable logging

As seen above, I chose the best model (VotingEnsemble) for deployment and enabled "Authentication" as well as the compute type Azure Container Instance (ACI). The code executed in logs.py enables "Application Insights", which was disabled before logs.py was executed.

Here is the result in the Python shell:

image

and in Azure

image
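The logs.py step boils down to one update call on the deployed webservice. A minimal sketch, assuming azureml-sdk and an illustrative service name:

```python
# Sketch of what logs.py does: switch on Application Insights for a
# deployed webservice and print its recent logs. Lazy import so the
# sketch loads without azureml-sdk installed.

def enable_app_insights(ws, service_name="bankmarketing-aci"):
    """Enable Application Insights on a webservice and print its logs."""
    from azureml.core.webservice import Webservice

    service = Webservice(workspace=ws, name=service_name)
    service.update(enable_app_insights=True)  # the switch logs.py flips
    print(service.get_logs())                 # recent container logs
    return service
```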

5. Swagger & endpoint consumption

We also tested the API with Swagger using sample data. Swagger is a very practical tool for easily testing REST APIs. Azure provides a swagger.json file that makes it easy to test the deployed model's API and to call the trained model from a program.

This is the web interface of Swagger, connected to the JSON file of the model:

image

and here is an example of communicating with the model via JSON:

image

We also tested the endpoint by running the endpoint.py Python script.

The result of the test can be seen in the output of the Python script below:

image

The response corresponds to what was specified as the target parameters. This experiment has thus been successfully tested and verified.
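The request sent by endpoint.py looks roughly like this. The scoring URI, the key, and the sample record are placeholders/assumptions; the column names follow the Bank Marketing dataset:

```python
import json

# Placeholders -- fill in the real scoring URI and primary key of the service.
scoring_uri = "http://<aci-service>.azurecontainer.io/score"
key = "<primary-key>"

# One hypothetical sample record in the format the scoring endpoint expects.
data = {
    "data": [
        {
            "age": 35, "job": "blue-collar", "marital": "married",
            "education": "university.degree", "default": "no",
            "housing": "yes", "loan": "no", "contact": "cellular",
            "month": "may", "day_of_week": "mon", "duration": 120,
            "campaign": 1, "pdays": 999, "previous": 0,
            "poutcome": "nonexistent", "emp.var.rate": 1.1,
            "cons.price.idx": 93.994, "cons.conf.idx": -36.4,
            "euribor3m": 4.857, "nr.employed": 5191.0,
        }
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",     # key auth, enabled at deployment
}
body = json.dumps(data)
# resp = requests.post(scoring_uri, data=body, headers=headers)
# print(resp.json())  # the model's "yes"/"no" prediction for the record
```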

6. Overview of the Pipeline

Pipelines - general view

image

Pipeline - Endpoints

image

Pipeline - REST endpoint

image

7. Running the SDK in Jupyter

After uploading the Jupyter notebook to Azure, you can find it in this file here:

image

Running the experiment via the SDK in a Jupyter Notebook:

image

image

image

Message that the experiment run via the SDK has finished:

image

Access to the REST endpoint from Jupyter:

image

Here you can see the URL of the REST API together with the published pipeline:

image

And a final test via Python:

image

with the best algorithm

image
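Publishing the pipeline and triggering it over its REST endpoint can be sketched as follows, assuming the requests package is available; the pipeline name and experiment name below are illustrative, not the project's actual values:

```python
# Sketch: publish a finished pipeline run and trigger the published
# pipeline once via its REST endpoint. requests is imported lazily so
# the sketch loads even without that package installed.

def publish_and_trigger(pipeline_run, auth_token, experiment_name="pipeline-rest-endpoint"):
    """Publish a pipeline run and start one run of it over REST."""
    import requests

    published = pipeline_run.publish_pipeline(
        name="Bankmarketing Train",           # illustrative pipeline name
        description="Published AutoML pipeline",
        version="1.0",
    )
    resp = requests.post(
        published.endpoint,                   # REST URL of the published pipeline
        headers={"Authorization": f"Bearer {auth_token}"},
        json={"ExperimentName": experiment_name},
    )
    return published, resp.json()["Id"]       # id of the triggered run
```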

Recording

YouTube

Suggestions for improvement for future experiments

  1. When repeating this experiment, I would use a longer computation time frame to achieve higher accuracy and give the AutoML algorithms more time to fine-tune.

  2. I would also enable the Deep Learning functionality to try NN-based algorithms (this requires GPU-capable compute resources). This could yield better results, provided that the amount of data can be increased as described in point 3.

  3. I would try to get a larger dataset, possibly from other regions/countries, since data from only one specific region could bias the algorithm if it were used elsewhere.
