This walkthrough covers Azure Machine Learning workspace security features: a workspace deployed to a managed VNET, private datastores reached via private endpoints, and an optional workspace private endpoint.
Copy the template `.auto.tfvars` configuration file:
Important
A public workspace has some limitations when connecting to private resources; extra configuration is needed when using private datastores.
cp config/template.tfvars .auto.tfvars
Set the `allowed_ip_address` to allow connectivity to Azure.
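For example, the entry in `.auto.tfvars` might look like the following (this assumes the template exposes the variable under this exact name; the IP shown is a placeholder — use your own public IP, and check whether the template expects CIDR notation):

```hcl
# Placeholder - replace with your own public IP address
allowed_ip_address = "203.0.113.10"
```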
(Optional) If using a public compute, generate an SSH key pair to be used for connection:
mkdir keys
ssh-keygen -f keys/ssh_key
Apply the resources:
terraform init
terraform apply -auto-approve
Once all resources are created, proceed to step 2.
For a managed VNET setup, there are three isolation modes:
| Isolation Mode | Terraform value |
|---|---|
| Disabled | `Disabled` |
| Allow Internet Outbound | `AllowInternetOutbound` |
| Allow Only Approved Outbound | `AllowOnlyApprovedOutbound` |
Important
Using `AllowOnlyApprovedOutbound` will create an Azure Firewall with the Standard SKU, which adds significant costs.
The workspace will be created with `AllowInternetOutbound`. Configure outbound access in the managed VNET using your preferred interface (add the data lake and the SQL database), which enables secure outbound access via private endpoints.
Important
A Container Registry with the Premium SKU is required for private endpoints.
Note
Due to limitations of the Terraform provider, compute operations are performed with the Azure CLI.
Tip
By default, the managed VNET is created along with the compute. After creation, the private endpoints should be active or available for approval.
az ml workspace provision-network -g rg-litware -n <my_workspace_name> --include-spark
Create the compute instance using CLI v2 notation:
Warning
Current YAML notation does not allow configuring public access, with the exception of SSH. Make sure to use `--enable-node-public-ip false` for increased security, as the default is `true`.
az ml compute create \
--file compute.yml \
--resource-group my-resource-group \
--workspace-name my-workspace \
--enable-node-public-ip false
To complete the process via Terraform, a private endpoint must be manually approved when the compute is created. I assume this endpoint is required to enable the instances to communicate with the workspace.
Important
The execution will halt until the manual approval is done, so watch for the approval request.
Once all resources are created, the data stores must be registered in the outbound rules section in order to use them securely via private connections.
To use the CLI, install the ML extension:
az extension add -n ml
And then create the rules:
az ml workspace outbound-rule set -g rg-litware \
--rule datalake \
--type private_endpoint \
--workspace-name mlw-litware<abc123> \
--service-resource-id <ID> \
--spark-enabled true \
--subresource-target dfs
Or, it might be easier via the Portal integration:
It might be required to perform manual private endpoint approvals, such as in this example for the SQL Server:
It's time to connect the data sources to the AML workspace. These connections should happen via private endpoints. Datastore connection is documented, such as in this page or this article.
👉 Create a secret for the pre-created Application Registration in Entra ID that can be used to set up connections to the data lake. Optionally, it can also be used for the SQL Server, but that would require an external authentication setup which is not covered here - SQL authentication should be enough for this demo.
👉 Register the datastore in order to securely and productively connect to data resources.
Once the datastores are registered, they become usable via notebooks. In the next example, a file is downloaded from Blob storage.
import os
from azureml.core import Workspace, Datastore

# Connect to the workspace (reads the local config.json)
ws = Workspace.from_config()

# Get the registered blob datastore
datastore = Datastore.get(ws, datastore_name='blobs')

# Download the file to a local folder
datastore.download(target_path="./output", prefix="contacts.csv", overwrite=False)

print(os.listdir('./output'))

with open("./output/contacts.csv", "r") as f:
    print(f.read())
Alternatively, prefer using SDK v2 for workspace and data operations.
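In SDK v2, data on a registered datastore is referenced by an `azureml://` URI rather than a `Datastore` object. A minimal sketch of building the documented short-form URI, reusing the `blobs` datastore name from the example above (the helper function is illustrative, not part of the SDK):

```python
# Sketch: build the short-form datastore URI used by SDK v2 data inputs.
# The URI scheme azureml://datastores/<name>/paths/<path> is documented
# Azure ML behavior; datastore_uri() itself is a hypothetical helper.

def datastore_uri(datastore_name: str, path: str) -> str:
    """Return the azureml:// short-form URI for a path on a registered datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{path.lstrip('/')}"

uri = datastore_uri("blobs", "contacts.csv")
print(uri)  # azureml://datastores/blobs/paths/contacts.csv
```

With the `azure-ai-ml` package installed, such a URI can then be passed as a data input, e.g. `Input(type="uri_file", path=uri)`.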
Create the login from an external provider:
USE master
CREATE LOGIN [datastores-litware123] FROM EXTERNAL PROVIDER
GO
Check the server login:
SELECT name, type_desc, type, is_disabled
FROM sys.server_principals
WHERE type_desc like 'external%'
Create the database user associated with the external login:
CREATE USER [datastores-litware123] FROM LOGIN [datastores-litware123]
GO
Check the database user:
SELECT name, type_desc, type
FROM sys.database_principals
WHERE type_desc like 'external%'
Add the necessary permissions to the user:
ALTER ROLE db_datareader ADD MEMBER [datastores-litware123];
ALTER ROLE db_datawriter ADD MEMBER [datastores-litware123];
ALTER ROLE db_ddladmin ADD MEMBER [datastores-litware123];
GO
The most secure architecture for AML would be a private AML workspace, meaning that the workspace would be accessible only via a private endpoint, and the dependent resources and data stores also accessible via private connections.
To enable private access for this project, change these variables as follows:
mlw_public_network_access_enabled = false
mlw_create_private_endpoint_flag = true
vm_create_flag = true
Then apply the configuration. Once applied, access to the AML workspace should be possible only through the VM.
Service endpoints should already be created for the datastores, so the next step is to disable public access to the storage accounts and databases, making the architecture fully private.
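If the datastores are managed in the same Terraform configuration, disabling public access is a one-attribute change per resource. The resource names below are assumptions for illustration; the `public_network_access_enabled` attribute is the azurerm provider's switch for this:

```hcl
resource "azurerm_storage_account" "datalake" {
  # ...existing configuration...
  public_network_access_enabled = false
}

resource "azurerm_mssql_server" "sql" {
  # ...existing configuration...
  public_network_access_enabled = false
}
```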
AML component resources should also be made private where possible. For example, in this use case the workspace storage needs to be visible to users in the private network, but not from the internet.
As per Microsoft documentation, a Firewall with the Standard SKU will be created and the respective cost increase will apply:
FQDN outbound rules are implemented using Azure Firewall. If you use outbound FQDN rules, charges for Azure Firewall are added to your billing. The Azure Firewall (standard SKU) is provisioned by Azure Machine Learning.
To avoid this, one option is using a customer-managed VNET, which is also the recommended option.
There are some limitations when using public access that require special configuration. I've opened this thread asking Microsoft to add further details on which combinations are actually valid or invalid, and what additional configuration is required.
I've also run into this issue where creating a `Dataset` does not work.
It is important to note that users of the Azure ML Studio must have line of sight to the workspace storage. Here is an example with the console traces demonstrating that:
Delete the resources and avoid unplanned costs:
terraform destroy -auto-approve