add feature for azure terraform (#1496) (#1102)
* fix suggestion for azure terraform (#1496)

fix suggestion for azure terraform (#1496)

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update formatting

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update formatting

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Fix etcd join failure

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Remove public ip for worker node

Signed-off-by: cclhsu <clark.hsu@suse.com>

* add multiple zone README

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Apply suggestion

* Apply suggestion for azure_location

* Apply suggestion for dnsdomain

* Apply suggestion

* apply suggestion for variables.tf

* Apply suggestion for password description

* Update terraform.tfvars.example

* Update variables.tf

* fix example and fmt

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update README and add container-openrc.sh

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Fix typo

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Commit suggestion for README

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update availability zone example

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update suggestion from peer review

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update ci/infra/azure/terraform.tfvars.example

Co-authored-by: c3y1huang <chin-ya.huang@suse.com>

* Update ci/infra/azure/registration.auto.tfvars

Co-authored-by: c3y1huang <chin-ya.huang@suse.com>

* Update suggestion from peer review

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update suggestion from peer review

Signed-off-by: cclhsu <clark.hsu@suse.com>

* Update suggestion

Signed-off-by: cclhsu <clark.hsu@suse.com>

* remove test code

Signed-off-by: cclhsu <clark.hsu@suse.com>

* update NTP server

Signed-off-by: cclhsu <clark.hsu@suse.com>

Co-authored-by: c3y1huang <chin-ya.huang@suse.com>
Clark Hsu and c3y1huang committed Jun 18, 2020
1 parent 361b8bb commit ff7b032
Showing 24 changed files with 1,073 additions and 0 deletions.
159 changes: 159 additions & 0 deletions ci/infra/azure/README.md
@@ -0,0 +1,159 @@
# Introduction

This terraform project creates the infrastructure needed to run a
cluster on top of Azure.

Once the infrastructure is up and running, nothing special has to be done
to deploy SUSE CaaS Platform on top of it.

This document focuses on the key aspects of the infrastructure created
by terraform.

# Cluster layout

## Set up a service principal for terraform login credentials

Follow the guide [Creating a Service Principal in the Azure Portal](https://www.terraform.io/docs/providers/azurerm/guides/service_principal_client_secret.html#creating-a-service-principal-in-the-azure-portal), and set `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET`, `ARM_SUBSCRIPTION_ID`, and `ARM_TENANT_ID` in `container-openrc.sh`. Source `container-openrc.sh` before running the terraform scripts.
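
A minimal sketch of the workflow, using the placeholder values that ship in `container-openrc.sh` (the real IDs come from your service principal):

```
# in ci/infra/azure/container-openrc.sh, uncomment and fill in:
export ARM_CLIENT_ID=00000000-0000-0000-0000-000000000000
export ARM_CLIENT_SECRET=00000000-0000-0000-0000-000000000000
export ARM_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
export ARM_TENANT_ID=00000000-0000-0000-0000-000000000000

# then, in the shell you run terraform from:
source container-openrc.sh
```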

## Machines

As usual, the cluster is based on two types of nodes: master and worker nodes.

All the nodes are created using the SLES 15 SP1 container host image built
and maintained by the SUSE Public Cloud team.

Right now users **must** bring their own license into the public cloud to be
able to access SLE and SUSE CaaS Platform packages.

The machines are automatically registered at boot time against SUSE Customer
Center, RMT, or a SUSE Manager instance, depending on which of the following
variables has been set (see the example after the list):

* `caasp_registry_code`
* `rmt_server_name`
* `suma_server_name`
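
For example, to register against SUSE Customer Center you could set the registration code in `registration.auto.tfvars` (the value shown is a placeholder):

```
# SUSE CaaS Platform registration code (placeholder value)
caasp_registry_code = "CAASP-REGISTRATION-CODE"
```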

The SLES images [do not yet support cloud-init](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/using-cloud-init)
(it will probably be supported starting from SLE15 SP2). In the meantime,
terraform leverages the Azure Linux Extension capabilities provided by
the [Azure Linux Agent](https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/agent-linux).

### Using spot instances

It's possible to create [spot VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/spot-vms)
for both the master and worker nodes.

This can be done by setting these variables to `true` (they default to
`false`); see the sketch after the list:

* `master_use_spot_instance`
* `worker_use_spot_instance`
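
A minimal `terraform.tfvars` sketch enabling spot instances for both node types:

```
# run both masters and workers as Azure spot VMs
master_use_spot_instance = true
worker_use_spot_instance = true
```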

## Network layout

All of the infrastructure is created inside a user-specified Azure region.
The resources are currently all located inside the user-specified availability
zones. All the nodes are placed inside the same virtual network, within the same
subnet.

Worker nodes are never exposed to the public internet. By contrast,
each master node has a public IP address by default. This allows users to
connect to them via ssh from their computers.

It's also possible to disable this behaviour and make **all** the nodes private.
This can be done by setting the `create_bastionhost` variable to `true`.

When this variable is set, all the master nodes cease to have a public IP address.
An [Azure Bastion](https://docs.microsoft.com/en-us/azure/bastion/bastion-overview)
instance is created, which becomes the only way to ssh into the cluster.
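
A minimal `terraform.tfvars` sketch for a fully private cluster:

```
# remove public IPs from the masters and front the cluster with Azure Bastion
create_bastionhost = true
```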

Terraform also creates an internal DNS zone with the domain specified via the
`dnsdomain` variable. This allows all the nodes to reach each other using
their FQDNs.
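
A sketch of the corresponding `terraform.tfvars` entry, assuming a hypothetical internal domain:

```
# internal DNS domain used for the nodes' FQDNs (value is a placeholder)
dnsdomain = "mycluster.example.com"
```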

### Security groups

Terraform automatically creates security groups for the master and worker nodes
that allow connections only to the permitted services. The security
rules are a 1:1 mapping of what is described in the SUSE CaaS Platform
documentation.

## Load balancer

Terraform automatically creates a load balancer with a public IP that exposes
the following services running on the control plane nodes:

* kubernetes API server: port 6443
* dex: port 32000
* gangway: port 32001

This is exactly the same behaviour used by other deployment platforms.
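
As a quick smoke test, you could check that the API server answers through the load balancer. A sketch, where `<LB_PUBLIC_IP>` is a placeholder for the address terraform reports (the health endpoint is typically served to anonymous clients, but this may depend on the apiserver's settings):

```
# expect "ok" from the Kubernetes apiserver's health endpoint
curl -k https://<LB_PUBLIC_IP>:6443/healthz
```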

## Accessing the nodes

A default `sles` user is created on each node of the cluster. The user can
gain administrator privileges through the `sudo` utility.

By default, password-based authentication is disabled. It's possible to log
in using the ssh key specified via the `admin_ssh_key` variable.
It's also possible to enable password-based authentication by specifying a
value for the `admin_password` variable. Note well: Azure has security
checks in place to prevent the usage of weak passwords.
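
A sketch of the related `terraform.tfvars` entries, assuming `admin_ssh_key` takes the public key content (both values are placeholders):

```
# public ssh key for the default `sles` user
admin_ssh_key = "ssh-rsa AAAAB3NzaC1yc2E... user@example.com"

# optional: also enable password-based authentication
# admin_password = "<a-password-that-passes-Azure's-strength-checks>"
```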

When bastion host creation is disabled, accessing the master nodes of
the cluster is just a matter of doing an ssh against their public IP address.

Accessing a cluster through an Azure Bastion requires a different procedure.

### Using Azure Bastion

Azure Bastion supports only RSA keys in PEM format. These can be created by
doing:

```
ssh-keygen -f azure -t rsa -b 4096 -m pem
```

This will create a public and private key named `azure`. The **private** key has
to be provided later to Azure, hence it's strongly recommended to create
a dedicated key pair.

Once the whole infrastructure is created, you can connect to any node of the
cluster with the following steps:

1. Log into the Azure portal
2. Choose one of the nodes of the cluster
3. Click "Connect" and select "Bastion" as the connection option
4. Fill in all the required fields

Once this is done, a new browser tab will open with a shell session running
inside the desired node.

It's recommended to use Chrome or Chromium during this process.

You can ssh into the first bootstrapped master node and download the kubeconfig
file, which lets you operate the cluster without having to go through the bastion host.

Caveats of Azure Bastion:

* As of June 2020, the [Azure Bastion service](https://docs.microsoft.com/en-us/azure/bastion/bastion-overview#regions) is not available in all Azure regions.
* By design it's not possible to leverage the bastion host without using the
ssh session embedded into the browser. This makes it impossible to use tools like
`sftp` or `scp`.
* You have to rely on copy and paste to share data (like the `admin.conf` file
generated by skuba) between the remote nodes and your local system:
typically `cat`, `base64`, and a lot of copying and pasting (see the sketch after this list).
* `skuba` requires a private ssh key to connect to all the nodes of the cluster.
You have to upload the private key you specified at cluster creation,
or create a new one inside the first master node and copy it
around the cluster.
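
A sketch of the copy-and-paste workaround for retrieving `admin.conf`, assuming a GNU `base64`:

```
# inside the Bastion browser session, on the master node:
base64 -w0 admin.conf            # copy the single line it prints

# on your local machine, paste that line in place of <PASTED_TEXT>:
echo '<PASTED_TEXT>' | base64 -d > admin.conf
```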

## Virtual Network Peering Support

It is possible to join existing networks to the cluster. This can be set up by providing a list of network IDs in `peer_virutal_network_id`.
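
A sketch with a placeholder Azure resource ID, assuming the variable takes a list of virtual network IDs as the prose suggests:

```
# IDs of pre-existing virtual networks to peer with the cluster's network
peer_virutal_network_id = [
  "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Network/virtualNetworks/<VNET_NAME>",
]
```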

## Enable Multiple Zones

It is possible to enable multiple zones by setting `enable_zone` to `true`; master and worker nodes will then be distributed sequentially across the zones defined in `azure_availability_zones`. See the sketch below.

* As of June 2020, [Azure Availability Zones](https://docs.microsoft.com/en-us/azure/availability-zones/az-region) are not available in all Azure regions.
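
A sketch of the relevant `terraform.tfvars` entries, assuming zones are referenced by their Azure zone numbers:

```
# distribute masters and workers sequentially across these zones
enable_zone              = true
azure_availability_zones = ["1", "2", "3"]
```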
12 changes: 12 additions & 0 deletions ci/infra/azure/bastion-instance.tf
@@ -0,0 +1,12 @@
resource "azurerm_bastion_host" "bastionhost" {
count = var.create_bastionhost ? 1 : 0
name = "${var.stack_name}-bastion"
location = azurerm_resource_group.resource_group.location
resource_group_name = azurerm_resource_group.resource_group.name

ip_configuration {
name = "${var.stack_name}-configuration"
subnet_id = azurerm_subnet.bastionhost.0.id
public_ip_address_id = azurerm_public_ip.bastionhost.0.id
}
}
1 change: 1 addition & 0 deletions ci/infra/azure/cloud-init/commands.tpl
@@ -0,0 +1 @@
zypper -n install ${packages}
10 changes: 10 additions & 0 deletions ci/infra/azure/cloud-init/init.sh.tpl
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
set -e

${ntp_servers}
${name_servers}
${register_scc}
${register_rmt}
${register_suma}
${repositories}
${commands}
6 changes: 6 additions & 0 deletions ci/infra/azure/cloud-init/nameserver.tpl
@@ -0,0 +1,6 @@
# /etc/sysconfig/network/config
IFS=' ' read -r -a servers <<< "${name_servers}"
for server in $${servers[*]}
do
echo nameserver $${server} | sudo tee -a /etc/resolv.conf
done
6 changes: 6 additions & 0 deletions ci/infra/azure/cloud-init/ntp.tpl
@@ -0,0 +1,6 @@
sudo rm -f /etc/chrony.d/ntp_customized.conf
IFS=' ' read -r -a servers <<< "${ntp_servers}"
for server in $${servers[*]}
do
echo "server $${server} iburst" | sudo tee -a /etc/chrony.d/ntp_customized.conf
done
4 changes: 4 additions & 0 deletions ci/infra/azure/cloud-init/register-rmt.tpl
@@ -0,0 +1,4 @@
curl --tlsv1.2 --silent --insecure --connect-timeout 10 https://${rmt_server_name}/rmt.crt --output /etc/pki/trust/anchors/rmt-server.pem && /usr/sbin/update-ca-certificates &> /dev/null
SUSEConnect --url https://${rmt_server_name}
SUSEConnect -p sle-module-containers/15.1/x86_64
SUSEConnect -p caasp/4.0/x86_64
3 changes: 3 additions & 0 deletions ci/infra/azure/cloud-init/register-scc.tpl
@@ -0,0 +1,3 @@
SUSEConnect --url https://scc.suse.com -r ${caasp_registry_code}
SUSEConnect -p sle-module-containers/15.1/x86_64
SUSEConnect -p caasp/4.0/x86_64 -r ${caasp_registry_code}
3 changes: 3 additions & 0 deletions ci/infra/azure/cloud-init/register-suma.tpl
@@ -0,0 +1,3 @@
curl --tlsv1.2 --insecure --connect-timeout 10 https://${suma_server_name}/pub/bootstrap/bootstrap.sh --output /tmp/bootstrap.sh
chmod +x /tmp/bootstrap.sh
sh /tmp/bootstrap.sh
1 change: 1 addition & 0 deletions ci/infra/azure/cloud-init/repository.tpl
@@ -0,0 +1 @@
zypper ar -f ${repository_url} ${repository_name}
9 changes: 9 additions & 0 deletions ci/infra/azure/container-openrc.sh
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
# - [How to: Use the portal to create an Azure AD application and service principal that can access resources](https://docs.microsoft.com/en-US/azure/active-directory/develop/howto-create-service-principal-portal)
# - [Azure Provider: Authenticating using a Service Principal with a Client Secret](https://www.terraform.io/docs/providers/azurerm/guides/service_principal_client_secret.html)
# - [Creating a Service Principal in the Azure Portal](https://www.terraform.io/docs/providers/azurerm/guides/service_principal_client_secret.html#creating-a-service-principal-in-the-azure-portal)

# export ARM_CLIENT_ID=00000000-0000-0000-0000-000000000000
# export ARM_CLIENT_SECRET=00000000-0000-0000-0000-000000000000
# export ARM_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
# export ARM_TENANT_ID=00000000-0000-0000-0000-000000000000
78 changes: 78 additions & 0 deletions ci/infra/azure/init.tf
@@ -0,0 +1,78 @@
data "template_file" "register_rmt" {
template = file("${path.module}/cloud-init/register-rmt.tpl")
count = var.rmt_server_name == "" ? 0 : 1

vars = {
rmt_server_name = var.rmt_server_name
}
}

data "template_file" "register_scc" {
# register with SCC iff an RMT has not been provided
count = var.caasp_registry_code != "" && var.rmt_server_name == "" ? 1 : 0
template = file("${path.module}/cloud-init/register-scc.tpl")

vars = {
caasp_registry_code = var.caasp_registry_code
}
}

data "template_file" "register_suma" {
template = file("${path.module}/cloud-init/register-suma.tpl")
count = var.suma_server_name == "" ? 0 : 1

vars = {
suma_server_name = var.suma_server_name
}
}

data "template_file" "repositories" {
count = length(var.repositories)
template = file("${path.module}/cloud-init/repository.tpl")

vars = {
repository_url = element(values(var.repositories), count.index)
repository_name = element(keys(var.repositories), count.index)
}
}

data "template_file" "ntp_servers" {
count = length(var.ntp_servers) == 0 ? 0 : 1
template = file("${path.module}/cloud-init/ntp.tpl")

vars = {
ntp_servers = join(" ", var.ntp_servers)
}
}

data "template_file" "dns_nameservers" {
count = length(var.dns_nameservers) == 0 ? 0 : 1
template = file("${path.module}/cloud-init/nameserver.tpl")

vars = {
name_servers = join(" ", var.dns_nameservers)
}
}

data "template_file" "commands" {
count = length(var.packages) == 0 ? 0 : 1
template = file("${path.module}/cloud-init/commands.tpl")

vars = {
packages = join(", ", var.packages)
}
}

data "template_file" "cloud-init" {
template = file("${path.module}/cloud-init/init.sh.tpl")

vars = {
commands = join("\n", data.template_file.commands.*.rendered)
ntp_servers = join("\n", data.template_file.ntp_servers.*.rendered)
name_servers = join("\n", data.template_file.dns_nameservers.*.rendered)
repositories = length(var.repositories) == 0 ? "\n" : join("\n", data.template_file.repositories.*.rendered)
register_scc = var.caasp_registry_code != "" && var.rmt_server_name == "" ? join("\n", data.template_file.register_scc.*.rendered) : ""
register_rmt = var.rmt_server_name != "" ? join("\n", data.template_file.register_rmt.*.rendered) : ""
register_suma = var.suma_server_name != "" ? join("\n", data.template_file.register_suma.*.rendered) : ""
}
}
81 changes: 81 additions & 0 deletions ci/infra/azure/load-balancer.tf
@@ -0,0 +1,81 @@
resource "azurerm_lb" "lb" {
name = "${var.stack_name}-lb"
location = azurerm_resource_group.resource_group.location
resource_group_name = azurerm_resource_group.resource_group.name
sku = "standard"

frontend_ip_configuration {
name = "PublicIPAddress"
public_ip_address_id = azurerm_public_ip.lb.id
}
}

resource "azurerm_lb_backend_address_pool" "masters" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "master-nodes"
}

resource "azurerm_lb_probe" "kube_api" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-apisever-running-probe"
port = 6443
}

resource "azurerm_lb_rule" "kube_api" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-api-server"
protocol = "Tcp"
frontend_port = 6443
backend_port = 6443
frontend_ip_configuration_name = azurerm_lb.lb.frontend_ip_configuration[0].name
probe_id = azurerm_lb_probe.kube_api.id
backend_address_pool_id = azurerm_lb_backend_address_pool.masters.id
}

resource "azurerm_lb_probe" "kube_dex" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-dex-running-probe"
port = 32000
}

resource "azurerm_lb_rule" "kube_dex" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-dex"
protocol = "Tcp"
frontend_port = 32000
backend_port = 32000
frontend_ip_configuration_name = azurerm_lb.lb.frontend_ip_configuration[0].name
probe_id = azurerm_lb_probe.kube_dex.id
backend_address_pool_id = azurerm_lb_backend_address_pool.masters.id
}

resource "azurerm_lb_probe" "kube_gangway" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-gangway-running-probe"
port = 32001
}

resource "azurerm_lb_rule" "kube_gangway" {
resource_group_name = azurerm_resource_group.resource_group.name
loadbalancer_id = azurerm_lb.lb.id
name = "kube-gangway"
protocol = "Tcp"
frontend_port = 32001
backend_port = 32001
frontend_ip_configuration_name = azurerm_lb.lb.frontend_ip_configuration[0].name
probe_id = azurerm_lb_probe.kube_gangway.id
backend_address_pool_id = azurerm_lb_backend_address_pool.masters.id
}

resource "azurerm_network_interface_backend_address_pool_association" "kube_api" {
count = var.masters
backend_address_pool_id = azurerm_lb_backend_address_pool.masters.id
ip_configuration_name = "internal"
network_interface_id = element(azurerm_network_interface.master.*.id, count.index)
}
