Skip to content
This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

Allow specifying Cluster Domain for clientURLs #2082

Merged
merged 1 commit into from
May 23, 2019
Merged

Allow specifying Cluster Domain for clientURLs #2082

merged 1 commit into from
May 23, 2019

Conversation

mikkeloscar
Copy link
Contributor

This allows setting the cluster domain as a flag on the operator. The
cluster domain is used as a suffix in the Client URLs for the etcd
members.

The ability to set a custom cluster domain is desirable when running in
clusters with a custom DNS configuration.

In our case we need this because we have decided to default the DNS config to use ndots:2 instead of the default ndots:5 setting which can result in up to 10 additional DNS queries for every single name lookup, which causes an unnecessary overhead for the DNS infrastructure and for applications.

Ref: kubernetes/dns#159 (comment)

An alternative way to fix this would be to allow specifying the DNS config of the pod spec: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-config

@etcd-bot
Copy link
Collaborator

Can one of the admins verify this patch?

2 similar comments
@etcd-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@etcd-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@hasbro17
Copy link
Contributor

hasbro17 commented May 6, 2019

@etcd-bot ok to test

@mikkeloscar
Copy link
Contributor Author

Any update on this?

@hasbro17
Copy link
Contributor

hasbro17 commented May 9, 2019

@mikkeloscar Sorry for the delay on this. Appending the ClusterDomain seems okay to me, however I'm wondering how this is fixed via the DNS config?

Also I'm considering if we should pass in the ClusterDomain suffix as a flag to the operator, vs specifying it as a field in the PodPolicy of the EtcdCluster CR i.e spec.pod.clusterDomainSuffix?

Currently the restore-operator also creates a new seed member for an existing EtcdCluster. And with this change it won't be setting the member's ClusterDomain suffix:
https://github.com/coreos/etcd-operator/blob/master/pkg/controller/restore-operator/sync.go#L214-L223

So instead of passing in another flag to the restore-operator we can set it once in the EtcdCluster PodPolicy and use it across the two operators.
WDYT?

@mikkeloscar
Copy link
Contributor Author

@hasbro17 thanks for the feedback

@mikkeloscar Sorry for the delay on this. Appending the ClusterDomain seems okay to me, however I'm wondering how this is fixed via the DNS config?

In our setup we default to ndots:2 but if the user specifies another DNS config on the pod then we respect that, so you would be able to "opt-out" by specifying:

dnsConfig:
    options:
      - name: ndots
        value: "5"

Obviously you wouldn't benefit from reducing the number of useless DNS queries, which is why I think setting the suffix is better.

So instead of passing in another flag to the restore-operator we can set it once in the EtcdCluster PodPolicy and use it across the two operators.
WDYT?

My motivation for setting this as a flag is that you would usually want to have the same clusterDomain for all etcd clusters within a single Kubernetes cluster, so it's simpler to set it centrally rather than configure it for each etcd cluster.

If you prefer the podPolicy solution then I could add it there instead. Otherwise I would add the flag to the restore-operator as well.

@hasbro17
Copy link
Contributor

hasbro17 commented May 9, 2019

@mikkeloscar Thanks for the explanation.

I would still lean towards setting the ClusterDomain in the EtcdCluster's PodPolicy just so the restore operator can reuse the existing EtcdCluster's setting.
Plus making it part of the API would more clearly document this setting for each cluster.

PS can you also please add a line to the CHANGELOG to mention the new API.

@mikkeloscar
Copy link
Contributor Author

@hasbro17 ok, I changed it now to use a ClusterDomain field on the PodPolicy.

Namespace: c.cluster.Namespace,
SecurePeer: c.isSecurePeer(),
SecureClient: c.isSecureClient(),
ClusterDomain: c.cluster.Spec.Pod.ClusterDomain,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if c.cluster.Spec.Pod is nil before setting this field. Otherwise we get a segfault:

time="2019-05-09T20:39:40Z" level=info msg="etcd-operator Version: 0.9.4+git"
time="2019-05-09T20:39:40Z" level=info msg="Git SHA: 8d9b1e5d"
time="2019-05-09T20:39:40Z" level=info msg="Go Version: go1.10"
time="2019-05-09T20:39:40Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-05-09T20:39:40Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"e2e-etcd-operator-e2e-pr-3371\", Name:\"etcd-operator\", UID:\"91df3254-729a-11e9-8ce3-42010a800271\", APIVersion:\"v1\", ResourceVersion:\"111408674\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator became leader"
time="2019-05-09T20:39:45Z" level=info msg="creating cluster with Spec:" cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
time="2019-05-09T20:39:45Z" level=info msg="{" cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
time="2019-05-09T20:39:45Z" level=info msg="    \"size\": 3," cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
time="2019-05-09T20:39:45Z" level=info msg="    \"repository\": \"quay.io/coreos/etcd\"," cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
time="2019-05-09T20:39:45Z" level=info msg="    \"version\": \"3.2.13\"" cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
time="2019-05-09T20:39:45Z" level=info msg="}" cluster-name=test-etcd-lntbc cluster-namespace=e2e-etcd-operator-e2e-pr-3371 pkg=cluster
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x98 pc=0x10c9589]

goroutine 73 [running]:
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).startSeedMember(0xc4202345a0, 0xc42048fe00, 0x30)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:318 +0x149
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).bootstrap(0xc4202345a0, 0x0, 0x3)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:344 +0x2b
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).prepareSeedMember(0xc4202345a0, 0x0, 0x0)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:167 +0x5a
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).create(0xc4202345a0, 0xc420472f60, 0x10d95a7)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:161 +0x111
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).setup(0xc4202345a0, 0xc4202bfa70, 0xc420472f80)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:148 +0x83
github.com/coreos/etcd-operator/pkg/cluster.New.func1(0xc4202345a0)
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:105 +0x35
created by github.com/coreos/etcd-operator/pkg/cluster.New
	/go/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:104 +0x33b

@mikkeloscar
Copy link
Contributor Author

@hasbro17 added the nil checks!

@mikkeloscar
Copy link
Contributor Author

Can you help me understand why jenkins-ci is failing?

I can build it locally:

./hack/build/build
building operator...
building backup-operator...
building restore-operator...

@hasbro17
Copy link
Contributor

@etcd-bot retest this please

@mikkeloscar
Copy link
Contributor Author

@hasbro17 thanks!

Copy link
Contributor

@hasbro17 hasbro17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Waiting for @hexfusion to take a final look.

@mikkeloscar
Copy link
Contributor Author

Friendly ping! :) If you need anything from me let me know!

@hasbro17
Copy link
Contributor

@mikkeloscar Just waiting on @hexfusion but I think we'll be able to merge this later today.
Apologies for the delay but can you please rebase this again?

@mikkeloscar
Copy link
Contributor Author

Apologies for the delay but can you please rebase this again?

Done!

Copy link
Member

@hexfusion hexfusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, thanks for the contribution.

@hexfusion
Copy link
Member

@mikkeloscar one last thing can we make this a single commit something like below.

*: Allow specifying Cluster Domain for clientURLs

After this is done we can merge right away.

# This is the 1st commit message:

Allow specifying Cluster Domain for clientURLs

This allows setting the ClusterDomain config in the PodPolicy. The
cluster domain is used as a suffix in the Client URLs for the etcd
members.

The ability to set a custom cluster domain is desirable when running in
clusters with a custom DNS configuration.

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>
@mikkeloscar
Copy link
Contributor Author

There you go!

@hasbro17 hasbro17 merged commit 8347d27 into coreos:master May 23, 2019
@mikkeloscar mikkeloscar deleted the custom-cluster-domain branch May 27, 2019 09:39
@aermakov-zalando
Copy link

@hexfusion
Is there an ETA on when this would make it into a release?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants