-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: K8s allow proxying through Gateway #9469
Conversation
✅ Deploy Preview for determined-ui canceled.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9469 +/- ##
==========================================
+ Coverage 49.31% 49.33% +0.01%
==========================================
Files 1241 1243 +2
Lines 161338 161827 +489
Branches 2868 2868
==========================================
+ Hits 79566 79831 +265
- Misses 81600 81824 +224
Partials 172 172
Flags with carried forward coverage won't be shown. Click here to find out more.
|
b9c204f
to
f43e47d
Compare
provisioner provided by Project Contour. | ||
|
||
On a local dev machine, you can use Minikube and Contour as the controller. We provide a script to | ||
simplify the process. This can be found in `tools/k8s/launch-minikube-with-gateway.sh`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to list out the steps for deploying the gateway and explaining each step so customers will have an easier time understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifically about this script or in general for deploying a gateway?
There are some shared steps but the rest are controller dependent. We don't add a lot of custom steps on top, do we?
https://projectcontour.io/getting-started/ gives a good description for Contour would we do something differently
const defaultResourcePoolName = "default" | ||
const ( | ||
defaultResourcePoolName = "default" | ||
validGWPortRangeStart = 1025 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasonable?
} | ||
|
||
for i, proxyPort := range j.req.ProxyPorts { | ||
sharedName := fmt.Sprintf("%s-%d", j.jobName, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what'd be a reasonable way to de-indent this but it's going too far right makes it hard to read on gh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- a few clarifying comments, but code looks good!
}{ | ||
{"valid-label", false}, | ||
{"Valid-Label_123", false}, | ||
{"a", false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we add a case with special characters like "#$%"?
https://projectcontour.io/docs/1.29/guides/gateway-api/ | ||
|
||
*************** | ||
Envoy Gateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a minimum version we recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for k8s? or this particula gateway implementaiton?
for k8s we do mention 1.27 in the other page.
// I wanted to do this all in patches, but Gateways don't yet support strategic merge patch. | ||
// Its pretty easy to use json patch to add the port, but removing the port is a lot harder | ||
// as I don't think you can remove by value. Instead just serialize reading then submitting | ||
// updates on the gateway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add a TODO for this? in the rm backlog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this comment since it is less relevant that we switched the design to use the gateway to decide which ports are free
docs/release-notes/gateway.rst
Outdated
- Kubernetes: Introduced the Internal Task Gateway feature which allows Determined tasks running on | ||
remote K8s clusters to be exposed to the Determined master and proxies. This feature supports | ||
multi-resource manager setups by configuring a Gateway controller in the external K8s cluster. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:orphan:
New Features
- Kubernetes: The Internal Task Gateway feature enables Determined tasks running on remote Kubernetes clusters to be exposed to the Determined master and proxies. This feature facilitates multi-resource manager setups by configuring a Gateway controller in the external Kubernetes cluster.
docs/release-notes/gateway.rst
Outdated
Ensure that you have the necessary security measures in place to limit access to the exposed | ||
tasks and secure communication between the external cluster and the main one. Recommended | ||
measures include setting up a firewall, using a VPN, IP white-listing, K8s Network Policies, | ||
or other security measures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. important::
Enabling this feature exposes Determined tasks to the outside world. It is crucial to implement appropriate security measures to restrict access to exposed tasks and secure communication between the external cluster and the main cluster. Recommended measures include:
- Setting up a firewall
- Using a VPN
- Implementing IP whitelisting
- Configuring Kubernetes Network Policies
- Employing other security measures as needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a similar restructured text format, visit this release note: https://docs.determined.ai/latest/release-notes.html#version-0-30-0
******** | ||
|
||
https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#gs-gateway-api Based | ||
on Envoy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any https://
URL, use this format
Link text <https://url>
_
e.g.,
Cilium Gateway API Support <https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#gs-gateway-api>
_
If you'd like to enable north-south access to Determined proxied tasks in external-to-master | ||
clusters you'd need to set up a gateway as described in the docs :doc:`Internal Task Gateway | ||
<internal-task-gateway>` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. note::
To enable north-south access to Determined proxied tasks in external-to-master
clusters, set up a gateway as described in the docs :doc:Internal Task Gateway <internal-task-gateway>
firewall, using a VPN, IP white-listing, K8s Network Policies, native cloud solutions, or other | ||
security measures. Some Determined tasks including JupyterLab Notebooks and shells already have | ||
secure transport built-in. Tensorboards do not use TLS currently so a tunneling solution should | ||
be considered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling this feature exposes Determined tasks to the outside world. It is crucial to implement appropriate security measures to restrict access to exposed tasks and secure communication between the external cluster and the main cluster. Recommended measures include:
- Setting up a firewall
- Using a VPN
- Implementing IP whitelisting
- Configuring Kubernetes Network Policies
- Employing other security measures as needed
Some tasks including JupyterLab notebooks and shells already have secure transport built-in.
Since TensorBoards do not use TLS, you should consider a tunneling solution.
bump up slots per trial to 2 on a new set of tests
also rename internal exposeProxyConfig var
spell out kubernetes apply review comments minus https replace with annotated titles rename titles to just "Reference" update titles. add a bit about the 3 categories
92f55e7
to
d8403ac
Compare
.circleci/real_config.yml
Outdated
@@ -3304,10 +3304,12 @@ jobs: | |||
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube | |||
- run: | |||
name: Start defaultrm minikube | |||
command: minikube start --profile defaultrm | |||
command: | | |||
K8S_VERSION=1.29.5 source tools/k8s/launch-minikube-with-gateway.sh defaultrm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can stamp this version at the top of file in case we need test other version later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed it up to the job level. I'm thinking we can push it higher up if/when we use it elsewhere?
Also I wasn't sure how those top-level parameters exactly work
.circleci/real_config.yml
Outdated
- run: | ||
name: Start additionalrm minikube | ||
command: minikube start --profile additionalrm | ||
command: minikube start --profile additionalrm --kubernetes-version 1.29.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here so it's easy to update in one place
Ticket
https://hpe-aiatscale.atlassian.net/browse/RM-137
Description
Add support for reaching K8s tasks in additional k8s clusters.
Test Plan
Checklist
docs/release-notes/
.See Release Note for details.