fix: Genai helm service fix #8885

tayritenour · 2024-02-24T00:18:35Z

Description

GenAI needs to be configured with the Kubernetes service names of its hosts so that the chart works regardless of how the cluster network is set up. This lets us pass a specific host's service name directly to the backend server.

We also want to avoid assuming that the Kubernetes cluster has load balancers set up, so the GenAI services have been switched to ClusterIP.
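
For context on what a service name resolves to: Kubernetes gives every Service an in-cluster DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`, and a ClusterIP Service is reachable at that name from inside the cluster without a load balancer. The sketch below only builds such a name. `genai-queue` and `default` are hypothetical names used for illustration, and this PR does not state whether the chart passes the short or the fully qualified form.

```shell
#!/bin/sh
# Build the in-cluster DNS name that Kubernetes assigns to a Service.
# Arguments: service name, namespace, optional cluster domain.
service_fqdn() {
  svc="$1"
  ns="$2"
  domain="${3:-cluster.local}"   # common default; a cluster may override it
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# Hypothetical service and namespace, for illustration only.
service_fqdn genai-queue default
# prints: genai-queue.default.svc.cluster.local
```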

Test Plan

Validate that the rendered Helm deployment does not contain a LoadBalancer service for anything other than the Determined master.

Add the following to your values.yaml:

## Configure GenAI Deployment
genai:
  ## Version of GenAI to use. If unset, GenAI will not be deployed
  version: "0.1.1"
  
  ## Port for GenAI backend to use
  port: 9011
  
  ## Port for GenAI message queue
  messageQueuePort: 9013
  
  ## Secret to pull the GenAI image
  # imagePullSecretName:

  ## GenAI pod memory request
  memRequest: 1Gi

  ## GenAI pod cpu request
  cpuRequest: 100m

  ## GenAI pod memory limit
  # memLimit: 1Gi

  ## GenAI pod cpu limit
  # cpuLimit: 2

  ## PVC Name for the shared file system for GenAI.
  ## Note: Either `sharedPVCName` or `generatedPVC.storageSize` (to
  ## generate a new PVC) is required for GenAI deployment
  # sharedPVCName:

  ## Spec for the generated PVC for GenAI
  ## Note: In order to generate a shared PVC, you will need access to a
  ## StorageClass that can provide a ReadWriteMany volume
  generatedPVC:
    ## Storage class name for the generated PVC
    # storageClassName: standard-rwx
    storageClassName: standard

    ## Size of the generated PVC
    storageSize: 10Gi

  ## Extra Resource Pool Metadata is hardcoded information about the
  ## GPUs available to the resource pools. This information
  ## is not provided in k8s so we provide it directly.
  ## Note: All resource pools defined here need to also be reflected in
  ## the .Values.resourcePools.
  extraResourcePoolMetadata:
    a100:
      gpu_type: A100
      max_agents: 3
    # v100:
    #   gpu_type: V100
    #   max_agents: 2

Deploy: `helm install --generate-name . --set maxSlotsPerPod=1 --debug --dry-run`
The rendered output should not contain a LoadBalancer in any of the GenAI services.
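
One way to make this check less manual is to scan the rendered manifests for Services of type LoadBalancer. The helper below is a rough sketch, not part of this PR: `list_loadbalancers` is a hypothetical name, and it is a line-based scan rather than a YAML parser, so it assumes the usual rendered layout (`kind:` at column zero, `name:` indented two spaces under `metadata:`, `type:` indented two spaces under `spec:`). The sample service names are made up.

```shell
#!/bin/sh
# Print the name of every Service of type LoadBalancer found in the
# rendered manifests read from stdin (one YAML document per "---").
list_loadbalancers() {
  awk '
    # Report the document collected so far, then reset the per-document state.
    function emit() {
      if (kind == "Service" && type == "LoadBalancer") print name
      kind = ""; type = ""; name = ""; in_meta = 0
    }
    /^---/                              { emit(); next }
    /^kind:/                            { kind = $2 }
    /^metadata:/                        { in_meta = 1; next }
    /^[^ #]/                            { in_meta = 0 }   # any other top-level key
    in_meta && /^  name:/ && name == "" { name = $2 }
    /^  type:/                          { type = $2 }
    END                                 { emit() }
  '
}

# Demo on a hand-written sample with hypothetical service names.
list_loadbalancers <<'EOF'
---
kind: Service
metadata:
  name: example-master
spec:
  type: LoadBalancer
---
kind: Service
metadata:
  name: example-genai
spec:
  type: ClusterIP
EOF
# prints: example-master
```

Against the chart itself, something like `helm template . --set maxSlotsPerPod=1 | list_loadbalancers` should list only the master service if the change works as intended. `helm template` is suggested here because it emits only manifests, while `--debug --dry-run` also prints values sections.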

Commentary (optional)

Checklist

Changes have been manually QA'd
User-facing API changes need the "User-facing API Change" label.
Release notes should be added as a separate file under docs/release-notes/.
See Release Note for details.
Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

netlify · 2024-02-24T00:18:53Z

✅ Deploy Preview for determined-ui canceled.

Name	Link
🔨 Latest commit	`49f2862`
🔍 Latest deploy log	https://app.netlify.com/sites/determined-ui/deploys/65dd33fd41b1a80008684e61

codecov · 2024-02-26T17:58:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.52%. Comparing base (a8ac657) to head (49f2862).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8885      +/-   ##
==========================================
- Coverage   47.53%   47.52%   -0.01%     
==========================================
  Files        1066     1066              
  Lines      170248   170248              
  Branches     2235     2235              
==========================================
- Hits        80919    80915       -4     
- Misses      89171    89175       +4     
  Partials      158      158

Flag	Coverage Δ
backend	`43.34% <ø> (-0.01%)`	⬇️
harness	`63.77% <ø> (-0.01%)`	⬇️
web	`42.50% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

see 4 files with indirect coverage changes

ioga · 2024-02-27T01:43:06Z

helm/charts/determined/templates/genai/genai-queue-service.yaml

+    release: {{ .Release.Name }}
+spec:
+  ports:
+  - port: {{ required "A valid Values.genai.messageQueuePort entry required!" .Values.genai.messageQueuePort }}


that's a new field, right? maybe values.yaml needs an update with a commented out example.

It is. We actually removed the examples from values.yaml when I was talking with @NicholasBlaskey a bit ago: https://github.com/determined-ai/determined/pull/8821/files

The idea is that the values would be confusing to new users, and GenAI isn't fully released yet.

Instead, we have documentation here: https://hpe-ai-solutions-documentation.netlify.app/products/gen-ai/latest/admin/set-up/install-kubernetes/ which I will update to include that parameter.

* wip trying new service for redis queue
* update the genai helm chart integration to enforce services for all hosts
* revert the changes from testing
* use clusterip instead

cla-bot bot added the cla-signed label Feb 24, 2024

tayritenour added 4 commits February 26, 2024 16:59

wip trying new service for redis queue

769ede7

update the genai helm chart integration to enforce services for all hosts

8414829

revert the changes from testing

d66382e

use clusterip instead

49f2862

tayritenour force-pushed the genai-helm-service-fix branch from 815efc8 to 49f2862 Compare February 27, 2024 00:59

tayritenour requested a review from ioga February 27, 2024 01:39

ioga approved these changes Feb 27, 2024

tayritenour merged commit ca96da1 into main Feb 27, 2024
74 of 88 checks passed

tayritenour deleted the genai-helm-service-fix branch February 27, 2024 17:50

maxrussell pushed a commit that referenced this pull request Mar 21, 2024

fix: Genai helm service fix (#8885)

72df4b0

* wip trying new service for redis queue
* update the genai helm chart integration to enforce services for all hosts
* revert the changes from testing
* use clusterip instead
