Skip to content

v1.3

Compare
Choose a tag to compare
@rngy rngy released this 12 Jul 17:18
2bfbcd7

Quick start solutions

  1. Add finetuning gemma on GKE with L4 GPUs example (#697)
  2. Jetstream autoscaling guide (#703)
  3. Enable Ray autoscaler for RAG example application (#722)
  4. Fix GKE training tutorial (#706)
  5. Update Kueue to 0.7.0 (#707)

ML Platform release (#715)

  • Documentation
    1. Add notebook packaging guide to docs (#690)
    2. Added enhancements to the data processing use cases
  • Infrastructure
    1. Added H100 and A100 40GB DWS node pools
    2. Moved cpu node pool from n2 to n4 machines
    3. Updated Kueue to 0.7.0
    4. Added initial test harness
  • Configuration Management
    1. ConfigSync git repository name allows for easier use of multiple environments. Standardized GitOps scripts
    2. Added a GitLab project module and allowed users to choose between GitHub and GitLab
  • Observability
    1. Added NVIDIA DCGM
    2. Added environment_name to the Ray dashboard endpoint
    3. Added Config Controller Terraform module
  • Security
    1. Added allow KubeRay Operator to the namespace network policy
    2. Added Secret Manager add-on to the cluster

TPU Provisioner

  1. Add admission label selectors and e2e test script (#702)

Benchmarking

  1. Add temperature and power usage DCGM metrics (#710)
  2. Support for csv output (#705)

General

Add custom metrics stackdriver adapter Terraform module (#718)
Add prometheus adapter Terraform module (#716)
Add Jetstream MaxText Terraform module (#719)