This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Releases: microsoft/pai

v1.8.0: July 2021 Release

16 Jul 02:45
d60cba4

Release v1.8.0

New Features

  • Marketplace related update

  • Alert manager

    • Send alerts to users when job status changes #5337
  • Webportal

    • Support UX of Job Priority #5417
  • Others

    • Customizable Autoscaler #5412
    • Add custom ssl port support #5386
    • Clean up repo. Remove obsolete code #5489

v1.7.0: April 2021 Release

28 Apr 04:13
ccafd8e

Release v1.7.0

New Features

  • Marketplace related update

  • New job submission page

    • Please refer to the new submission tutorial for how to use the new submission page.
    • The new submission page replaces Advanced with More info and places it under each section to improve the user experience.
    • In the new submission page, the sidebar can be shrunk to give the main area more visual space.
    • The new submission page moves the YAML editor into a single page, which lets users focus on either setting the config or editing the YAML protocol.
    • The new submission page improves the responsive design at small and medium resolutions.

    Known Issue: The TensorBoard tool is not yet implemented in the new submission page. If you need it, please use the old version.

  • Alert system enhancement

    • Add alert & auto-fix for GPU perf issue #5342 #5383
    • Refine kill-low-efficiency-job-alert email templates #5384
    • Add alert for API server cert expired #5334
  • Support sorting by completionTime in the get job list API #5347

  • Deployment

Bug Fixes:

  • Webportal package build issue #5378

v1.6.0: Mar 2021 Release

18 Mar 08:47
b18ec56

Release v1.6.0

Upgrade Guide

Before upgrading, we recommend checking this issue first.

New Features

  • Job protocol update: Add prerequisites #5145

  • Marketplace related update

  • Introduce an optional docker cache in cluster #5290

  • A regular GPU utilization report can be set up for admins #5281, #5294, #5324, #5331

    • #5324 introduces a schema change for pai-bearer-token in the alert-manager section. The old configuration still works but is deprecated. If you have configured pai-bearer-token of alert-manager, please refer to #5331 to modify the previous configuration.
  • Users can save frequently used SSH public keys on the profile page #5223

  • Improve log experience #5271 #5272

  • Reduce Ansible logs during deployment #5305

Bug Fixes:

  • Database controller: Tolerate wrong framework spec #5284
  • Database controller: Remove sensitive fields in db #5289
  • Database controller: Fix memory leak #5309
  • Set correct launchTime in rest-server #5307
  • Database may use unmounted host path #5343

v1.5.0: Jan 2021 Release

27 Jan 06:49
f67d4ef

Release v1.5.0

New Features

  • Improve Web Portal Experience

    • Fix Home page overlap issue #5213 #5180
    • Add filter, search box and export csv button in task detail list #5175
  • Create a new page for yaml editor #5172

  • Marketplace related update

  • Support different types of computing hardware #5138

  • Deployment process refinement

    • Merge master.csv and worker.csv into a single layout.yaml
    • Move config.yaml and layout.yaml under the quick-start folder, and remove all the argument-parsing logic
    • Add support for cpu-only worker installation
    • Add support for heterogeneous workers
    • Unify version requirements: pai version, pai image tag
    • Set default value in config files
    • Generate hiveD config with layout.yaml #5179
    • Check layout before installing k8s #5184 #5181
    • Config folder structure arrangement
    • Refine installation logs
    • Add skip service list argument #5193
  • Log manager

    • Change get logs API return code #5125
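
As a rough illustration of the master.csv + worker.csv -> layout.yaml change listed under the deployment process refinement, the sketch below merges two legacy CSV files into one machine list. It is not the project's migration tool. It assumes each CSV line holds a `hostname,ip` pair, and the key names (`machine-list`, `hostname`, `hostip`, `machine-type`, `pai-master`, `pai-worker`) and the SKU names are assumptions that should be checked against the installation guide for your version. The function name `csv_to_machine_list` is hypothetical.

```python
import csv
import io


def csv_to_machine_list(master_csv, worker_csv, worker_sku="gpu-machine"):
    """Merge two legacy CSV texts into one list of machine entries.

    Key names and SKU names are assumptions, not a confirmed schema.
    """
    machines = []
    sources = (
        (master_csv, "pai-master", "master-machine"),
        (worker_csv, "pai-worker", worker_sku),
    )
    for text, role, sku in sources:
        for row in csv.reader(io.StringIO(text)):
            # Skip blank, incomplete, and comment lines.
            if len(row) < 2 or row[0].lstrip().startswith("#"):
                continue
            hostname, hostip = row[0].strip(), row[1].strip()
            machines.append(
                {
                    "hostname": hostname,
                    "hostip": hostip,
                    "machine-type": sku,
                    role: "true",
                }
            )
    return machines


# Example input with made-up host names and addresses.
layout = {
    "machine-list": csv_to_machine_list(
        "pai-master,10.0.0.1\n",
        "pai-worker1,10.0.0.2\npai-worker2,10.0.0.3\n",
    )
}
```

The resulting dictionary could then be written out with a YAML library of your choice. A real layout file also needs the hardware description for each machine type, which this sketch leaves out.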

v1.4.1: Dec 2020 Release

24 Dec 08:19
f25e926

Release v1.4.1

Bug Fixes

  • Marketplace
    • Fix initializing blob data issue (#5189)
  • Log Collection
    • Fix getting wrong log for retried task & frontend crash issue (#5190)

v1.4.0: Dec 2020 Release

09 Dec 04:58
0c1a96b

Release v1.4.0

New Features

  • Multi-cluster (#4929)
  • Autoscaler
    • Update docs for Cluster Autoscaler on AKS Engine (#5057)
  • Log Collection (#4992)
    • Rest API
    • Webportal
  • HTTPS configuration document (#5076, #5078)
  • Marketplace (microsoft/openpaimarketplace#73)
    • Data
      • Move NFS to Azure Blob as backend
      • Upload Job output to Azure Blob
      • Download data from Azure Blob to local
      • Use Azure storage SDK for privacy
      • Refactor data usage logic after changing storage to Azure Blob
      • Update project development doc and manual
    • Service Deployment
      • Start Local Rest Server
      • Deploy Rest Server in PAI
      • Start database and save items into it
      • Register in PAI pylon (#5066)
      • Add Azure storage to service configuration (#5104)
  • Web Portal
    • Fix stop job button issue #5079
  • Admin Experience
    • Prometheus alert rules update (#5021)
    • Refine deployment process (#5077, #5085)

Bug Fixes

  • Fix updateUserGroupList API issue (#5121)
  • Fix HiveD config issue caused by k8s coreDNS deployment (#5071)

v1.3.0: Nov 2020 Release

10 Nov 05:48
6e80802

Release v1.3.0

New Features

  • Marketplace
  • HiveD Scheduler
    • Support cluster autoscale with HiveD scheduler on AKS (#4868)
    • Support dynamic sku types for different vc on webportal (#4900)
  • Advanced job debug mode
  • GPU monitoring and utilization
    • Support job tagging (#4924)
    • Stop low GPU utilization job with alert-manager (#4940)
    • Cordon node with GPU ECC Errors (#4942)
  • Documentation
    • Fix document according to DRI tickets (#4828)
    • Add distributed examples (#4821)
  • Webportal
    • Add help info for items on webportal (#4950)

Known Issues

  • Job stop button gives no feedback after a successful click (#5023)
  • Alert handler stop-job notice is not clear to end users (#5021)
  • DB framework / Rest-Server job inconsistency (#5027)

v1.2.1: Oct 2020 Release

15 Oct 12:50
0130846

Release v1.2.1

Bug Fixes:

  • Fix config generation bug #4970
  • Fix database controller dependency #4978

v1.2.0: Sep 2020 Release

24 Sep 17:33
1167180

Release v1.2.0

New Features

Improvements

  • HiveD improvement (#4868)
  • Robustness improvement (#4694)

Bug Fixes

Known Issues

v1.1.1: July 2020 Release

27 Jul 04:27
022d158

Release v1.1.1

Bug Fixes:

  • Fix SDK request timeout #4756