Releases: determined-ai/determined

0.34.0

28 Jun 20:09

Release Notes

0.34.0

Changelog

  • ede2396 chore: bump version: 0.34.0-rc12 -> 0.34.0
  • f0d825d chore: bump version: 0.34.0-rc11 -> 0.34.0-rc12
  • 1556c18 fix: Pause/Resume run test flake (#9592)
  • a74e389 docs: add release notes for 0.34.0 (#9561)
  • e5fc5f1 chore: bump version: 0.34.0-rc10 -> 0.34.0-rc11
  • a51a640 fix: edit/move modals for projects in workspaces unexpectedly closes [DET-10388] (#9588)
  • ce3ea17 chore: bump version: 0.34.0-rc9 -> 0.34.0-rc10
  • 5b40a5c chore: remove shared cluster test for circle ci (#9579)
  • 0b4dec4 chore: bump version: 0.34.0-rc8 -> 0.34.0-rc9
  • 01baf33 chore: Release 0.34.0 bumpenvs (#9578)
  • bad22b2 chore: add Nvidia drivers version matching test and bump env [MD-413] (#9567)
  • 9adbe7c Revert "chore: 0.34.0 bumpenvs (#9565)"
  • 60ada0c chore: bump version: 0.34.0-rc7 -> 0.34.0-rc8
  • cde8a18 fix: wrong notebook idleness payload [MD-447] (#9571)
  • 3f292f5 chore: bump version: 0.34.0-rc6 -> 0.34.0-rc7
  • f36b110 fix: correct workspace_id column type on allocation_workspace_info (#9574)
  • f0f45f8 chore: bump version: 0.34.0-rc5 -> 0.34.0-rc6
  • a6c7918 fix: persist workspace id/name & experiment id for historic allocations [DET-10378] (#9550)
  • f66e816 chore: bump version: 0.34.0-rc4 -> 0.34.0-rc5
  • eaabab1 fix: add validation to patching project key (ET-305)
  • d8b80ad fix: do not modify cached GetAgentsResponse (#9569)
  • 25804d7 chore: bump version: 0.34.0-rc3 -> 0.34.0-rc4
  • ead2232 fix: return workspace name for breadcrumb in Project Details page (#9564)
  • 8da67d2 chore: 0.34.0 bumpenvs (#9565)
  • 2677dc2 chore: bump version: 0.34.0-rc2 -> 0.34.0-rc3
  • 42bea1a chore: fix boto3 requirement syntax (#9551)
  • b2c7e22 chore: bump version: 0.34.0-rc1 -> 0.34.0-rc2
  • 2f1283d fix: hide warning for weak password unless it actually applies [DET-10216] (#9538)
  • ca208b9 chore: bump version: 0.34.0-rc0 -> 0.34.0-rc1
  • f15bda8 feat: det deploy local generates a password for you [DET-10197] (#9518)
  • abaf2e3 chore: bump version: 0.34.0-dev0 -> 0.34.0-rc0
  • 0cf7aba chore: lock published urls to preserve redirects
  • cd85b44 chore: lock api state for backward compatibility check
  • 25b6299 chore: bump version: 0.33.1-dev0 -> 0.34.0-dev0
  • 83b9a8b feat: add connect modal for notebook and shell tasks [MD-404] (#9545)
  • b9ea173 chore: Bumpenvs 8c90e80 (#9544)
  • f9a5dd5 fix: update getProjectColumns calls (ET-270) (#9509)
  • 325d47e pre-commit lint check fix (#9543)
  • 553521e feat: enable token auth for Jupyter notebooks [MD-404] (#9452)
  • ea929fc test: det framework supports "nth" component [testeng-1] (#9540)
  • 7568129 docs: address two link check failures (#9539)
  • 3641bfc feat: support proxied Determined tasks on remote k8s clusters (#9469)
  • 44f446c fix: Huggingface Trust Remote Repo (#9535)
  • 3320107 chore: allow empty run metadata requests to delete existing metadata (#9524)
  • 8006e2e fix: localize debounced settings updates (#9513)
  • fec31a1 chore: handle empty nested structs in run metadata as nil leaf nodes (#9526)
  • 88b01c6 refactor: remove DataGrid pagination code (ET-259) (#9520)
  • edbeee9 test: increase timeouts for running experiments on k8s after env split (#9530)
  • 0f6eb24 fix: webui page height (#9527)
  • 1630c45 docs: Clarify startuphook (#9517)
  • 63a4163 feat: support node selectors & affinities for Kubernetes resource pools (#9428)
  • 6cd7d06 chore: Improving SearchRuns performance when doing hyperparameter filtering (#9489)
  • 4321143 ci: add new feature signoff checkbox [INFENG-710][skip ci] (#9410)
  • 10667f1 feat: remove round robin scheduler for agentrm (#9493)
  • 735fb2c chore: remove hyperparameters from projects table (#9504)
  • 66ec006 feat: warn users to change their passwords [DET-10216] (#9519)
  • 2bce8b6 fix: historical allocations not appearing (#9522)
  • f9ba7f4 fix: skip webhook regex matching for exp config (#9511)
  • b51bc93 docs: Fix broken links (#9523)
  • 9adc092 fix: partially scheduled k8s jobs should display as queued (#9468)
  • 32585ad feat: flat runs comparison view [ET-190] (#9477)
  • d44013c feat: add arbitrary metadata GET/POST endpoints (#9130)
  • 21ecda5 test: preparing a homework assignment [TESTENG-3] (#9510)
  • ee66d15 fix: allow doesnotcontains filters on hyperparameter column (#8842)
  • 382995c fix: historical allocations not showing task allocation workspace (#9496)
  • 8e9067b feat: Framework Splitting and Bumpenvs (#9457)
  • f0d26db ci: fix some failing long-running tests related to password requirements (#9421)
  • d0d30cf test: collect det task logs as artifacts for ci jobs (#9459)
  • e3d01c1 chore: remove debug logs that were accidentally committed (#9503)
  • 3857b7a test: upload unit and intg tests to datadog[infeng-501] (#9505)
  • 3afe5df chore: check non-multiples of slots per pod for kubernetes rm [MD-403] (#9393)
  • 2ca7733 fix: ensure number of project keys possible for testing is not exceeded (#9501)
  • 0f30189 docs: Update the URL to the genai docs (#9507)
  • 86e6b68 feat: add cluster-wide message (#9261)
  • e138267 fix: Searches view fixes (ET-297) (#9487)
  • aa6521b fix: run columns mismatching sort/filter columns for run table (#9479)
  • de03909 fix: use num pods in k8 job summary (#9497)
  • 439734b chore: avoid payload limitation (#9164)
  • dde6362 fix: Use experiment config to determine is_multi_trial in api_runs queries (#9475)
  • f87214b test: preparing a homework assignment [TESTENG-4] (#9495)
  • 27e7307 feat: add custom key to projects table, backfilling based on current project name, and API support (#9134)
  • a5cf959 fix: pin huggingface version to <0.23.0 (#9483)
  • 97667c5 tc: Test format (#9490)
  • ffee34f test: update playwright [TESTENG-4] (#9484)
  • 869b96a test: readme and test name revisions [TESTENG-5] (#9463)
  • 698ab6c test: docstring revisions [TESTENG-5] (#9478)
  • 96c061b ci: lower hf trainer accuracy target + improve failure messages (#9322)
  • 84299a6 chore: upgrade golangci linter to 1.57.2 (#9279)
  • 2588eea feat: Pause & Resume run (#9129)
  • df3919c docs: remove all references to PowerPC/PPC64 (#9476)
  • ca03da1 chore: switch to mockery config file (#9473)
  • 9160ae9 docs: correct release note for deprecating PPC64/POWER builds (#9470)
  • 418525e fix: convert invalid hparam types to json string (#9449)
  • 934aeb6 fix: job state shows as scheduled when resources are allocated (#9466)
  • 8d64508 feat: remove genai from experimental feature list and enable via /master feature switches [GAS-1016] (#9435)
  • 4d8596c chore: deprecate PPC64/POWER builds (#9467)
  • 13a5142 fix: Revert to get_checkpoints.sql call to enable NaN & Infinity values in searcher metric (#9440)
  • d50433d chore: no longer store ee artifacts in circleCI (#9426)
  • a45aa1e feat: add SearchDetails page (ET-53) (#9436)
  • 4c821c3 docs: clarify data collected by telemetry (#9445)
  • 57bece4 fix: job queue's allocated slots should be correct after restarts (#9461)
  • c49eeea test: datagrid tests [INFENG-687] (#9400)
  • 8a9839a feat: add option for Checkpoint_GC pod spec in task container defaults (#9406)
  • d960f29 chore: only connect to the database once (#9456)
  • 0fdb822 feat(rm): convert Kubernetes submissions from pods to jobs (#9438)
  • f54fb7c test: react test datadog integration [infeng-497] (#9455)
  • cc4ad2b docs: fix observability README docs link (#9453)
  • ca60325 chore: bump version: 0.33.0-dev0 -> 0.33.1-dev0
  • 4936847 chore: add docs dropdown link for new version
  • 7b81df7 docs: add release notes for 0.33.0 (#9444)
  • da2f943 feat: add heatmap to runs table [ET-230] (#9429)
  • 0599d0e test: create test users through the API [INFENG-673] (#9431)
  • ac459f7 docs: Add historical cluster usage warning (#9439)
  • cf22597 docs: update broken nvidia anchor link (#9441)
  • d94e299 fix: notify master for core checkpoint deletes [MD-325] (#9415)
  • b96ccba fix: dont utilize the default efs mount on normal aws deploys (#9437)
  • a0f2e33 fix: redirect on sso login (#9369)
  • 9abde37 chore: remove stdlib errors package from lint blocklist (#9381)
  • 515c135 fix: add Admin Settings to NavigationTabbar (ET-194) (#9423)
  • 00bbda6 fix: set the defaults for shared_fs mount in genai correctly (#9433)
  • 9d54093 chore: skip TestSchedule until flake is fixed (#9434)
  • 9524dd4 ci: use priority scheduler in e2e tests (#9430)
  • 58b31e6 docs: terraforming an EKS cluster with autoscaling and EFS. (#9427)
  • 8a6f571 docs: ignore anchor for observability links (#9412)
  • 684c38b fix: add feature gate for checking for blank admin/determined password [DET-10197] (#9425)
  • 6ad9d73 fix: Keep template modal open when config is invalid (#9424)
  • 3dfb9ec test: remove confusing, unused slurm-related ci code (#9417)
  • cdd7a82 test: ensure make unslurmcluster always runs in CI (#9420)
  • ba31f03 fix: reset InteractiveTable pagination when filters applied [ET-183] [ET-121] (#9413)
  • 3cbe805 fix: master checks db newness before migrating [DET-10312] (#9414)
  • da46208 fix: bulk action bug in the old experiment table that cannot trigger bulk actions across pages (#9404)
  • 657286c feat: Add Run columns to GetProjectColumns (#9146)

0.33.0

29 May 20:46

Release Notes

0.33.0

Changelog


0.32.1

10 May 17:24

Release Notes

0.32.1

Changelog

  • 7d0b38a chore: bump version: 0.32.1-rc0 -> 0.32.1
  • 351826c docs: add release notes for 0.32.1 (#9351)
  • 947585f chore: bump version: 0.32.1-dev0 -> 0.32.1-rc0
  • f9da12f chore: lock api state for backward compatibility check
  • 1e8f8de fix: remove pod labels with potentially incompatible names (#9349)
  • 6995ca6 chore: bump version: 0.32.0 -> 0.32.1-dev0

0.32.0

08 May 15:35

Release Notes

0.32.0

Changelog

  • a1b7242 chore: bump version: 0.32.0-rc8 -> 0.32.0
  • d8580c2 docs: add release notes for 0.32.0 (#9301)
  • 2244f71 chore: bump version: 0.32.0-rc7 -> 0.32.0-rc8
  • 0322dc7 fix: filter action experiments, old ExperimentList (#9325)
  • 5ebb008 chore: bump version: 0.32.0-rc6 -> 0.32.0-rc7
  • b208794 fix: filter batch action experiments (#9316)
  • 991818b chore: bump version: 0.32.0-rc5 -> 0.32.0-rc6
  • e277782 fix: Bulk Action bug (#9255)
  • b2663af chore: bump version: 0.32.0-rc4 -> 0.32.0-rc5
  • ee63b67 fix: users can be removed from all groups in Web UI (#9259)
  • 00b95c3 chore: bump version: 0.32.0-rc3 -> 0.32.0-rc4
  • 642e323 fix: historical-usage date calculation bug (#9257)
  • f506989 chore: bump version: 0.32.0-rc2 -> 0.32.0-rc3
  • 1047e78 fix: hew update for select bug in log viewer (#9249)
  • 4c59c9c chore: bump version: 0.32.0-rc1 -> 0.32.0-rc2
  • f8ad009 fix: undo default log retention in values.yaml (#9245)
  • 4b3adb9 docs: add a release note for aurora issue. (#9241)
  • 004fe70 fix: allow genai deployments with agent GIDs set to share data properly (#9243)
  • be231d9 chore: bump version: 0.32.0-rc0 -> 0.32.0-rc1
  • 714264e chore: bump version: 0.32.0-dev0 -> 0.32.0-rc0
  • dc88b9f chore: bump version: 0.31.1-dev0 -> 0.32.0-dev0
  • 7ffdadf ci: add determined-ee context to python ee publish (#9234)
  • c18ac83 fix: properly merge resource configs (#9233)
  • 3b39d3c chore: add log retention to help charts (#9216)
  • 3646395 chore: lock published urls to preserve redirects
  • 80d8909 chore: lock api state for backward compatibility check
  • 39b948c feat: add genai user role to rbac (#9206)
  • 43289e9 test: ee and oss have separate handling (#9218)
  • 1ca3613 fix: debounce userSettings update (#9220)
  • ab382b4 chore: update the license date (#9225)
  • ff10ac0 docs: Fix broken links (#9219)
  • ac68df8 chore: default observability.enable_prometheus to true (#9222)
  • 26c1940 chore: upgrade protoc used in CI (#7935)
  • 9f6bbc9 chore: Add streaming updates feature flag [MD-371] (#9190)
  • f8b3736 ci: Exclude deploy/README.md from build (#9211)
  • 3bfc212 fix: hew update for chart scroll bug (#9210)
  • da8a040 feat: CLI allows and requires creating a user with a password DET-10184 (#9112)
  • fbccaf1 chore: clean up rm module [RM-202] (#9191)
  • 8caf3cb test: user tests [INFENG-455] (#9152)
  • 3568f27 fix: Skip expected error from web socket (#9194)
  • 1b212ae feat: add kill run endpoint (#9061)
  • e7d870e test: use devcluster for react tests [INFENG-449] (#9185)
  • bd4a54e fix: shared cluster test to work in OSS again (#9195)
  • b874acb docs: fix another instance of broken docs link (#9208)
  • 86be18a ci: pass ee into args to prevent latest main deploying as ee (#9207)
  • f74ab9c docs: Describe multi rm k8s (#9025)
  • 6fb1c52 ci: deploy awscli to system (#9188)
  • 9cfbb59 docs: fix nvidia device plugin link for EKS (#9204)
  • 3e865c6 test: skip flakey user provision tests (#9203)
  • 598784d chore: make multi-RM an EE-only feature [RM-166] (#9192)
  • 6d2be52 ci: fix test-det-deploy-local (#9196)
  • 5f312ed test: can't launch NSC test assert 404 instead of 403 (#9197)
  • 4b1c937 test: fix a test util issue with master config schema assumptions (#9193)
  • 0bc13d8 feat: non-blocking metrics reports [MD-144] (#9107)
  • 2ced9b9 ci: do dry runs of publish-docs for RCs (#9186)
  • 72344e0 feat: Use feature flag for streaming updates - manually update project store (#9170)
  • dd7f4b5 docs: add profiling section for trainer API UG [MD-373] (#9177)
  • 06586f0 fix: better exception handling in detached mode (#9183)
  • 283daab feat: Unfork Enterprise Edition (EE) and require license key for EE features (#9168)
  • f233c95 docs: FAQ for python SDK ckpt download, k8s deprecation labels. (#9187)
  • 6fcefac chore: bump version: 0.31.0-dev0 -> 0.31.1-dev0
  • 19688a9 chore: add docs dropdown link for new version
  • 2b2e96a docs: add release notes for 0.31.0 (#9159)
  • b194686 chore: style fix for helm initialUserPassword (#9158)
  • a5e9f0c chore: add option to auto pick the only matching name on partial hits (#9108)
  • 371c90b fix: louden server errors coming from deleteCheckpoints (#9184)
  • 0765e38 chore: pass correct master scheme to genai (#9181)
  • 26f5e0b fix: report errors from deletecheckpoints endpoint + improve feedback (#9178)
  • 1037d83 chore: bumpenv update NGC base images version to 24.03 (#9132)
  • 1cc9cd7 fix: count determined-system pods as det pods [RM-148] (#9148)
  • 0fc247c fix: single-searcher MNIST example runs for multiple epochs (#9160)
  • d41c4a7 fix: fix docs and wording (#9179)
  • 5541e54 feat: RM-130 add determined info as pod labels (#9140)
  • ee15da0 test: Djanicek/infeng 456/workspaces and projects (#9117)
  • e6c0c99 chore: add typing annotations for zmq (#9176)
  • 4ceaed0 docs: Add readme to toc (#9175)
  • 3105407 chore: make the data_dir consistent to other advertised devc configs (#9157)
  • d38fc3c fix: Reset table offset when filtering for models (#9167)
  • 338d5d3 docs: remove max supported k8s version. (#9171)
  • 35d249f chore: add flake8 relative-import rule (#9169)
  • ffed598 feat: support for mounting a hostPath for the shared file system in genai (#9161)
  • 2f874b9 test: experiment list page models and sample test [INFENG-451] (#9139)
  • fd45ed8 ci: merge EE and OSS doc deploy together [INFENG-625] (#9162)
  • 0b2eab0 docs: Copy debug to exp config (#9120)
  • 3f70a46 chore: style fix for helm tls (#9163)
  • 8a94574 chore: new image publishing (#9090)
  • 8b83122 fix: TensorBoard visualization from batch actions. (#9156)
  • 384e5c0 fix: fix disable button condition in launch jupyter notebook modal (#9155)
  • b109108 feat: add helm master level config for tcd startup hooks (#9135)
  • 0228a95 ci: publish-docs installs awscli into user space (#9153)
  • 746ba26 chore: add alert metric for Prometheus and add Grafana alert docs [RM-118] (#9150)
  • 291565b fix: keras and tensorflow import errors in new versions (#9141)
  • 831df43 feat: create flat runs view [ET-24] (#9023)
  • 5854b8b chore: add a devcluster config to run Determined across multiple Kubernetes clusters locally (#9151)
  • d0497da fix: fix docs for log retention (#9149)
  • bd29f1f fix: cli gives misleading error message when logging in with a bad password [MD-277] (#8990)
  • 95f87d7 fix: ensure all columns have widths (#9136)
  • 3f7a396 test: fix test_logging typehint syntax error (#9142)
  • 93e7bdf test: ignore e2e test cases in vitest (#9128)
  • 4d1b8ae docs: revert helm values change for multirm (#9145)
  • c3d13b6 docs: revert-multiRM-mc-doc (#9144)

0.31.0

17 Apr 22:35
583e0c3

Release Notes

0.31.0

Changelog

0.30.0

04 Apr 18:24

Release Notes

0.30.0

Changelog

  • 5a63518 chore: bump version: 0.30.0-rc5 -> 0.30.0
  • 97aaa02 docs: add release notes for 0.30.0 (#9103)
  • c108443 chore: bump version: 0.30.0-rc4 -> 0.30.0-rc5
  • 4ce78b2 fix: prevent checkpoint modals from closing on their own [ET-116] [ET-120] (#9094)
  • 8bcdcc8 chore: bump version: 0.30.0-rc3 -> 0.30.0-rc4
  • e90238a chore: bump version: 0.30.0-rc2 -> 0.30.0-rc3
  • b8db2e6 fix: slot stats are not filled in everywhere (#9070)
  • d2e3a5c fix: remove parent_id from create_experiment (#9068)
  • 61958ef fix: API migration to improve performance in resource pool page (#9056)
  • 62d102b chore: bump version: 0.30.0-rc1 -> 0.30.0-rc2
  • bc241b6 docs: Update release notes (#9044)
  • 2e31ece fix: loading experiments without filterset (#9059)
  • 4efaede chore: bump version: 0.30.0-rc0 -> 0.30.0-rc1
  • d2949d3 faster migrations (#9060)
  • 4c6e35c feat: add slot stats to /agents endpoints (#9048)
  • f32dc82 chore: bump version: 0.30.0-dev0 -> 0.30.0-rc0
  • 10030a6 chore: lock published urls to preserve redirects
  • 220f820 chore: bump version: 0.29.2-dev0 -> 0.30.0-dev0
  • 1e6f0f7 feat: Use filtered resource pools when creating notebook (#9045)
  • 74fe16b feat: profiling v2 [MD-27] (#9032)
  • 133d127 docs: revert multirm docs changes #9016
  • 1992c97 chore: optional DB migrations (#9047)
  • 84ba688 fix: docs lint (#9052)
  • 848b216 feat: add command det model delete (#9039)
  • 1202d5c refactor: DET-9976 remove agentID type from agentrm (#9040)
  • 0710c58 docs: Describe editorrestricted (#9049)
  • 02da36f chore: mark db-dependent tests as needing to run in integration (#9041)
  • 6c88e8d fix: move experiment SQL error (#9042)
  • 3fa0df1 Revert "docs: add EditorRestricted role release note (#9007)" (#9046)
  • 60cb003 test: Jcom/infeng 454/sign in tests (#9013)
  • f08b406 ci: tag CI-deployed resources (#9043)
  • 1868723 build(deps): bump google.golang.org/protobuf from 1.28.0 to 1.33.0 (#8996)
  • d4ab20b build(deps): bump github.com/docker/docker (#9026)
  • e4bc377 test: playwright config and browser usability (#9024)
  • f6b9ac8 build(deps): bump github.com/jackc/pgx/v4 from 4.12.0 to 4.18.2 (#8987)
  • c811947 chore: helm for multirm kubeconfig_path (#9033)
  • 4441d6d feat: Add template to py sdk create_experiment (#8927)
  • 5ac1b85 chore: revert helm for multirm kubeconfig_path (#9030)
  • 6fec24d chore: helm for multirm kubeconfig_path (#9015)
  • 0518785 feat: streaming update code generation for typescript (#8988)
  • 39afa3c docs: add documentation for multirm (#9016)
  • 7e37c22 chore: add grpc based auth fallback to proxied requests (#8980)
  • 5e1f2af fix: Experiment.await_first_trial exits when Experiment is terminal (#9022)
  • a603f4c chore: logins return Sessions (#8883)
  • 93b6aa2 feat: SearchFlatRuns api call for flat runs table support (#8852)
  • fa43bff ci: test-perf uses determined version from github (#9019)
  • 137bfcd feat: add model streaming (#8973)
  • 8bf280d refactor: consolidate experiment list selection state (#8860)
  • 674cd73 ci: DRY skip logic and clarity on step name (#9002)
  • 00d145f chore: bump version: 0.29.1-dev0 -> 0.29.2-dev0
  • a3ba9e9 chore: add docs dropdown link for new version
  • e922a41 docs: add release notes for 0.29.1 (#9014)
  • dfed63d chore: reassign ml-sys CODEOWNERship to model-dev (#9000)
  • eac7ddf test: document ui e2e with backend test instructions for local (#9005)
  • bc1b431 docs: add EditorRestricted role release note (#9007)
  • f52f43b chore: warn about det deploy det-version mistmach (#8994)
  • 5b17df3 chore: limit code coverage report to files in src; omit generated files (#9003)
  • f73fd09 fix: escape regex in ProjectDeleteModal (#8998)
  • 73fd1cd feat: Add multi RM name to K8s (#8993)
  • 978a02e ci: Djanicek/infraeng 487/circle test runner (#8977)
  • 4730d76 chore: ban http.Transport & http.Client; add cleanhttp (#8991)
  • 52572d4 fix: improved textcell performance for novels (#8986)
  • 89d4708 docs: add EditorRestricted role to rbac docs (#8984)

0.29.1

18 Mar 18:13

Release Notes

0.29.1

Changelog

0.29.0

05 Mar 18:48

Release Notes

0.29.0

Changelog

  • 5079570 chore: bump version: 0.29.0-rc4 -> 0.29.0
  • 8fa5b5a docs: add release notes for 0.29.0 (#8955)
  • fffde7f chore: bump version: 0.29.0-rc3 -> 0.29.0-rc4
  • f939a0f fix: no data plot in chart with data (#8935)
  • 5a74e37 build: bump ci cpu image to latest ubuntu 2004 (#8940)
  • ad84759 build: bump up ci setup_remote_docker version (#8942)
  • f0d9768 fix: malformed config with gcp up with --initial-user-password (#8936)
  • 2a61ab3 chore: bump version: 0.29.0-rc2 -> 0.29.0-rc3
  • 435e90a chore: fix mp.pool test_streaming_metrics_api (#8917)
  • 18e2ea4 chore: bump version: 0.29.0-rc1 -> 0.29.0-rc2
  • 799373f fix: slurm launcher authenticates preemption notification (#8928)
  • 641174c tc: Add release note 8851 (#8864)
  • f275252 chore: bump up ebs size to 400gb for genai deployments
  • 8c855b7 fix: SSO button link target (#8925)
  • 8d4acd5 tc: Remove broken link (#8924)
  • 06875df fix: canonicalize master urls shim code (#8919)
  • e5ae865 chore: bump version: 0.29.0-rc0 -> 0.29.0-rc1
  • b847ede chore: bump version: 0.29.0-dev0 -> 0.29.0-rc0
  • 28c385c chore: lock published urls to preserve redirects
  • cbfd3c2 chore: lock api state for backward compatibility check
  • b30f609 chore: bump version: 0.28.2-dev0 -> 0.29.0-dev0
  • ad94c17 fix: return error from websocket handler if socket id is taken (#8877)
  • 4618389 style: update genai logo on sidebar (#8907)
  • 8f82087 test: fix tensorboard reattach k8s flake [RM-39] (#8906)
  • d24b19a test: unquarantine deploy-local tests (#8896)
  • 7c6bec9 chore: refactor proto, schema, and jobservice for multiRM (#8875)
  • ca96da1 fix: Genai helm service fix (#8885)
  • a89e51e fix: trial comparison text overflow bug fix (#8869)
  • 9817a4d chore: add trigger to abort checkpoint deletion (#8878)
  • 2689b0b chore: delete unused functions [RM-41] (#8888)
  • 9a6afd2 docs: Organize docs (#8898)
  • a8ac657 chore: small build system fixes (#8900)
  • fa98bf3 fix: add missing ci context to preview cluster
  • b15d508 fix: add deploy last main missing ci context (#8892)
  • b47b477 chore: cleanup stray comments (#8889)
  • ae08265 feat: force default user passwords for all det deploy and CI clusters [RM-28] (#8851)
  • be1ab85 fix: unnecessary group related api calls during the initial group page loading (#8882)
  • f37bc3e fix: move e2e_tests changes for slurm test from EE to OSS (#8887)
  • 93ced86 fix: add missing check for external sessions on exp launch (#8859)
  • 944732a ci: more e2e test fixes (#8881)
  • ab9505c ci: fix e2e tests in ee (#8880)
  • 0bc3106 docs: Add llm blog link to home page (#8874)
  • c029327 docs: add link checker utility (#8738)
  • e1da471 chore: api's default retry now session's default retry (#8872)
  • 7bb9dbc chore: master config updates for multirm [RM-3, RM-4, RM-5, RM-7, RM-29] (#8831)
  • f101f3d chore: add allocation info for cluster ui [DET-10018] (#8616) (#8876)
  • 72d54be chore: canonicalize master urls everywhere [MLG-878] (#8670)
  • e3709bd chore: document internal api errors (#8865)
  • 27a279e fix: e2e CPU tests have wrong maxSlotsPerPod number (#8870)
  • 03b9b30 chore: bunify postgres_jobs.go (#8858)
  • e9ac112 build(deps): bump peter-evans/create-or-update-comment from 3 to 4 (#8760)
  • dc3e41e Fix broken links (#8825)
  • bccdf0c fix: stop allowing multi-container allocations to launch in single agent config (#8833)
  • a1214d7 chore: add allocation info for cluster ui [DET-10018] (#8616)
  • 76ec233 chore: refactor a bunch of auth-related python (#8347)
  • 66b1e6c chore: bump version: 0.28.1-dev -> 0.28.2-dev0
  • f250ad9 chore: add docs dropdown link for new version
  • 9d44ca1 docs: add release notes for 0.28.1 (#8861)
  • ac8c440 fix: allow experiments to configure k8s sidecars (#8854)
  • d07ec40 ci: fix broken ci due to queue version change (#8853)
  • c656aac chore: use npm build for hew (#8845)
  • 6b63750 feat: add a master API to fetch a trial by external id. (#8730)
  • e78a4c0 fix: correctly source bucket region when using minio (#8850)
  • dba5f0f fix: replace react-window with react-virtuoso in transfer component (#8800)
  • 2a183da ci: fix performance feature branch using wrong db (#8835)
  • 47061fa fix: revert config work from #8765 and #8789 due to feature regressions (#8849)
  • a5f38cb chore: remove GetAllocationSummary from RM interface (#8846)
  • de28a57 chore: cover postgres_jobs.go (#8841)
  • ba8250a chore: update backend coverage target (#8798)
  • 556639d fix: show error message from backend API for workspace deletion (#8848)
  • 08dfa43 fix: job queue test failures (#8843)
  • 876f9c3 chore: configure agent log level through config file (#8819)
  • ba03375 chore: move project id onto runs (#8794)

0.28.1

20 Feb 22:58

Release Notes

0.28.1

Changelog

  • f6cb624 chore: bump version: 0.28.1-rc3 -> 0.28.1
  • baaa3bd docs: add release notes for 0.28.1 (#8861)
  • fbf9df4 chore: bump version: 0.28.1-rc2 -> 0.28.1-rc3
  • a965f15 ci: fix broken ci due to queue version change (#8853)
  • d91e8b0 chore: bump version: 0.28.1-rc1 -> 0.28.1-rc2
  • 3129d33 fix: revert config work from #8765 and #8789 due to feature regressions (#8849)
  • 1888b90 chore: bump version: 0.28.1-rc0 -> 0.28.1-rc1
  • c443073 fix: show error message from backend API for workspace deletion (#8848)
  • 1fc1496 fix: job queue test failures (#8843)
  • a74685f chore: bump version: 0.28.1-dev -> 0.28.1-rc0
  • 5b2e32d chore: cleanup the last traces of experiment git fields. [MD-258] (#8830)
  • 92a380f feat: Generic task restore (#8802)
  • b0fa7dc feat: generic tasks: support startup hooks (#8840)
  • ca80022 chore: bunify postgres_checkpoints and add tests (#8783)
  • a4dbc03 chore: fix error on terminating experiments on restart (#8837)
  • aa98d82 chore: agent state wasn't getting deleted and logged error (#8838)
  • bb469fa fix: update hew with bugfixes (#8839)
  • 393cfde Fix broken ref (#8836)
  • 7a13863 perf: improve GetExperiments + SearchExperiments counting (#8801)
  • d8d9965 chore: remove unused SetAllocationName (#8829)
  • 1946d9a docs: Update slurm install (#8832)
  • 1fd21e7 fix: Fix small typo in Webhook documentation (#8820)
  • e341e27 feat: Generic Tasks (#8724)
  • fff85e3 fix: handle helm templating in older go template versions (#8828)
  • f300d97 chore: hide genai helm values config and fix var name (#8821)
  • 6206bde feat: add streaming updates core functionality and project streaming (#8669)
  • ed61121 fix: stop truncating log timestamps to avoid missing logs [WEB-1791] (#8815)
  • 43d3f21 fix: check for models before deleting workspace (#8804)
  • bb59fa2 ci: wait longer for performance test db to startup (#8796)
  • cfffe96 docs: Remove legacy pages (#8818)
  • 1c3f3c4 fix: mitigate many unnecessary api calls in user management table (#8816)
  • 4612c41 fix: agent config precedence (#8656)
  • 762fcef feat: Deploy GenAI in Helm (#8727)
  • 8e067d9 fix: remove possible hang from ship_logs.py [MLG-1565] (#8803)
  • 1daf9d3 docs: remove duplicated note (#8813)
  • 56e7000 fix: remove extra quotes around IdentifyTask (#8792)
  • 3805ebd chore: add testing for k8s informer panic (#8810)
  • a35696d refactor: condense trial update functions (#8808)
  • 45c578b chore: bump version: 0.28.0-dev0 -> 0.28.1-dev
  • 6520629 chore: add docs dropdown link for new version
  • ed2136d docs: add release notes for 0.28.0 (#8807)
  • 8258565 chore: bump version: 0.27.2-dev0 -> 0.28.0-dev
  • c5afb6c fix: fetch experiment in case config data is not contained (#8789)
  • 4e17ef7 chore: differentiate between programmatic and web page requests (#8795)
  • a1a6e20 chore: add ee helm chart changes to oss (#8799)
  • 65c811c docs: Add mention of RPMs to on-prem _index.rst (#8773)
  • ad765d4 docs: adds/corrects EE changes, merges to OSS (#8788)
  • f1a45ae perf: update proto_checkpoint_view to use index (#8793)
  • abd590d Revert "docs: Update oidc and saml docs (#8777)" (#8791)
  • bb88b01 fix: improve trial log request cancelling (#8787)
  • 17f305f ci: make perf tests only alert on failure (#8790)
  • 422f5aa perf: avoid loading model def in experiment model (#8742)
  • 7698452 perf: improve GetExperiments showTrialData performance (#8753)
  • e801cfe perf: add index to checkpoints_v2 id (#8758)
  • 71db4e1 perf: add indexes to tasks and allocations (#8757)
  • e0e6cf0 perf: improve get_workspaces query (#8751)
  • ef656bc perf: improve resource agg performance (#8735)
  • e873381 fix: retry watcher failure causes infinite loop (#8786)
  • ba2f190 fix: replace experiment config (#8765)
  • 85d1053 chore: rename postgres_command_intg_test.go (#8785)
  • 40a70cf test: performance test CI work (#8761)
  • 36a2e29 chore: bunify db/postgres_tasks.go (#8764)
  • 07494cf fix: update hew to a version without broken documentcard prompts (#8782)
  • 9502059 feat: GCS client should retry on TooManyRequests. (#8780)
  • ec850ae test: add intg tests for db/postgres_tasks.go (#8750)
  • 9ec2f7d chore: update gke version to comply with latest release for e2e tests (#8781)
  • cefa242 chore: persist checkpoint storage backend ID (#8690)
  • 905e449 chore: migrate db schema trials to runs (#8723)
  • dfbb926 chore: clean up leftover debug print statements (#8755)

0.28.0

06 Feb 20:56

Release Notes

0.28.0

Changelog

  • 7f9b082 chore: bump version: 0.28.0-rc4 -> 0.28.0
  • ed1b7f0 docs: add release notes for 0.28.0 (#8807)
  • c4b6f57 chore: bump version: 0.28.0-rc3 -> 0.28.0-rc4
  • 27ce0a2 chore: add ee helm chart changes to oss (#8799)
  • 959a096 chore: add ee helm chart changes to oss (#8799)
  • f513174 chore: bump version: 0.28.0-rc2 -> 0.28.0-rc3
  • 083c314 chore: bump version: 0.28.0-rc1 -> 0.28.0-rc2
  • e272bf0 chore: bump version: 0.28.0-rc0 -> 0.28.0-rc1
  • 6080d39 chore: bump version: 0.27.2-rc4 -> 0.28.0-rc0
  • 3cbce1d chore: bump version: 0.27.2-rc3 -> 0.27.2-rc4
  • 89df98b docs: adds/corrects EE changes, merges to OSS (#8788)
  • 2e27c71 chore: bump version: 0.27.2-rc2 -> 0.27.2-rc3
  • 1abe34f chore: bump version: 0.27.2-rc1 -> 0.27.2-rc2
  • e23e162 fix: improve trial log request cancelling (#8787)
  • 55b5bd4 chore: bump version: 0.27.2-rc0 -> 0.27.2-rc1
  • 5edfd81 fix: retry watcher failure causes infinite loop (#8786)
  • 6a21d44 fix: update hew to a version without broken documentcard prompts (#8782)
  • 74e341d chore: bump version: 0.27.2-dev0 -> 0.27.2-rc0
  • ea9e903 chore: lock published urls to preserve redirects
  • 0321e1f chore: lock api state for backward compatibility check
  • 3783f2b docs: Update oidc and saml docs (#8777)
  • 141afa4 docs: update dependency version in contributing readme (#8776)
  • 994527f fix: Text filter on ProjectMoveModal (#8775)
  • aa65c07 chore: use vite-plugin-svg-to-jsx package (#8772)
  • 98c61f3 test: do not import model_hub test requirements (#8771)
  • 1e2da10 ci: retry git fetch for early stopping checks (#6318)
  • c73712b docs: Replace basic quickstart (#8770)
  • fda515d fix: python requirements for pytest and moto (#8769)
  • 78929c0 fix: Filter value resets when switching column types [WEB-1949] (#8731)
  • 7ddf965 docs: Fix minor issues (#8768)
  • 31f6f99 fix: add default transport to proxy connection (#8767)
  • 149b7fa build(deps): bump slackapi/slack-github-action from 1.24.0 to 1.25.0 (#8766)
  • 719169a docs: Fix dropdown url (#8763)
  • 56406a2 Update helm chart config ref (#8762)
  • 5973f8e chore: bump version: 0.27.1-dev0 -> 0.27.2-dev0
  • 260c2bc chore: add docs dropdown link for new version
  • 4b4d14a docs: add release notes for 0.27.1 (#8746)
  • 7841d9e feat: the new quick start guide link (#8759)
  • 995311a feat: expconf flag to force scheduling on a single node/container/pod (#8743)
  • 64d588f refactor: use hew Tree and Divider components [WEB-1920] (#8736)
  • f771acb fix: cease many model fetch api calls in checkpoint tab (#8749)
  • 96b9064 docs: Add qs for webui users (#8754)
  • d68ffaa docs: API deprecate returning config for bulk endpoints (#8732)
  • f21a516 tests: cover queries inside internal/users/postgres_users.go (#8729)
  • 90a57cb fix: Experiment table, right-click context menu [WEB-1942] (#8756)
  • 2ffc18f chore: import missing EE helm chart change [ci skip] (#8747)
  • 87b6cf3 fix: use the new genai docker repo (#8745)
  • 7c3650f chore: make devcluster to rebuild bindings before harness and webui. (#8748)
  • 7f3ddfb feat: Add a modal to enable/disable Agents [WEB-1718] (#8721)
  • bd0a9ea fix: pagination fix in model detail page (#8744)
  • 43c074e feat: helm option to mount shared_fs checkpoints to master (#8741)
  • 9f06d35 fix: use selected checkpoints when registering (#8739)
  • 6db8c06 test: cover agent_state.go SQL queries (#8740)
  • eb48302 test: cover db.GroupCheckpointUUIDsByExperimentID (#8508)
  • d661404 fix: compress data from API for the page load performance improvement (#8720)
  • b71da7a fix: batch metric writes to TensorBoard [MLG-990] (#8688)
  • bd78ec1 feat: Preserve 'redirect' query during logout [GAS-489] (#8728)
  • 62941a2 refactor: remove antd App component [WEB-1922] (#8713)
  • bf5b1d1 chore: fix unused-imports warning in protos build. (#8726)
  • 0bda0d9 fix: use hew Alert [WEB-1918] (#8711)
  • 1c21f6a chore: Move from internal glide-table-grid to v6.0.0 [WEB-1945] (#8725)
  • f32e015 fix: local checkpoint download path fix (#8722)
  • 190af1d docs: [FE-270] add PBS known issue - Cluster tab does not display GPU information (#8719)
  • 6d744f7 feat: content-length for tar checkpoint downloads (#8684)
  • 11e3ba9 chore: upgrade vitest@1.2.1 (#8718)
  • 92fe3a6 docs: [FE-269] Add documentation detailing configuration steps to set the values for ngpus. (#8714)
  • b69a49c chore: update github path in docker docs (#8687)
  • faea553 chore: codecov reports to match go coverage reports (#8696)
  • 0782c35 chore: standardize oidc/saml group & display attribute names in helm config (#8689)
  • acca434 chore: update oss/ee oidc & saml helm config (#8680)
  • 7188b69 fix: use Hew dropdown on FilterGroup [WEB-1938] (#8715)
  • a410c45 chore: Upgrade to vite 5 (#8676)
  • dbeb458 fix: support CommandState for experiment icon (#8709)
  • 83fe474 docs: fix references on children of "training reference" root (#8708)
  • 71eaa5a chore: Replace antd reset.css with modern-normalize (#8706)
  • e0e08b6 fix: Update hew for chart fix, avoid error from Typography.Label (#8712)
  • fef93a4 build(deps): bump actions/cache from 3 to 4 (#8710)
  • 4aedded docs: Update docs to pass linter (#8705)
  • 00c2746 fix: restore original user store poll on leaving workspace details (#8702)
  • 73760cd Revert "docs: Update docs to pass linter" (#8704)
  • 4b7b705 [docs] Update docs to pass linter (#8703)
  • 2402133 docs: Update Docker Installation Instructions (#8659)
  • f8a2434 docs: Update Linux distros, add WSL, and archs to Quickstart (#8662)
  • 132919f docs: Overhaul WSL deployment instructions (#8658)
  • e7dc7aa chore: Replace custom archived note with Hew badge (#8695)
  • f2899cc fix: fix CreateExperiment for Remote Users (#8700)
  • f00768f chore: remove unused files (#8698)
  • 2e60167 chore: TrialsComparisonModal style fixes [WEB-1919] [WEB-1909] (#8674)
  • e8d6448 Revert "fix: restore original user store poll on leaving workspace details"
  • 6d3f9ff fix: playwright fix (#8699)
  • c869ce7 fix: restore original user store poll on leaving workspace details