Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly merge resource configs #9233

Merged
merged 3 commits into from
Apr 24, 2024
Merged

fix: properly merge resource configs #9233

merged 3 commits into from
Apr 24, 2024

Conversation

eecsliu
Copy link
Contributor

@eecsliu eecsliu commented Apr 23, 2024

Ticket

DET-10245

Description

Task container defaults weren't being merged properly between cluster level and resource pool level. Instead, cluster level TCDs were being completely overwritten. This changes the logic so that it is actually merging.

Test Plan

Test on both kubernetes and agents if possible.

In this example, we'll demonstrate setting environment variables and log policies.
Start by setting a task container default at the cluster level, such as:

        resource_manager:
        ...
        task_container_defaults:
          environment_variables:
            - testCluster=testCluster
          log_policies:
            - action:
                type: exclude_node
              pattern: ECC Error_Cluster_TCD
        ...

Additionally, set the TCD at the resource pool level:

        resource_manager:
        ...
        resource_pools:
          - pool_name: default
            task_container_defaults:
              environment_variables:
                - testRP=testRP
              log_policies:
                - action:
                    type: exclude_node
                  pattern: ECC Error_RP_TCD
        ...

Lastly, set the same values in your experiment config:

environment:
  environment_variables:
    - testExp=testExp
log_policies:
   - pattern: "ECC Error"
     action:
       type: exclude_node

With all three of those set, run devcluster/kubernetes and make sure that in the trial logs you see all of those in the printed config.

Next, comment out each of the three in different combinations to make sure that both the environment variables and the log policies are getting merged correctly. e.g. if there is no log policies in the TCD, the experiment logs should show the cluster level log policies + experiment level log policies.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@eecsliu eecsliu requested review from a team as code owners April 23, 2024 23:28
@cla-bot cla-bot bot added the cla-signed label Apr 23, 2024
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 31.42857% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 44.70%. Comparing base (3b39d3c) to head (6508337).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9233      +/-   ##
==========================================
- Coverage   44.70%   44.70%   -0.01%     
==========================================
  Files        1270     1270              
  Lines      155172   155186      +14     
  Branches     2436     2436              
==========================================
+ Hits        69366    69370       +4     
- Misses      85570    85580      +10     
  Partials      236      236              
Flag Coverage Δ
backend 41.58% <31.42%> (-0.01%) ⬇️
harness 64.31% <ø> (ø)
web 35.22% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ster/internal/rm/agentrm/agent_resource_manager.go 49.11% <50.00%> (-0.22%) ⬇️
master/pkg/model/task_container_defaults.go 84.86% <54.54%> (-2.72%) ⬇️
...nal/rm/kubernetesrm/kubernetes_resource_manager.go 28.38% <0.00%> (-0.10%) ⬇️

Copy link

netlify bot commented Apr 23, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 6508337
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6628c23e5ae9e00009a7c1fa

@eecsliu eecsliu added the to-cherry-pick Pull requests that need to be cherry-picked into the current release label Apr 23, 2024
Copy link
Contributor

@carolinaecalderon carolinaecalderon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eecsliu eecsliu force-pushed the fix-TCD branch 6 times, most recently from aa7a0ea to 941e313 Compare April 24, 2024 06:45
@eecsliu eecsliu merged commit c18ac83 into main Apr 24, 2024
85 of 98 checks passed
@eecsliu eecsliu deleted the fix-TCD branch April 24, 2024 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed to-cherry-pick Pull requests that need to be cherry-picked into the current release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants