Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: agent config precedence #8656

Merged
merged 3 commits into from
Feb 8, 2024
Merged

Conversation

amandavialva01
Copy link
Contributor

@amandavialva01 amandavialva01 commented Jan 8, 2024

Description

This PR makes determined-agent initialization and start-up more similar to determined-master by incorporating viper into agent configuration setup to ensure the correct priority of agent configuration options (flag > environment > config > default).

It is also worth noting that the the determined-agent flag enable_api was changed to api_enabled.

Test Plan

  • First, cd into local determined repo and spin up a cluster with the devcluster tool. In the cluster logs in your terminal tab, ensure that the agent configuration outputs shows "agent_reconnect_attempts":5 (provided that the agent_reconnect_attempts field is not defined in the agent.config section of your devcluster YAML, the JSON output should show the default value, which is currently 5).
  • Now, add agent_reconnect_attempts: 9 to your devcluster config as follows:
config_file:
   ...
    agent: 
         ...
         agent_reconnect_attempts: 9
  • Spin up your devcluster using this updated config, and verify that the agent configuration outputs shows "agent_reconnect_attempts":9 in your logs.
  • Now, press 2 in your terminal to kill the agent binary so that you are only running DB and MASTER
  • Open up a new terminal tab and run export DET_AGENT_RECONNECT_ATTEMPTS=10
  • Then, cd into determined/agent/build and run ./determined-agent --config-file /private/tmp/devcluster/agent.conf
  • verify that the agent configuration outputs shows "agent_reconnect_attempts":10 (rather than the current default value of 5 or config value of 9)
  • Now, to ensure that flags take precedent over defaults and environment variables, run ./determined-agent --config-file /private/tmp/devcluster/agent.conf --agent-reconnect-attempts 12
  • ensure that the agent configuration outputs shows "agent_reconnect_attempts":12 rather than the default value 5, config value 9, or the environment variable value 10.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

DET-9800

@amandavialva01 amandavialva01 requested a review from a team as a code owner January 8, 2024 17:25
@cla-bot cla-bot bot added the cla-signed label Jan 8, 2024
Copy link

netlify bot commented Jan 8, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit a9e2630
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/65c501587e62b700083d9fe6

@amandavialva01 amandavialva01 force-pushed the amanda/fixAgentConfigPrecedence branch 7 times, most recently from ba2f357 to 05af106 Compare January 10, 2024 22:22
Copy link
Contributor

@maxrussell maxrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Thanks for taking this on. Couple things worth bringing up, but seems very close.

Comment on lines 27 to 38
ConfigFile: "",
Log: logger.Config{
Level: "trace",
Color: true,
},
MasterHost: "",
MasterPort: 0,
AgentID: "",
Label: "",
ResourcePool: "",
ContainerMasterHost: "",
ContainerMasterPort: 0,
SlotType: "auto",
VisibleGPUs: VisibleGPUsFromEnvironment(),
Security: SecurityOptions{
TLS: TLSOptions{
Enabled: false,
SkipVerify: false,
MasterCert: "",
MasterCertName: "",
},
},
Debug: false,
ArtificialSlots: 0,
ImageRoot: "",
TLS: false,
TLSCertFile: "",
TLSKeyFile: "",
APIEnabled: false,
BindIP: "0.0.0.0",
BindPort: 9090,
HTTPProxy: "",
HTTPSProxy: "",
FTPProxy: "",
NoProxy: "",
AgentReconnectAttempts: aproto.AgentReconnectAttempts,
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second),
ContainerRuntime: DockerContainerRuntime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to set fields to the "zero value" of their type, like "", false, or 0. Or is this an exhaustruct linting thing?

Suggested change
ConfigFile: "",
Log: logger.Config{
Level: "trace",
Color: true,
},
MasterHost: "",
MasterPort: 0,
AgentID: "",
Label: "",
ResourcePool: "",
ContainerMasterHost: "",
ContainerMasterPort: 0,
SlotType: "auto",
VisibleGPUs: VisibleGPUsFromEnvironment(),
Security: SecurityOptions{
TLS: TLSOptions{
Enabled: false,
SkipVerify: false,
MasterCert: "",
MasterCertName: "",
},
},
Debug: false,
ArtificialSlots: 0,
ImageRoot: "",
TLS: false,
TLSCertFile: "",
TLSKeyFile: "",
APIEnabled: false,
BindIP: "0.0.0.0",
BindPort: 9090,
HTTPProxy: "",
HTTPSProxy: "",
FTPProxy: "",
NoProxy: "",
AgentReconnectAttempts: aproto.AgentReconnectAttempts,
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second),
ContainerRuntime: DockerContainerRuntime,
Log: logger.Config{
Level: "trace",
Color: true,
},
SlotType: "auto",
VisibleGPUs: VisibleGPUsFromEnvironment(),
BindIP: "0.0.0.0",
BindPort: 9090,
AgentReconnectAttempts: aproto.AgentReconnectAttempts,
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second),
ContainerRuntime: DockerContainerRuntime,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly just hardcoded the default values for clarity, but changed this since it's a lot of redundant code.

Comment on lines 182 to 183
// ResolveHostname resolves the name (or ID) of the agent.
func (o *Options) ResolveHostname() (*Options, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally have a preference for immutability as a default, so when a function/method is going to mutate things, I like to do my best to call that out in the function's name and doc comment.

What do you think about this for this function?

// SetAgentID resolves the name of name (or id) of the agent and saves it to o.
func (o *Options) SetAgentID() error {

then calling code can change to

err = opt.SetAgentID()
if err != nil {

rather than

opt, err = opt.ResolveHostname()
if err != nil {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, changed!

Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: 41 lines in your changes are missing coverage. Please review.

Comparison is base (c5afb6c) 47.72% compared to head (a9e2630) 47.78%.
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8656      +/-   ##
==========================================
+ Coverage   47.72%   47.78%   +0.05%     
==========================================
  Files        1049     1050       +1     
  Lines      167292   167353      +61     
  Branches     2243     2241       -2     
==========================================
+ Hits        79841    79964     +123     
+ Misses      87293    87231      -62     
  Partials      158      158              
Flag Coverage Δ
backend 43.57% <73.02%> (+0.27%) ⬆️
harness 64.32% <ø> (ø)
web 42.52% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
agent/cmd/determined-agent/init.go 100.00% <100.00%> (ø)
agent/cmd/determined-agent/root.go 100.00% <100.00%> (+100.00%) ⬆️
agent/cmd/determined-agent/main.go 0.00% <0.00%> (ø)
agent/internal/options/options.go 29.09% <66.66%> (ø)
agent/cmd/determined-agent/run.go 29.09% <24.39%> (+25.20%) ⬆️

... and 7 files with indirect coverage changes

@amandavialva01 amandavialva01 enabled auto-merge (squash) February 8, 2024 16:29
@amandavialva01 amandavialva01 merged commit 4612c41 into main Feb 8, 2024
68 of 84 checks passed
@amandavialva01 amandavialva01 deleted the amanda/fixAgentConfigPrecedence branch February 8, 2024 16:43
maxrussell pushed a commit that referenced this pull request Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants