CLUO agent should only run on desired nodes #76
Comments
Adding a node label to nodes and having the daemonset match that would be the way I prefer to go. Tainting nodes would affect other pods in ways we don't intend. This probably needs to be implemented by changing the daemonset the operator creates for the moment; however, this is a temporary approach, as we intend to decouple the operator and daemonset creation. Later, users should just deploy the agent to run on the desired nodes through some means; they won't need to contain logic for this. See #98 or ping directly about those Tectonic discussions. |
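For readers less familiar with node selectors, here is a minimal sketch of the matching rule the comment above relies on: a pod with a node selector is only eligible for nodes whose labels contain every key/value pair in that selector. The label key `example.com/run-update-agent` and the node names are hypothetical, chosen only for illustration, and this models the rule rather than reproducing any Kubernetes or CLUO code.

```python
def selector_matches(node_labels, node_selector):
    """Return True when every key/value pair in the selector is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical selector that an agent daemonset might carry.
AGENT_SELECTOR = {"example.com/run-update-agent": "true"}

# Hypothetical nodes: only the first one has been labeled by an admin or deployment tool.
NODES = {
    "cl-node": {"example.com/run-update-agent": "true", "kubernetes.io/hostname": "cl-node"},
    "other-node": {"kubernetes.io/hostname": "other-node"},
}

# Nodes on which the agent pod would be eligible to run.
eligible = sorted(name for name, labels in NODES.items() if selector_matches(labels, AGENT_SELECTOR))
print(eligible)
```

Unlike a taint, which repels every pod that lacks a matching toleration, the label only affects pods that explicitly select on it, which is why it leaves unrelated workloads alone.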
Discussing this more with @aaronlevy, this is harder than initially stated. Goal:
It is acceptable for deployment tools (like Tectonic) to label nodes by default (their prerogative).
Initial Proposal (new users):
Users should simply add a node selector on the
Problems for Tectonic and those who used manage-agent=true
Tectonic 1.6 used
Using Existing Labels:
Proposal Addendum
Support blacklisting for backward compatibility:
|
To summarize some points and questions from a meeting on this:
Adding labels to existing clusters
There actually is a somewhat sane migration path if we let the operator infer the operating system version based on the
At a minimum, it was proposed that the next version of the operator add a label, based on this information, to all Container Linux nodes. It was proposed that the operator, if it manages the agent (
What labels do we add/select on?
This is where things are a little more murky; the exact form these labels should take is unclear. Beware, here be bikesheds. The options discussed are:
Proposal
It's difficult to provide the full set of options for the complete picture of what we do, so I'm going to write what I propose and ask that @aaronlevy and @dghubble mention their preferences too. Note that my preference has moved to basically be: "Let's do the bare minimum to allow 1) CLUO to not run on other distros and 2) a fairly sane hatch to edit the agent's daemonset selector". We can always figure out more complicated things, such as knobs the user can actually turn in a supported way, in the future. I propose we do the following:
In the future, the kubelet should apply a generically named os-ID node label. Migrating the daemonset between these labels in the future is trivial (even if we do 3, it's not hard).
Problems not addressed:
|
Nice notes @euank. My aim is similar - that we find the bare minimum necessary to allow those using
To reiterate, new users in open-source can just set
From the options, I prefer 1 or 2 since they're more minimal. I don't like 3 because it introduces a new "class" kinda design we'd write code for, which would then only be used in the deprecated code flow. I wanna focus on getting users from
I mostly like the proposal. It's valuable to add the OS id as a label (or later kubelet adds it officially) independent of this change, so I agree. I would prefer the operator, if
I prefer the application-specific label because it's scoped and versioned to CLUO, independent of node identity, it provides a migration path to the agent using node selectors, and it's less code upkeep. Open source users could, if they choose, use a similar label scheme, as it's simple to explain to fellow cluster admins. Finally, it still leaves the possibility of adding a
I still see no way we could ever delete the |
The agent already adds an application-specific id based on the host's
Is there additional information we require from |
@dghubble It's a chicken-egg problem if we rely on the agent to self-label nodes it can run on (before it can run on them). The suggestion is that the operator, or ideally in the long term the kubelet, do it in order to dodge that problem. |
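A rough sketch of the operator-side labeling idea, to show why it avoids the chicken-and-egg problem: the component doing the labeling does not itself need to be scheduled by that label. It assumes each node reports an OS image string (Kubernetes exposes one as `status.nodeInfo.osImage`), that Container Linux nodes report a string starting with "Container Linux", and it uses a hypothetical label key `example.com/os-id`. None of these are taken from CLUO's code; the node data is made up.

```python
OS_LABEL_KEY = "example.com/os-id"  # hypothetical label key, not the one CLUO uses
CONTAINER_LINUX_PREFIX = "Container Linux"  # assumed prefix of the reported OS image string


def infer_os_label(os_image):
    """Return the label to add for a node, or an empty dict for other distros."""
    if os_image.startswith(CONTAINER_LINUX_PREFIX):
        return {OS_LABEL_KEY: "coreos"}
    return {}


def label_nodes(nodes):
    """Return each node's labels with the inferred OS label merged in.

    `nodes` maps a node name to {"os_image": str, "labels": dict}.
    The input is not modified.
    """
    return {
        name: {**info["labels"], **infer_os_label(info["os_image"])}
        for name, info in nodes.items()
    }


# Made-up cluster state as an operator might observe it.
cluster = {
    "node-a": {"os_image": "Container Linux by CoreOS 1409.7.0 (Ladybug)", "labels": {}},
    "node-b": {"os_image": "Ubuntu 16.04.3 LTS", "labels": {"role": "worker"}},
}
labeled = label_nodes(cluster)
print(labeled)
```

An agent daemonset selecting on the hypothetical key would then land only on `node-a`, without the agent ever having to run first in order to label the node.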
Yeah, you're right, we shouldn't rely on those labels added by the agent. Even though this code path is for people still using |
With respect to code changes in CLUO, this issue has been addressed. Direct users of CLUO should switch from
Tectonic clusters will get this behavior in September. Container Linux nodes will automatically be labeled with
cc @robszumski |
The CLUO agent should not run on anything other than Container Linux nodes. There are a few ways to implement this:
Note that we set
container-linux-update.v1.coreos.com=coreos
on CL nodes, but I assume this comes from the CLUO, so it can't schedule based on that.
Once we figure out a strategy, let's file an issue on the Tectonic installer.