Support disabling initramfs networking via Ignition #979

Closed
stbenjam opened this issue May 14, 2020 · 65 comments

Comments

@stbenjam

stbenjam commented May 14, 2020

Feature Request

Support running part of ignition before networking comes up

Environment

RHCOS

What hardware/cloud provider/hypervisor is being used to run Ignition?

Baremetal

Desired Feature

For the baremetal IPI platform in OpenShift, we load Ignition from disk. We would like to include network configuration that is applied before NetworkManager starts. For example, we want to disable DHCP on some interfaces. If we can't do that, then on some hosts that could have upwards of 10 (!) interfaces, booting takes an extraordinarily long time.
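To make "disable DHCP on some interfaces" concrete: with NetworkManager, one way is a keyfile per NIC whose IPv4 and IPv6 methods are switched off, so NM claims the device without attempting DHCP on it. The sketch below renders such a keyfile with Python's `configparser`. The interface names, the `<iface>-no-dhcp` connection id, and the choice of `ignore` for IPv6 are illustrative assumptions; where such a file would have to live for the initramfs (as opposed to the real root) to pick it up is exactly the open question in this issue.

```python
import configparser
import io


def no_dhcp_keyfile(interface: str) -> str:
    """Render a NetworkManager keyfile that binds a connection to
    `interface` without running DHCP (IPv4) or autoconfiguration (IPv6)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["connection"] = {
        "id": f"{interface}-no-dhcp",  # hypothetical naming scheme
        "type": "ethernet",
        "interface-name": interface,
    }
    cfg["ipv4"] = {"method": "disabled"}  # no DHCPv4, no static address
    cfg["ipv6"] = {"method": "ignore"}    # leave IPv6 unconfigured
    buf = io.StringIO()
    # Match the usual key=value keyfile style (no spaces around '=')
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder NIC names for a host with many interfaces
    for nic in ["eno2", "eno3"]:
        print(f"# {nic}.nmconnection")
        print(no_dhcp_keyfile(nic))
```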

Also, when an ignition contains an 'append' with a network source, it's conceivable we need to do some pre-configuration to make that network endpoint accessible (for example, we need to configure a VLAN on an interface).

Other Information

It's almost like we need a pre-ignition tool (preflight?) analogous to what afterburn is doing post-ignition.

@stbenjam
Author

@cgwalters What do you think of something like this?

@hardys

hardys commented May 14, 2020

To link this to a real (OpenShift) use-case, see https://bugzilla.redhat.com/show_bug.cgi?id=1824331 - in some baremetal environments the external network can be on a tagged vlan, so you have to do some configuration via ignition before the append (in v2.x) URL can be evaluated. I guess a similar issue exists with the merge feature in v3.

@cgwalters
Member

A lot of overlap here with openshift/enhancements#291

@cgwalters
Member

@stbenjam
Author

A lot of overlap here with openshift/enhancements#291

I did read through that, but it's very tied to solving the problem by embedding an Ignition config in an ISO, rather than letting Ignition do some things before and after NetworkManager is up.

@stbenjam
Author

See also coreos/ignition-dracut#94

Yeah, that's a similar problem, but we are using https://coreos.com/os/docs/latest/config-drive.html instead, so I'm not sure this issue helps us.

@cgwalters
Member

What I'd argued in one of the epic number of tickets around this is that I think all we need to do for all the major clouds that use the link-local address is bring up networking enough to do that, then we could pull network configuration out of the Ignition itself - then we can do static networking in clouds too (not that most people want this).

That said, if we fixed openshift/machine-config-operator#1690 then you guys could "flatten" the config into /boot/config.ign and then you wouldn't need networking in the initramfs at all, right? I go back and forth on whether or not this is a good idea; it takes away some control from the MCO.

rather than letting ignition do some things before and after NetworkManager is up.

But we'd need to flesh out more what that would look like. Something like:

"initramfs-network": {
    "dhcp": false,
    "args": "ip=192.xxx...."
}

Or just support writing arbitrary things into /etc/NetworkManager before it starts, basically a filesystem bit that applies to the initramfs?

@stbenjam
Author

stbenjam commented May 14, 2020

That said, if we fixed openshift/machine-config-operator#1690 then you guys could "flatten" the config into /boot/config.ign and then you wouldn't need networking in the initramfs at all, right? I go back and forth on whether or not this is a good idea; it takes away some control from the MCO.

That addresses the immediate problem for us, but not the general use. It makes complete sense to use network data sources for append/merge in ignition, but there's always going to be a circular dependency on environments that have networks that need configuration.

rather than letting ignition do some things before and after NetworkManager is up.
But we'd need to flesh out more what that would look like. Something like:
"initramfs-network": {
    "dhcp": false,
    "args": "ip=192.xxx...."
}
Or just support writing arbitrary things into /etc/NetworkManager before it starts, basically a filesystem bit that applies to the initramfs?

Maybe? I don't understand entirely how the initramfs dracut/ignition stuff interact -- is the /etc/NetworkManager at that stage in the ramdisk or is that the host's eventual configuration?

@cgwalters
Member

We may need a meeting on this, but I really want to walk through how the Live ISO changes the game for you guys. Hmm....I think we maybe do need to support some flag/mechanism for the Live ISO to disable networking in the initramfs even if Ignition is injected.

Maybe? I don't understand entirely how the initramfs dracut/ignition stuff interact -- is the /etc/NetworkManager at that stage in the ramdisk or is that the host's eventual configuration?

We use NM in the initrd too (RHCOS will after 8.2). There's no required linkage between the initramfs configuration and the real root; most people will want them to be the same (see coreos/ignition-dracut#89 ) but like that PR we can add an option to avoid propagating them to the real root.

@jlebon
Member

jlebon commented May 15, 2020

Related: #956 (which would also be smart enough to handle live ISO with embedded Ignition configs). It's also an on-ramp for supporting fetching Ignition configs over link-local addresses.

@jlebon
Member

jlebon commented May 15, 2020

I'm very hesitant to allow Ignition to configure the initramfs itself. I'd prefer we stay away from that for as long as we can because it essentially introduces the same problem for the initrd which Ignition was meant to solve for the real root.

It makes complete sense to use network data sources for append/merge in ignition, but there's always going to be a circular dependency on environments that have networks that need configuration.

We've been working hard on improving the situation there. :) In addition to the links above, see also coreos/fedora-coreos-tracker#460. I agree with Colin that we should push further on the live ISO embedded config path for bare metal, and investigate the link-local approach for clouds. (I'm also not sure how Ignition can correctly implement the RFE here if e.g. you configure networking in the base config, but then one of the merge configs actually wants to override what the config file should have been.)

@stbenjam
Author

I'm very hesitant to allow Ignition to configure the initramfs itself. I'd prefer we stay away from that for as long as we can because it essentially introduces the same problem for the initrd which Ignition was meant to solve for the real root.

Would it be kernel command line arguments then for this configuration? I wasn't sure if that worked in rhcos, since we tried it when we were trying to get IPv6 working. Can you do VLAN configuration with those arguments?
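For reference, dracut documents kernel-argument syntax for both pieces in dracut.cmdline(7): `vlan=<vlanname>:<phydevice>` and the full static form `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:none`. A minimal sketch that builds the pair for a tagged VLAN follows. It assumes the `<phy>.<vlan_id>` naming convention, and the addresses and hostname are placeholders; whether a given RHCOS initramfs (dracut network module or NM's initrd generator) honours these exactly as written is an assumption to verify.

```python
from typing import List


def static_vlan_kargs(phy: str, vlan_id: int, addr: str, gateway: str,
                      netmask: str, hostname: str = "") -> List[str]:
    """Build dracut-style kernel arguments for a static IPv4 address on
    a tagged VLAN named <phy>.<vlan_id> (naming convention assumed)."""
    vlan_if = f"{phy}.{vlan_id}"
    return [
        # vlan=<vlanname>:<phydevice>
        f"vlan={vlan_if}:{phy}",
        # ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:none
        # (peer left empty; "none" means static addressing, no DHCP)
        f"ip={addr}::{gateway}:{netmask}:{hostname}:{vlan_if}:none",
    ]


if __name__ == "__main__":
    # Documentation-range addresses; substitute real values.
    args = static_vlan_kargs("eno1", 100, "192.0.2.10", "192.0.2.1",
                             "255.255.255.0", "worker-0")
    print(" ".join(args))
    # vlan=eno1.100:eno1 ip=192.0.2.10::192.0.2.1:255.255.255.0:worker-0:eno1.100:none
```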

It makes complete sense to use network data sources for append/merge in ignition, but there's always going to be a circular dependency on environments that have networks that need configuration.
We've been working hard on improving the situation there. :) In addition to the links above, see also coreos/fedora-coreos-tracker#460. I agree with Colin that we should push further on the live ISO embedded config path for bare metal, and investigate the link-local approach for clouds.

The ISO approach for bare metal isn't going to work in the IPI case, and that seems to be ignored in the present designs. We do exist and have customers. We have some initial discussions about how we can use the live ISO for IPI but it's going to be later than we need this problem solved.

I also don't see how the live ISO case fits into OpenShift, because the installer generates a stub ignition with network dependencies and you're going to run into customers with these complex network requirements. I don't see where that use case is being addressed.

(I'm also not sure how Ignition can correctly implement the RFE here if e.g. you configure networking in the base config, but then one of the merge configs actually wants to override what the config file should have been.)

You could have something like afterburn to be able to do preflight configuration before ignition runs.

OpenStack config drives have network metadata for similar reasons: https://docs.openstack.org/nova/latest/user/metadata.html.

@cgwalters
Member

I'm very hesitant to allow Ignition to configure the initramfs itself. I'd prefer we stay away from that for as long as we can because it essentially introduces the same problem for the initrd which Ignition was meant to solve for the real root.

If it's just constrained to networking though, that seems not too bad.

And actually perhaps to start, we could offer exactly one thing - the ability for Ignition to disable initramfs networking. That would make coreos-installer iso embed fully programmable - the system boots live, the live Ignition can bring up the network however it wants (or not), and run coreos-installer however it wants, providing that Ignition config however it wants, etc.

In particular once we have osmet, I suspect a lot of cases could get away with not even bringing up the network at all in the live OS.

@cgwalters cgwalters changed the title Support running part of ignition before networking comes up Support disabling initramfs networking via Ignition May 21, 2020
@cgwalters
Member

Yeah I am increasingly certain that we missed this important case of "Ignition provided, but does not require networking in the initramfs" in all of the recent work following coreos/fedora-coreos-tracker#443

I took a look at the recent libvirt testing scripts and it looks to me like they use the default bridged network, which still has DHCP enabled. So what happens here is when we're doing a static config in those tests...we still used (and required!) DHCP in the initramfs.

That's not going to work for environments without DHCP at all, or even just ones where we want DHCP on some NICs but not all of them, etc.

openshift/enhancements#291 is not complete IMO until we fix this.

Oh right actually, this is exactly what this issue is: coreos/coreos-installer#164

@cgwalters
Member

cgwalters commented May 21, 2020

OK so yep, going back we have two "not so complex" options:

  1. coreos-installer iso embed --no-initramfs-networking (coreos/coreos-installer#164 (comment); only applies to the live ISO)
  2. Enhance Ignition to have a way for the config to tell us not to use the network at all in the initramfs (this issue)

Or, the much bigger options of:

  1. Enhance Ignition to have a way to configure initramfs networking
  2. Enhance Ignition to do conditional networking

Another approach is:

  1. Don't enable networking in the initramfs in the Live ISO at all (but that seems very confusing and breaks the symmetry)

@dustymabe
Member

Yeah I am increasingly certain that we missed this important case of "Ignition provided, but does not require networking in the initramfs" in all of the recent work following coreos/fedora-coreos-tracker#443

A follow-on that we still want to do is coreos/fedora-coreos-tracker#460

I took a look at the recent libvirt testing scripts and it looks to me like they use the default bridged network, which still has DHCP enabled. So what happens here is when we're doing a static config in those tests...we still used (and required!) DHCP in the initramfs.

Can you elaborate on what you mean? That test script tests many different configurations including dhcp and static networking in the initramfs combined with static networking in the real root. I do use the default libvirt bridged network, but that's only to cover the dhcp test cases and the machine won't attempt DHCP if you tell it not to.

That's not going to work for environments without DHCP at all, or even just ones where we want DHCP on some NICs but not all of them, etc.

The EPIC work that I've done specifically focuses on static networking via the installer in the Live ISO. In that case you can easily configure static networking when doing the install.

Follow on work to this is to not bring up networking in the initramfs if not needed at all: coreos/fedora-coreos-tracker#460, but that was not part of the EPIC.

openshift/enhancements#291 is not complete IMO until we fix this.

Oh right actually, this is exactly what this issue is: coreos/coreos-installer#164

I think the better approach is coreos/fedora-coreos-tracker#460. For now (RHCOS in 4.6) you can achieve 'no networking in the initramfs' like this:

  • provide --firstboot-args='' with kargs that disable networking in the initramfs and provide networking config in the real root.

but I guess the real question is why do you need "no networking in the initramfs". I think the real desire is "I only want the networking I want in the initramfs". In that case just configure the networking you want via the installer right?

@jlebon
Member

jlebon commented May 21, 2020

Heh, was typing up a comment but @dustymabe beat me to it. :)

To summarize: In the general case, I think this is resolved by coreos/fedora-coreos-tracker#460 (which provides a framework for apps to tell the OS that networking is needed) and #956 + coreos/ignition-dracut#164 (which makes use of that framework). In the live ISO case/metal case of "complex networking on first boot", one can configure networking as required at install time and use --copy-network.

@dustymabe
Member

dustymabe commented May 21, 2020

@jlebon yes. Maybe we should all get together and write up a proper EPIC so the requirements are collected and we can make sure coreos/fedora-coreos-tracker#460 addresses all needs.

@cgwalters
Member

provide --firstboot-args='' with kargs that disable networking in the initramfs and provide networking config in the real root.

The user story here is "I have a CoreOS ISO and coreos-installer iso embed" and I want to automate installs. --firstboot-args is something that runs in the booted OS. I want to be able to e.g. run a script before or after coreos-installer without having ever done DHCP at all.

@cgwalters
Member

To summarize: In the general case, I think this is resolved by coreos/fedora-coreos-tracker#460 (which provides a framework for apps to tell the OS that networking is needed) and #956 + coreos/ignition-dracut#164 (which makes use of that framework).

Right, I added that as a second option in the "more complex" options above.

@dustymabe
Member

dustymabe commented May 21, 2020

The user story here is "I have a CoreOS ISO and coreos-installer iso embed" and I want to automate installs. --firstboot-args is something that runs in the booted OS. I want to be able to e.g. run a script before or after coreos-installer without having ever done DHCP at all.

Yep. We did have coreos/fedora-coreos-config#326 to not require networking for the ISO but we explicitly do bring up networking in the case of an embedded ignition config (as requested during feature discussions) and pointing to future work to take care of that case: https://github.com/coreos/fedora-coreos-config/blob/5615f58487f9fd9f1ebc19e0e6416b53a3e6270f/overlay.d/05core/usr/lib/dracut/modules.d/20live/coreos-liveiso-network-kargs.service#L5-L15

We could easily remove the ConditionPathExists=|/config.ign
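For readers unfamiliar with the `|` prefix: systemd.unit(5) calls `ConditionPathExists=|...` a triggering condition. Plain conditions must all hold; if any triggering conditions exist, at least one of them must hold; a `!` placed after the `|` negates the check. The sketch below models only that evaluation rule for `ConditionPathExists=`, with the existing paths passed in as a set; it does not reproduce the other conditions in the linked unit file.

```python
def conditions_pass(conditions, existing_paths):
    """Evaluate ConditionPathExists= values per systemd.unit(5):
    regular conditions are ANDed; '|' (triggering) conditions are ORed,
    and if any exist at least one must hold. '!' (after any '|') negates."""
    regular, triggering = [], []
    for value in conditions:
        is_trigger = value.startswith("|")
        if is_trigger:
            value = value[1:]
        negate = value.startswith("!")
        if negate:
            value = value[1:]
        holds = (value in existing_paths) != negate
        (triggering if is_trigger else regular).append(holds)
    if not all(regular):
        return False
    return any(triggering) if triggering else True


if __name__ == "__main__":
    # Live ISO with an embedded config: the trigger fires.
    print(conditions_pass(["|/config.ign"], {"/config.ign"}))  # True
    # No embedded config and no other trigger: the unit is skipped.
    print(conditions_pass(["|/config.ign"], set()))            # False
```

Dropping `ConditionPathExists=|/config.ign` removes one trigger; whether the unit then runs depends on whichever triggers remain.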

@cgwalters
Member

and pointing to future work to take care of that case:

"but the user can override with rd.neednet=0 now if needed."

No, there's no ergonomic way to override the live ISO kernel arguments in an automated fashion.

We could easily remove the ConditionPathExists=|/config.ign

That's an option I guess...I added it to the list above.

@jlebon
Member

jlebon commented May 21, 2020

I want to be able to e.g. run a script before or after coreos-installer without having ever done DHCP at all.

Right yeah, this is also in scope for coreos/fedora-coreos-tracker#460. Right now we took the shortcut that embedded config = requires networking. But the end goal is absolutely for Ignition to only ask for networking if required (via the exact same mechanism that is used in clouds/freshly installed metal).

One gap then is if one needs complex networking on first boot of a non-metal platform, and that's what Afterburn is trying to address (by providing a channel for configuring networking, see e.g. coreos/afterburn#404).

Of course, another major gap for much of this is to have osmet supported on RHCOS so that install-time can truly be fully offline. We should chat about our approach on that! (And I see now you recently opened coreos/coreos-assembler#1467 related to this).

@stbenjam
Author

stbenjam commented May 21, 2020

This discussion seems to be going off the rails a bit from our original feature request. I'm not sure why the live CD is part of this particular discussion, we're not using it -- maybe we should have a live discussion about what ignition is missing for us? We need to have something in the 4.6 time frame.

The OpenShift installer creates an Ignition config that needs networking. Any Ignition config can include merges/appends that reference network URIs -- what is the answer when a user requires some configuration to make networking work to access those resources?

It's impossible today in the baremetal IPI platform to support use cases where the network used to access the MCS requires a VLAN, for example.
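The circular dependency can be made concrete by walking an Ignition v3 config for resources that would have to be fetched over the network. The sketch below checks the `ignition.config.merge` (the v3 counterpart of v2 `append`), `ignition.config.replace`, `storage.files[].contents` and `storage.files[].append` sources, and treats anything other than an inline `data:` URL as remote. It is not exhaustive (e.g. `ignition.security.tls.certificateAuthorities` is skipped), and the MCS URL in the example is a placeholder.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only inline data: URLs resolve offline.
LOCAL_SCHEMES = {"data"}


def remote_sources(config: dict) -> list:
    """Return resource URLs in an Ignition v3 config that would need
    networking to fetch (merge/replace configs, file contents/appends)."""
    sources = []
    ign_cfg = (config.get("ignition") or {}).get("config") or {}
    for res in ign_cfg.get("merge") or []:
        sources.append(res.get("source"))
    sources.append((ign_cfg.get("replace") or {}).get("source"))
    for f in (config.get("storage") or {}).get("files") or []:
        sources.append((f.get("contents") or {}).get("source"))
        for res in f.get("append") or []:
            sources.append(res.get("source"))
    # Drop empty/null sources and anything resolvable without a network
    return [s for s in sources if s and urlparse(s).scheme not in LOCAL_SCHEMES]


if __name__ == "__main__":
    # A pointer-style config (the URL is a placeholder, not a real MCS endpoint)
    pointer = {"ignition": {"version": "3.1.0", "config": {
        "merge": [{"source": "https://mcs.example.com/config/worker"}]}}}
    print(remote_sources(pointer))  # ['https://mcs.example.com/config/worker']
```

A non-empty result is the case this issue is about: networking has to be up, and possibly pre-configured (e.g. a VLAN), before Ignition can finish.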

@dustymabe
Member

From #979 (comment)

Another approach is:

Don't enable networking in the initramfs in the Live ISO at all (but that seems very confusing and breaks the symmetry)

Networking would still be enabled in the other conditions. Just not for the embedded ignition config case.

@jlebon
Member

jlebon commented May 21, 2020

maybe we should have a live discussion about what ignition is missing for us?

👍 Yeah, that'd be great! :)

The reason we're talking about the live ISO is because it will break the dependency loop between Ignition and networking by being able to provide networking configuration upfront at install time.

@ashcrow
Member

ashcrow commented May 21, 2020

maybe we should have a live discussion about what ignition is missing for us?

+1 Yeah, that'd be great! :)

Agreed. The result of this spike should find its way into a card for execution as well once we've figured out what should be done.

@cgwalters
Member

This discussion seems to be going off the rails a bit from our original feature request. I'm not sure why the live CD is part of this particular discussion, we're not using it -- maybe we should have a live discussion about what ignition is missing for us?

Yes, I understand. If we want to handle your case of using the -openstack image (right?) then we can't use option 1.

The OpenShift installer creates an Ignition config that needs networking. Any Ignition config can include merges/appends that reference network URIs -- what is the answer when a user requires some configuration to make networking work to access those resources?

OK I thought about this more and you're right, we either need to:

  1. Make it easy to get a "flattened" real config (openshift/machine-config-operator#1690: "Rendering ignition config when access to MCS is not available")
  2. Support configuring the initramfs network via Ignition

Of these...I lean more towards the first, but it's hard. The reasons (AFAIK) that openshift-install always generates a pointer configuration are twofold:

  • In e.g. AWS, there are size limitations on user data, but I suspect you don't have that problem
  • The actual Ignition lives inside the MCO codebase and the installer binary just doesn't have it "offline". Solving that would be...hard and messy; the MCO (most notably the MCS) is just designed to run as part of a cluster. Really we need something like a bootstrap node to extract that configuration.

Anyways, once we have a flattened config, then as long as we have a way to just disable the initramfs networking requirement, we only have to concern ourselves with networking in the real root.

jlebon added a commit to jlebon/ignition-dracut that referenced this issue Jun 17, 2020
Make use of the new `fetch-offline` stage:

coreos/ignition#979

We run this between the `setup` and `fetch` stages (the latter possibly
being skipped if networking is not required).

We hit the same issue here that `coreos-copy-firstboot-network.service`
hit, which is that we can't run before the `cmdline` hook because that
runs *before* udev, but we want the `by-*` symlinks for
`ignition-setup-user.service`.

The hack we do here is to rerun the NM cmdline hook in case ignition
dropped a snippet in `/etc/cmdline.d`. As mentioned in
coreos/fedora-coreos-config#346, we'll be able
to do this more cleanly once we run NM as a systemd service directly.
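A rough model of the control flow the commit describes: `fetch-offline` runs first, and the networked `fetch` stage (plus bringing up initramfs networking) only happens if the offline attempt hits a resource that needs the network. `NeedNetError`, `fetch_offline` and `run_first_boot` are hypothetical names for illustration, not Ignition's actual code, and the `setup` stage is omitted.

```python
class NeedNetError(Exception):
    """Hypothetical signal: the offline fetch hit a resource that
    needs the network."""


def fetch_offline(sources):
    # Sketch: only inline data: URLs can be resolved without networking.
    for src in sources:
        if not src.startswith("data:"):
            raise NeedNetError(src)


def run_first_boot(sources, bring_up_network):
    """Return the stages that ran. The networked fetch stage (and the
    network bring-up before it) only runs when fetch-offline asks for it."""
    stages = ["fetch-offline"]
    try:
        fetch_offline(sources)
    except NeedNetError:
        bring_up_network()
        stages.append("fetch")
    return stages


if __name__ == "__main__":
    print(run_first_boot(["data:,hello"], lambda: print("network up")))
    # ['fetch-offline']
    print(run_first_boot(["https://example.com/c.ign"], lambda: print("network up")))
    # network up
    # ['fetch-offline', 'fetch']
```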
jlebon added a commit to coreos/ignition-dracut that referenced this issue Jun 18, 2020
@hardys

hardys commented Jun 19, 2020

See coreos/coreos-installer#212 and coreos/fedora-coreos-config#346. IOW, at install-time, you pass --copy-network to coreos-installer install and that will inject the NM keyfiles into /boot. Those keyfiles are then copied into the initramfs at first boot time before Ignition needs to fetch anything.

Thanks @jlebon - this is the detail I was missing :)

I wonder if we could consider making the 15copy-installer-network script copy from a config-drive partition if it exists? One of the problems is that when the ironic deploy ramdisk is used (as is currently the case for IPI baremetal), it can write a config-drive partition, but there's no support for modifying the OS disk image itself (so we can't drop files in /boot).

Long term we can also look at how we might consume the rhcos initrd with coreos-installer, but there are some feature gaps and interface differences which will need further investigation if we want that to work with the ironic services we currently use.
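The `--copy-network` flow quoted above amounts to copying NetworkManager keyfiles from one directory into another before Ignition needs the network; the config-drive idea would mainly change where the files come from. A minimal sketch, assuming keyfiles use the `.nmconnection` suffix. On a live system the source would typically be `/etc/NetworkManager/system-connections`; the destination under `/boot` is whatever the initramfs looks for, which this thread does not spell out, so it is left as a parameter.

```python
import os
import shutil
from pathlib import Path


def copy_keyfiles(src_dir: Path, dest_dir: Path) -> list:
    """Copy NetworkManager keyfiles (*.nmconnection) from src_dir to
    dest_dir and return the copied file names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for keyfile in sorted(src_dir.glob("*.nmconnection")):
        target = dest_dir / keyfile.name
        shutil.copyfile(keyfile, target)
        # Keyfiles can hold secrets (e.g. 802.1X/Wi-Fi), so keep them private
        os.chmod(target, 0o600)
        copied.append(keyfile.name)
    return copied
```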

@jlebon
Member

jlebon commented Jun 19, 2020

I wonder if we could consider making the 15copy-installer-network script copy from a config-drive partition if it exists?

Hmm... maybe. (Though the fact that it's optional makes it similar to #928.) Let's see if there's a better way first.

One of the problems is that when the ironic deploy ramdisk is used (as is currently the case for IPI baremetal), it can write a config-drive partition, but there's no support for modifying the OS disk image itself (so we can't drop files in /boot).

Do you have a link to documentation/code re. what the deployer is capable of? I presume you're not able to run arbitrary code?

Long term we can also look at how we might consume the rhcos initrd with coreos-installer, but there are some feature gaps and interface differences which will need further investigation if we want that to work with the ironic services we currently use.

Yes, agreed! 👍

@dustymabe
Member

Long term we can also look at how we might consume the rhcos initrd with coreos-installer, but there are some feature gaps and interface differences which will need further investigation if we want that to work with the ironic services we currently use.

Long term I think there are a few options with varying levels of difficulty so we might progressively go through them:

  1. teach the deployer how to mount /boot from the disk that was recently imaged and modify contents
    • This is the easiest to implement, but not ideal as the deployer is now responsible for knowing internal details of how we are moving things around in RHCOS/FCOS land.
  2. include the coreos-installer binary in the deployer and have the deployer call coreos-installer with specified options to image the disk
  3. have the deployer actually be the RHCOS/FCOS live PXE/ISO environment in the future.

@hardys

hardys commented Jun 22, 2020

One of the problems is that when the ironic deploy ramdisk is used (as is currently the case for IPI baremetal), it can write a config-drive partition, but there's no support for modifying the OS disk image itself (so we can't drop files in /boot).

Do you have a link to documentation/code re. what the deployer is capable of? I presume you're not able to run arbitrary code?

There is an overview here https://docs.openstack.org/ironic/latest/user/index.html#understanding-bare-metal-deployment - the agent itself is https://github.com/openstack/ironic-python-agent which is contained in a ramdisk we use to write the OCP disk image (similar to the OCP ramdisk with coreos-install).

Basically there are no interfaces for running arbitrary code, or injecting arbitrary files, the baremetal box is treated like a cloud VM - you provide an OS disk image, and optionally some user-data/meta-data (which in the metal3 case happens via a config-drive partition)

OpenStack has support for a network_data.json ref https://docs.openstack.org/nova/latest/user/metadata.html#openstack-format-metadata but I don't think the ignition openstack backend does anything with that atm?
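For context on what that file could supply: an OpenStack-format `network_data.json` lists `links` (physical NICs, VLANs, bonds) and `networks` (addressing on those links), and a VLAN link names its parent via `vlan_link` and carries a `vlan_id`. The sketch below pulls out each VLAN together with its parent's MAC address, since interface names are not in the file and the MAC is the stable key. Field names follow the Nova metadata documentation linked above and should be checked against it; the sample document is made up.

```python
import json


def vlan_links(network_data: dict) -> list:
    """Return (vlan_id, parent MAC address) for every VLAN link in an
    OpenStack-format network_data.json document."""
    links = network_data.get("links") or []
    by_id = {link.get("id"): link for link in links}
    result = []
    for link in links:
        if link.get("type") != "vlan":
            continue
        parent = by_id.get(link.get("vlan_link")) or {}
        result.append((link.get("vlan_id"), parent.get("ethernet_mac_address")))
    return result


if __name__ == "__main__":
    # Made-up sample in the shape of openstack/latest/network_data.json
    sample = json.loads("""
    {"links": [
        {"id": "eth0", "type": "phy", "ethernet_mac_address": "52:54:00:12:34:56"},
        {"id": "vlan0", "type": "vlan", "vlan_link": "eth0", "vlan_id": 100}
     ],
     "networks": []}
    """)
    print(vlan_links(sample))  # [(100, '52:54:00:12:34:56')]
```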

@hardys

hardys commented Jun 22, 2020

Long term we can also look at how we might consume the rhcos initrd with coreos-installer, but there are some feature gaps and interface differences which will need further investigation if we want that to work with the ironic services we currently use.

Long term I think there are a few options with varying levels of difficulty so we might progressively go through them:

1. teach the deployer how to mount /boot from the disk that was recently imaged and modify contents
   
   * This is the easiest to implement, but not ideal as the deployer is now responsible for knowing internal details of how we are moving things around in RHCOS/FCOS land.

Yeah I suspect that won't be acceptable to upstream Ironic folks as a change to ironic-python-agent, since it's designed to be completely OS agnostic /cc @juliakreger @dtantsur

2. include the `coreos-installer` binary in the deployer and have the deployer call `coreos-installer` with specified options to image the disk

3. have the deployer actually be the RHCOS/FCOS live PXE/ISO environment in the future.
   
   * an ignition config can be provided to check in with services and such (see https://dustymabe.com/2020/04/04/automating-a-custom-install-of-fedora-coreos/)

I suspect this is where we want to get to, the main challenge is right now we depend on some features of ironic-python-agent (in particular introspection), so the RHCOS/FCOS ramdisk isn't a drop-in replacement.

Switching to an RHCOS/FCOS based solution would either require reimplementing some of the IPA features (and interfaces to interact with ironic, but that could be a shim injected like in your blog post I guess), or maybe have a way to run IPA on top of the RHCOS/FCOS image (which doesn't solve the inject-files-to-boot problem but it has been done before ref https://docs.openstack.org/ironic-python-agent/queens/install/index.html#coreos)

@dtantsur

the deployer is now responsible for knowing internal details of how we are moving things around in RHCOS/FCOS land.

Yeah, it would practically require a downstream addition to the ramdisk. I'd like to note that ironic-python-agent is designed to be pretty flexible, so it's not an entirely bad thing either. We're working on a thing called "deploy steps" specifically to allow site-specific customizations.

it has been done before

I think CoreOS was so different at the time that it won't help us (and we did some ugly things like chroot in tarball contents).

@jlebon
Member

jlebon commented Jun 22, 2020

the baremetal box is treated like a cloud VM - you provide an OS disk image, and optionally some user-data/meta-data (which in the metal3 case happens via a config-drive partition)

OK right, that makes a lot of sense now.

I suspect this is where we want to get to, the main challenge is right now we depend on some features of ironic-python-agent (in particular introspection), so the RHCOS/FCOS ramdisk isn't a drop-in replacement.

So to recap, I think we're agreed on where we want to go long-term. And the question now is how to get this working in the short-term with the existing constraints of the deployer. @hardys Do I understand correctly that the "edit image" hack should work for now?

If we can do the flattened Ignition config thing, this could also be solved by coreos/fedora-coreos-config#426.

@hardys

hardys commented Jul 21, 2020

So to recap, I think we're agreed on where we want to go long-term. And the question now is how to get this working in the short-term with the existing constraints of the deployer. @hardys Do I understand correctly that the "edit image" hack should work for now?

If we can do the flattened Ignition config thing, this could also be solved by coreos/fedora-coreos-config#426.

Sorry for the delay responding @jlebon yes I think that's accurate, so to summarize:

@stbenjam @kirankt does that sound reasonable?

@dtantsur

Short term workaround is to edit the RHCOS image (we'll need some docs/kbase downstream for this)

In recent ironic (~ a week ago) you can drop a python plugin into IPA that will be called automatically during deployment to do... pretty much whatever you want. We can have a downstream plugin that does $stuff for us as a short-term workaround.

@hardys

hardys commented Jul 21, 2020

Short term workaround is to edit the RHCOS image (we'll need some docs/kbase downstream for this)

In recent ironic (as of about a week ago) you can drop a Python plugin into IPA that will be called automatically during deployment to do... pretty much whatever you want. We can have a downstream plugin that does $stuff for us as a short-term workaround.

@dtantsur sounds interesting - can you link any docs/examples? Could we modify /boot (sed the grub.cfg and/or drop some file in there) after IPA writes the OS image? What would the interface from ironic look like?

@dtantsur

dtantsur commented Jul 22, 2020

sounds interesting - can you link any docs/examples?

It's a brand new feature, I'm going to work on examples this/next week.

Could we modify /boot (sed the grub.cfg and/or drop some file in there) after IPA writes the OS image?

Yes! But do you really need it to happen afterwards, or do you need to modify /etc/default/grub before grub installation? I think the latter is a bit more robust.

What would the interface from ironic look like?

There are options. If we create an openshift-specific plugin, it may have no interface at all and just silently do what we need. If you could remind me what exactly we need to do (I'm a bit lost in the history of this issue), I can be more specific.

@jlebon
Member

jlebon commented Jul 22, 2020

Could we modify /boot (sed the grub.cfg and/or drop some file in there) after IPA writes the OS image?

Yes! But do you really need it to happen afterwards, or do you need to modify /etc/default/grub before grub installation? I think the latter is a bit more robust.

Note that FCOS and RHCOS don't use grub2-mkconfig but pure BLS (Boot Loader Specification): https://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/. So the canonical kargs live in the BLS configs at $(boot)/loader/entries/, and that's what you'd typically modify after install. But here we're mostly talking about mounting the boot partition in order to drop network configs in there.
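As a rough illustration of editing kargs in BLS entries after install, the sketch below rewrites the `options` line of each `*.conf` file under `<bootmnt>/loader/entries`. It assumes the boot partition is already mounted at a path you pass in as `bootmnt`; `edit_bls_options` and `edit_bls_dir` are hypothetical helper names, and this is not coreos-installer's own implementation.

```python
from pathlib import Path
from typing import Iterable


def edit_bls_options(entry_text: str, append: Iterable[str] = (),
                     delete: Iterable[str] = ()) -> str:
    """Return a BLS entry with kargs deleted from / appended to its options line."""
    delete = set(delete)
    append = list(append)
    out = []
    for line in entry_text.splitlines():
        # BLS entries are "key value" lines separated by whitespace.
        parts = line.split(maxsplit=1)
        if parts and parts[0] == "options":
            kargs = parts[1].split() if len(parts) > 1 else []
            # Drop exact matches first, then append unconditionally; duplicates
            # are allowed since kargs such as console= may legitimately repeat.
            kargs = [k for k in kargs if k not in delete] + append
            line = "options " + " ".join(kargs)
        out.append(line)
    return "\n".join(out) + "\n"


def edit_bls_dir(bootmnt: Path, append: Iterable[str] = (),
                 delete: Iterable[str] = ()) -> None:
    """Apply edit_bls_options to every entry under <bootmnt>/loader/entries."""
    append, delete = list(append), list(delete)  # materialize for reuse per file
    for conf in sorted((bootmnt / "loader" / "entries").glob("*.conf")):
        conf.write_text(edit_bls_options(conf.read_text(), append, delete))
```

Deletion here is an exact-token match; anything smarter (e.g. removing every `ip=` argument) would need its own matching rule.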

All these are things that coreos-installer has native support for (via e.g. --copy-network, --append-karg, --delete-karg), so once we switch over to that for the install, it'll be easier/safer to do these modifications.

@dtantsur

All these are things that coreos-installer has native support for (via e.g. --copy-network, --append-karg, --delete-karg), so once we switch over to that for the install, it'll be easier/safer to do these modifications.

I'm not sure it's going to happen, at least not in all scenarios.

@hardys

hardys commented Jul 23, 2020

All these are things that coreos-installer has native support for (via e.g. --copy-network, --append-karg, --delete-karg), so once we switch over to that for the install, it'll be easier/safer to do these modifications.

@jlebon can you link any docs on how this works in an automated environment? E.g., if you PXE boot the installer ramdisk, how does this data get provided to coreos-installer for a non-interactive deploy?

As @dtantsur says there is some uncertainty/challenge in making this change (hence I mention it as a long term plan above), so we'll need to investigate further.

We'll need to document a workaround in the meantime. Does anything exist (upstream or downstream) that describes the process of using virt-customize to drop additional network configuration in /boot? (I know the virt-customize commands, but I'm not certain of the expected network file format, and we need something downstream to share with $customer.)

@jlebon
Member

jlebon commented Jul 23, 2020

@jlebon can you link any docs on how this works in an automated environment? E.g., if you PXE boot the installer ramdisk, how does this data get provided to coreos-installer for a non-interactive deploy?

We don't have good docs for that yet, but in an automated environment, you would script a systemd unit which calls coreos-installer install --copy-network --append-karg foo ....
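A minimal sketch of assembling the command such a unit might run. `build_install_argv` is a hypothetical helper; the flags are the coreos-installer ones mentioned in this thread plus `--ignition-file`, and the device and config path are placeholders.

```python
import shlex
from typing import List, Optional, Sequence


def build_install_argv(device: str, ignition_file: Optional[str] = None,
                       copy_network: bool = False,
                       append_kargs: Sequence[str] = (),
                       delete_kargs: Sequence[str] = ()) -> List[str]:
    """Assemble a coreos-installer command line (hypothetical helper)."""
    argv = ["coreos-installer", "install", device]
    if ignition_file:
        argv += ["--ignition-file", ignition_file]
    if copy_network:
        # Copies NM keyfiles from the live environment into the installed /boot.
        argv.append("--copy-network")
    for karg in append_kargs:
        argv += ["--append-karg", karg]
    for karg in delete_kargs:
        argv += ["--delete-karg", karg]
    return argv


argv = build_install_argv("/dev/sda", ignition_file="/run/config.ign",
                          copy_network=True, append_kargs=["foo"])
# A shell-quoted string like this could serve as the unit's ExecStart= value.
print(shlex.join(argv))
```

Building an argv list rather than a shell string avoids quoting bugs; `shlex.join` (Python 3.8+) is only used for display.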

If you're rolling your own code, you can of course do the same thing coreos-installer does when handling these args. For --copy-network, you would copy the NM keyfiles into $bootmnt/coreos-firstboot-network (see the code that handles this on firstboot here: https://github.com/coreos/fedora-coreos-config/blob/87f5c512ad60e4f37c2b10434dda2e59e40dd97a/overlay.d/05core/usr/lib/dracut/modules.d/15coreos-network/coreos-copy-firstboot-network.sh). For kargs, you can modify the options line in the BLS configs in $bootmnt/loader/entries.
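For the roll-your-own route, the `--copy-network` half might look roughly like this: copy NetworkManager keyfiles from a source directory (on the live system that would typically be /etc/NetworkManager/system-connections) into `<bootmnt>/coreos-firstboot-network`, where the linked firstboot script picks them up. The function name and the choice to copy every regular file are assumptions, not a description of coreos-installer's exact behavior.

```python
import shutil
from pathlib import Path
from typing import List


def copy_firstboot_network(keyfile_dir: Path, bootmnt: Path) -> List[Path]:
    """Copy NM keyfiles into <bootmnt>/coreos-firstboot-network (hypothetical helper)."""
    dest = bootmnt / "coreos-firstboot-network"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(keyfile_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories and anything that isn't a plain file
        target = dest / src.name
        shutil.copyfile(src, target)
        # NetworkManager ignores keyfiles that other users can read.
        target.chmod(0o600)
        copied.append(target)
    return copied
```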

@dustymabe
Member

Since coreos/fedora-coreos-config#426 merged (fixes coreos/fedora-coreos-tracker#443) it feels like the question of how to "Support disabling initramfs networking via Ignition" is answered.

  • Q: How do I disable initramfs networking via Ignition?
  • A: You provide an Ignition config that doesn't have any remote references.
    • This implies it doesn't require networking and no networking will be brought up in the initramfs.

The caveat to this is that there are some platforms that only get their Ignition configuration by using the network, so it's not really possible to disable it on those platforms, nor would you want to.
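To make "no remote references" concrete, a rough check could walk the config and flag any `source` URL whose scheme needs the network, while inline `data:` URLs do not. This is an illustrative approximation, not Ignition's actual logic for deciding whether to bring up initramfs networking; the scheme list is an assumption based on the Ignition 3.x spec.

```python
import json
from typing import Any, List, Tuple
from urllib.parse import urlparse

# Assumed set of URL schemes that require fetching over the network.
REMOTE_SCHEMES = {"http", "https", "tftp", "s3", "gs"}


def remote_sources(node: Any, path: str = "$") -> List[Tuple[str, str]]:
    """Return (json-path, url) pairs for every remote `source` in the config."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "source" and isinstance(value, str):
                if urlparse(value).scheme in REMOTE_SCHEMES:
                    found.append((child, value))
            else:
                found.extend(remote_sources(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(remote_sources(item, f"{path}[{i}]"))
    return found


def needs_network(config_json: str) -> bool:
    """True if any remote reference is found (approximation only)."""
    return bool(remote_sources(json.loads(config_json)))
```

For example, a config that writes a file from `data:,node1` has no remote references, while one that merges `https://example.com/worker.ign` does.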

Shall we close this issue out?

@hardys
Copy link

hardys commented Sep 2, 2020

Since coreos/fedora-coreos-config#426 merged (fixes coreos/fedora-coreos-tracker#443) it feels like the question of how to "Support disabling initramfs networking via Ignition" is answered.

  • Q: How do I disable initramfs networking via Ignition?
  • A: You provide an Ignition config that doesn't have any remote references.
    • This implies it doesn't require networking and no networking will be brought up in the initramfs.

The caveat to this is that there are some platforms that only get their Ignition configuration by using the network, so it's not really possible to disable it on those platforms, nor would you want to.

I pushed openshift/enhancements#467 so we can discuss some ideas around the "flattened ignition" solution, which (where appropriate, and in particular for IPI baremetal) could enable an MCO-managed Ignition config that doesn't have any remote references.

Shall we close this issue out?

It seems the conclusion of the discussion so far is that we won't fix this by adding a new Ignition interface for early network config, so that seems reasonable to me.

@dustymabe
Member

Thanks @hardys - Will follow along the discussion in openshift/enhancements#467

@dustymabe
Member

dustymabe commented Sep 2, 2020

For people following Fedora CoreOS: the work for coreos/fedora-coreos-tracker#443 was first introduced in testing stream release 32.20200715.2.2 and stable stream release 32.20200715.3.0.


9 participants