
Move to conditional networking #426

Merged (3 commits) on Jul 15, 2020

@jlebon (Member) commented May 25, 2020

We have all the pieces in place now to move to conditional networking. So
let's drop the firstboot kargs, as well as
coreos-liveiso-network-kargs.service, which is no longer needed (i.e.
the live ISO will now enable initrd networking as required given the
embedded Ignition config).

Fixes: coreos/fedora-coreos-tracker#443
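To make "conditional networking" concrete, here is a rough Python sketch of the kind of decision involved: bring up networking in the initramfs only if the Ignition config actually references something that must be fetched over the network. This is a simplification for illustration, not Ignition's actual fetch-offline logic; the function names, the `OFFLINE_SCHEMES` set, and the approach of scanning every `source` key are all assumptions.

```python
from typing import Any, Iterator
from urllib.parse import urlparse

# Assumption: only inline data: URLs (or empty sources) can be resolved
# without networking; every other scheme is treated as remote.
OFFLINE_SCHEMES = {"data", ""}


def iter_sources(node: Any) -> Iterator[str]:
    """Yield every string stored under a "source" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "source" and isinstance(value, str):
                yield value
            else:
                yield from iter_sources(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_sources(item)


def needs_initrd_networking(config: dict) -> bool:
    """Return True if any referenced resource would have to be fetched remotely."""
    return any(
        urlparse(source).scheme not in OFFLINE_SCHEMES
        for source in iter_sources(config)
    )
```

For example, a config whose only file contents are a `data:` URL would leave the initramfs offline, while one that merges a config from `https://example.com/extra.ign` (a placeholder URL) would request networking.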

@jlebon (Member, Author) commented May 25, 2020

@jlebon (Member, Author) commented May 26, 2020

Rebased this on top of #427!

@darkmuggle (Contributor) left a comment

💯 I really like this.

@jlebon (Member, Author) commented Jun 17, 2020

This will be the final ratcheting point for FCOS. Will add a test so we can validate it before merging.

@dustymabe (Member) left a comment

LGTM

@jlebon (Member, Author) commented Jun 18, 2020

Almost there! All remaining patches are merged, we just need an Ignition release at this point.

@dustymabe (Member) commented

> Almost there! All remaining patches are merged, we just need an Ignition release at this point.

Do we at least have any plans for that release to happen? /me really wants this in 😄

@jlebon (Member, Author) commented Jul 10, 2020

> Do we at least have any plans for that release to happen? /me really wants this in 😄

I think once coreos/ignition#960 is in, we should cut a release. /cc @bgilbert @arithx

@dustymabe (Member) commented

A new Ignition version just landed! #513

@jlebon marked this pull request as ready for review July 14, 2020 20:15
@jlebon (Member, Author) commented Jul 14, 2020

Rebased this and marking ready for review! Still sanity-checking ISO behaviour locally as well.

@dustymabe (Member) commented

I'll take another look at this PR as well and try to build it locally.

@jlebon (Member, Author) commented Jul 14, 2020

> Still sanity-checking ISO behaviour locally as well.

Working well on FCOS!

But yeah I think for now we'll need to keep the liveiso-network-kargs bits for RHCOS until we move to spec3 there and gain the Ignition fetch-offline work. Will just tweak those to make them active on RHCOS only.

@dustymabe (Member) commented

This seems to be working well in my testing too! Though we did find #514 which needs to be addressed (though not in this PR). There is one subtle change here, which is if people were relying on networking configuration getting persisted into the real root and they were using ip= kargs for that in the past then they'll now need to make sure they also add rd.neednet=1, otherwise their ip= settings won't get applied persistently to the real root. Maybe a good point for a release note.

> But yeah I think for now we'll need to keep the liveiso-network-kargs bits for RHCOS until we move to spec3 there and gain the Ignition fetch-offline work. Will just tweak those to make them active on RHCOS only.

Does that mean you're going to do another push to this PR?
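The `ip=` / `rd.neednet=1` caveat noted in this comment can be checked mechanically. A minimal sketch, assuming the kernel command line is available as a plain string (for instance the contents of `/proc/cmdline`) and that kargs are whitespace-separated with no quoted values; the helper name is hypothetical:

```python
def ip_kargs_need_neednet(cmdline: str) -> bool:
    """Return True if ip= kargs are present without rd.neednet=1.

    Per the discussion in this PR, after the move to conditional networking
    such ip= settings won't get applied persistently to the real root.
    Assumes simple whitespace splitting (no quoted karg values).
    """
    kargs = cmdline.split()
    has_ip = any(karg.startswith("ip=") for karg in kargs)
    has_neednet = "rd.neednet=1" in kargs
    return has_ip and not has_neednet
```

A user could run something like `ip_kargs_need_neednet(open("/proc/cmdline").read())` to see whether their setup is affected, although, as suggested later in the thread, moving the network config into the Ignition config is the cleaner fix.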

@cgwalters (Member) commented

> There is one subtle change here, which is if people were relying on networking configuration getting persisted into the real root and they were using ip= kargs for that in the past then they'll now need to make sure they also add rd.neednet=1, otherwise their ip= settings won't get applied persistently to the real root. Maybe a good point for a release note.

Nice catch! Hmm...could we distinguish between "our default kargs" and "kargs provided by user" in that case though?

@dustymabe (Member) commented

> > There is one subtle change here, which is if people were relying on networking configuration getting persisted into the real root and they were using ip= kargs for that in the past then they'll now need to make sure they also add rd.neednet=1, otherwise their ip= settings won't get applied persistently to the real root. Maybe a good point for a release note.

> Nice catch! Hmm...could we distinguish between "our default kargs" and "kargs provided by user" in that case though?

We might be able to do some trickery (@jlebon would probably know best since he did this most recent round of work), but I'm on the fence. Since it's a straightforward change, I'd be more inclined to ask users who need that to just add rd.neednet=1, or even better, use it as an opportunity to encourage them to put the config in the Ignition config instead, since they obviously don't need networking in the initramfs.

@dustymabe (Member) commented

the fix for #514 seems to be working great!

We have all the pieces in place now to move to conditional networking. So
let's drop the `rd.neednet=1` firstboot karg.

Also don't enable coreos-liveiso-network-kargs.service on FCOS since
it's no longer needed (i.e. the live ISO will now enable initrd
networking as required given the embedded Ignition config).

On RHCOS, we still need it for now until we move to spec3. Then we can
remove the service and script completely.

Fixes: coreos/fedora-coreos-tracker#443

We shouldn't use `/run/ignition.json` to determine whether a user
config was provided since that's an implementation detail. Instead, use the
new official journal messages that Ignition emits.

This is complicated by the fact that we need to support RHCOS, where the
journal messages haven't been backported. Use the fact that we always
have a base config to key off of whether to use the old behaviour vs the
new one. (More accurately, we'd want to check for
coreos/ignition#1002, but there's no easy way to
do this from the outside. Alternatively we can check the Ignition
version, though that's deeply nested under `/usr/lib/dracut/...`).

Anyway, this should be temporary until RHCOS moves to spec v3.

Closes: coreos#514
@jlebon marked this pull request as ready for review July 15, 2020 18:44
@jlebon (Member, Author) commented Jul 15, 2020

OK, the fix for #514 ended up being more involved again because RHCOS has older Ignition. See commit message for full details.

Tested this on both FCOS and RHCOS both with and without an embedded Ignition config.

@dustymabe (Member) commented

> OK, the fix for #514 ended up being more involved again because RHCOS has older Ignition. See commit message for full details.
>
> Tested this on both FCOS and RHCOS both with and without an embedded Ignition config.

👍

@dustymabe (Member) commented

Still LGTM

@bgilbert (Contributor) commented

The coreos.inst wrapper script appends rd.neednet=1 if it's forwarding any networking kargs to the first boot. AIUI that's still okay? We'll need to update the README though.
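The wrapper behaviour described here (append `rd.neednet=1` when forwarding any networking kargs to the first boot) can be sketched roughly as follows. The set of prefixes treated as "networking kargs" is an assumption for illustration, drawn from common dracut network options, and is not taken from the actual coreos.inst wrapper script:

```python
from typing import List

# Hypothetical list of dracut networking karg prefixes; the real wrapper
# script may recognize a different set.
NETWORK_KARG_PREFIXES = (
    "ip=", "nameserver=", "bond=", "team=", "vlan=", "bridge=", "rd.route=",
)


def firstboot_kargs(forwarded: List[str]) -> List[str]:
    """Return the kargs to forward, adding rd.neednet=1 if any are networking kargs."""
    result = list(forwarded)  # copy so the caller's list is not mutated
    has_net = any(karg.startswith(NETWORK_KARG_PREFIXES) for karg in result)
    if has_net and "rd.neednet=1" not in result:
        result.append("rd.neednet=1")
    return result
```

Appending `rd.neednet=1` only when a networking karg is present keeps the default first boot free of initramfs networking, which is the point of this PR, while preserving the old behaviour for users who explicitly passed `ip=` and friends.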

@bgilbert (Contributor) left a comment

LGTM

@jlebon merged commit 65de5e0 into coreos:testing-devel Jul 15, 2020
@jlebon deleted the pr/conditional-net branch July 15, 2020 19:55
@jlebon (Member, Author) commented Jul 15, 2020

> The coreos.inst wrapper script appends rd.neednet=1 if it's forwarding any networking kargs to the first boot. AIUI that's still okay?

Yup, that should be fine.

jlebon added a commit to jlebon/coreos-assembler that referenced this pull request Aug 21, 2020
We've now reached the end of the conditional networking 🌈 rainbow 🌈.
We can rip out the legacy default network firstboot kargs from cosa.
These now live in the config repo. E.g.:

https://github.com/coreos/fedora-coreos-config/blob/be456c437d4181435022ce47079b587f5bcb0319/overlay.d/05core/usr/lib/dracut/modules.d/15coreos-network/50-afterburn-network-kargs-default.conf

For more details, see:
coreos#1373
coreos/fedora-coreos-config#426
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request Oct 28, 2020
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Jul 7, 2021
We originally did this in coreos#326 because we wanted to support booting the
live ISO without networking. This was solved on the initramfs side by
the conditional networking work (coreos#426). But for the real root, this was
still useful because if booting the ISO interactively on a system
without any network, or a non-DHCP network, we didn't want the user to
have to wait until the service timed out before getting a shell.

The core issue, however, is that we're requesting `network-online.target`
at all. It's an "active unit", which means that it's only pulled into the
transaction (possibly delaying boot) if another systemd unit needs it.
And ideally, no service would need it, as per:

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

In our case, this unit was fedora-coreos-pinger. We drop that
requirement here:

coreos/fedora-coreos-pinger#41

With that, we no longer pull in `network-online.target` and so no longer
delay reaching the console even if NetworkManager isn't able to get an
active connection for whatever reason. This matches how it works on
traditional Fedora as well.

Having a short timeout actually also had a counterproductive effect in
the automated install case. There, `coreos-installer.service` does pull
in `network-online.target` (which with
coreos/coreos-installer#565 we could consider
dropping as advised by systemd, though we probably should bump the
number of retries some more in that case), but because of the short
timeout, we genuinely may not yet have the network fully up before we
run (see https://bugzilla.redhat.com/show_bug.cgi?id=1967483).
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Jul 8, 2021
jlebon added a commit that referenced this pull request Jul 8, 2021
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Jul 19, 2021
(cherry picked from commit dd54e8c)
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Jul 19, 2021
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Jul 19, 2021
jlebon added a commit that referenced this pull request Jul 19, 2021
jlebon added a commit that referenced this pull request Jul 19, 2021
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
dustymabe pushed a commit to jbtrystram/fedora-coreos-config that referenced this pull request Apr 19, 2024

This expands the section describing permanent rollbacks, in order
to make it more explicit how to deal with auto-updates and to show
how to manually perform roll-forward steps if required.
The scenario came up in a chat thread, and it is good to have the
commands written for reference.
Successfully merging this pull request may close these issues.

Don't bring up networking in the initramfs on first boot by default
5 participants