Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Booting to baremetal drops to emergency mode due to ostree-remount.service failure. #746

Closed
brtkwr opened this issue Feb 13, 2021 · 15 comments · Fixed by ostreedev/ostree#2387 or coreos/fedora-coreos-config#1114

Comments

@brtkwr
Copy link

brtkwr commented Feb 13, 2021

Describe the bug
Stable releases 32.20201104.3.0 and onwards fail to boot smoothly on baremetal.

Reproduction steps
Steps to reproduce the behavior:

  1. Boot a fedora coreos image in baremetal openstack
  2. Watch the OOB console

Expected behavior
Boot normally.

Actual behavior
Drops to emergency mode shell due to ostree-remount.service failure:

[core@k8s-alaska-mx6cndqcfjdv-node-0 ~]$ sudo journalctl -u ostree-remount.service
-- Logs begin at Sat 2021-02-13 01:44:54 UTC, end at Sat 2021-02-13 07:55:53 UTC. --
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 ostree-remount[1703]: ostree-remount: failed to remount(ro) /sysroot: Device or resource busy
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Failed with result 'exit-code'.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Failed to start OSTree Remount OS/ Bind Mounts.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Triggering OnFailure= dependencies.
Feb 13 07:52:33 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
Feb 13 07:52:33 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Finished OSTree Remount OS/ Bind Mounts.

I checked the journal and around this error, there is avc denial errors. I tried setting selinux to permissive via ignition but the problem did not go away.

Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Finished File System Check on /dev/disk/by-label/boot.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 kernel: audit: type=1130 audit(1613180724.360:175): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dlabel-EFI\x2dSYSTEM comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dlabel-boot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Failed with result 'exit-code'.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Failed to start OSTree Remount OS/ Bind Mounts.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 kernel: audit: type=1130 audit(1613180724.401:176): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dlabel-boot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: ostree-remount.service: Triggering OnFailure= dependencies.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ostree-remount comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 systemd[1]: Finished Platform Persistent Storage Archival.
Feb 13 01:45:24 k8s-alaska-mx6cndqcfjdv-node-0 kernel: audit: type=1130 audit(1613180724.448:177): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ostree-remount comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

It continues to boot normally after I press ctrl+D but gets stuck on this prompt so networking doesnt come up until then.

System details

  • Baremetal OpenStack
  • Fedora CoreOS version

broken:

  • 33.20210117.3.2
  • 33.20201214.3.1
  • 32.20201104.3.0

working images:

  • 32.20201018.3.0
  • 32.20200809.3.0

As you can see , 32.20201104.3.0 is the next stable image after 32.20201018.3.0. Can anyone shed a light on any substantial change which could be leading to this issue?

Ignition config
Please attach your FCC or Ignition config used to provision your system. Be sure to sanitize any private data. If not using FCCT to generate your Ignition config, does the Ignition config pass validation using ignition-validate?

Additional information
Add any other information about the problem here.

brtkwr added a commit to stackhpc/terraform-magnum that referenced this issue Feb 13, 2021
brtkwr added a commit to stackhpc/terraform-magnum that referenced this issue Feb 13, 2021
brtkwr added a commit to stackhpc/terraform-magnum that referenced this issue Feb 13, 2021
brtkwr added a commit to stackhpc/terraform-magnum that referenced this issue Feb 13, 2021
@bac-w
Copy link

bac-w commented Feb 15, 2021

I confirm. I have the same problem.
Temporary solution:

    - name: ostree-remount.service
      enabled: true
      contents: |
        [Unit]
        Description=OSTree Remount OS/ Bind Mounts
        Documentation=man:ostree(1)
        DefaultDependencies=no
        ConditionKernelCommandLine=ostree
        OnFailure=emergency.target
        Conflicts=umount.target
        # Run after core mounts
        After=-.mount var.mount
        After=systemd-remount-fs.service
        # But we run *before* most other core bootup services that need write access to /etc and /var
        Before=local-fs.target umount.target
        Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service
        Before=systemd-tmpfiles-setup.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStartPre=/usr/bin/sleep 10
        ExecStart=/usr/lib/ostree/ostree-remount
        StandardInput=null
        StandardOutput=journal
        StandardError=journal+console

        [Install]
        WantedBy=local-fs.target

add to ignition file

@travier
Copy link
Member

travier commented Feb 15, 2021

Can you share the full journal output or the AVC errors because I can not find one in the output you posted here.

@jlebon
Copy link
Member

jlebon commented Feb 18, 2021

Hmm, looks like there's something earlier than ostree-remount preventing the /sysroot from being remounted ro. Do you have any custom services running early? As @travier mentioned, the full journal would help. This would also be fixed by ostreedev/ostree#2187.

@Nemric
Copy link

Nemric commented Mar 18, 2021

Same problem here, this only occur on first reboot after new install, all right after that ...

@jlebon
Copy link
Member

jlebon commented Mar 22, 2021

Can you post the full journal from the boot where this occurred?

@brtkwr
Copy link
Author

brtkwr commented Apr 23, 2021

Hmm I cant find a good way to scrape the output to paste here, do you have any suggestions?

@travier
Copy link
Member

travier commented Apr 29, 2021

You can try saving the journal to an external drive or maybe uploading it over the network.

@jlebon
Copy link
Member

jlebon commented Jun 21, 2021

See also: #870
See also: #867

@jlebon
Copy link
Member

jlebon commented Jun 21, 2021

Been digging into this. I think this is systemd-rfkill.service and ostree-remount.service racing with each other. This is why it only manifests itself on bare metal machines with RF devices like Bluetooth/WiFi. As mentioned earlier, the long-term fix for this is ostreedev/ostree#2187, though I think a short-term fix could be:

diff --git a/src/boot/ostree-remount.service b/src/boot/ostree-remount.service
index af40453c..a54114f9 100644
--- a/src/boot/ostree-remount.service
+++ b/src/boot/ostree-remount.service
@@ -28,7 +28,7 @@ After=systemd-remount-fs.service
 # But we run *before* most other core bootup services that need write access to /etc and /var
 Before=local-fs.target umount.target
 Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service
-Before=systemd-tmpfiles-setup.service
+Before=systemd-tmpfiles-setup.service systemd-rfkill.service systemd-rfkill.socket

 [Service]
 Type=oneshot

@dustymabe Would you be able to try that on your Raspberry Pi 4? (Rather than rebuilding, you could also modify the service from an rd.break shell and adding Before=systemd-rfkill.service systemd-rfkill.socket in a systemd drop-in at e.g. /run/systemd/system/ostree-remount.service.d/before-rfkill.conf.

Edit: actually, it should be totally fine to add the drop-in from Ignition too. Though if you've already got the config embedded, it might be easier to just do it live in the switchroot shell.

@dustymabe
Copy link
Member

Adding something like this:

  files:
    - path: /etc/systemd/system/ostree-remount.service.d/before-rfkill.conf
      mode: 0644
      contents:
        inline: |
          [Unit]
          Before=systemd-rfkill.service systemd-rfkill.socket

seemed to get me a functioning system:

[core@rpi4 ~]$ sudo systemctl status ostree-remount.service
● ostree-remount.service - OSTree Remount OS/ Bind Mounts
     Loaded: loaded (/usr/lib/systemd/system/ostree-remount.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/ostree-remount.service.d
             └─before-rfkill.conf
     Active: active (exited) since Tue 2021-05-25 11:20:03 UTC; 4 weeks 0 days ago
       Docs: man:ostree(1)
    Process: 958 ExecStart=/usr/lib/ostree/ostree-remount (code=exited, status=0/SUCCESS)
   Main PID: 958 (code=exited, status=0/SUCCESS)
        CPU: 20ms

May 25 11:20:01 rpi4 systemd[1]: Starting OSTree Remount OS/ Bind Mounts...
May 25 11:20:03 rpi4 ostree-remount[958]: Remounted ro: /sysroot
May 25 11:20:03 rpi4 ostree-remount[958]: Remounted rw: /etc
May 25 11:20:03 rpi4 systemd[1]: Finished OSTree Remount OS/ Bind Mounts.

jlebon added a commit to jlebon/ostree that referenced this issue Jun 22, 2021
The `systemd-rfkill.*` service falls in the category of things that need
write access to `/etc` and `/var`, so we need to make sure we run
before or it might hit the read-only sysroot.

The long-term fix for this is
ostreedev#2115.

Closes: coreos/fedora-coreos-tracker#746
jlebon added a commit to jlebon/ostree that referenced this issue Jun 22, 2021
The `systemd-rfkill.*` service falls in the category of early things
that need write access to `/var`, so we need to make sure we run before
or it might hit the read-only sysroot.

The long-term fix for this is
ostreedev#2115.

Closes: coreos/fedora-coreos-tracker#746
@jlebon
Copy link
Member

jlebon commented Jun 22, 2021

Thanks @dustymabe for checking! I opened ostreedev/ostree#2387.

@dustymabe dustymabe added status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release. and removed needs/more-information labels Jun 22, 2021
@brtkwr
Copy link
Author

brtkwr commented Jul 9, 2021

Has this been incorporated into a stable release?

@lucab lucab reopened this Jul 9, 2021
@dustymabe
Copy link
Member

Upstream release has happened now: https://github.com/ostreedev/ostree/releases/tag/v2021.3

Should be in the next FCOS testing stream release.

@dustymabe dustymabe added status/pending-testing-release Fixed upstream. Waiting on a testing release. and removed status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release. labels Jul 15, 2021
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jul 15, 2021
jlebon pushed a commit to coreos/fedora-coreos-config that referenced this issue Jul 15, 2021
@dustymabe dustymabe reopened this Jul 15, 2021
@dustymabe
Copy link
Member

The fix for this went into testing stream release 34.20210725.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Aug 9, 2021
@dustymabe
Copy link
Member

The fix for this went into stable stream release 34.20210725.3.0.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Aug 25, 2021
smuda added a commit to AyoyAB/okd-with-ansible that referenced this issue Oct 5, 2021
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants