Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable sysroot.readonly at initrd time #2115

Closed
jlebon opened this issue May 26, 2020 · 12 comments · Fixed by #2187
Closed

Enable sysroot.readonly at initrd time #2115

jlebon opened this issue May 26, 2020 · 12 comments · Fixed by #2187

Comments

@jlebon
Copy link
Member

jlebon commented May 26, 2020

Follow-up from #2113. Instead of enforcing it at ostree-remount time, which runs in the real root, we should try to have it already be read-only from the start.

jlebon added a commit to jlebon/ostree that referenced this issue Aug 28, 2020
When we remount `/sysroot` as read-only, we also make `/etc` read-only.
This is usually OK because we then remount `/var` read-write, which also
flips `/etc` back to read-write... unless `/var` is a separate
filesystem and not a bind-mount to the stateroot `/var`.

Fix this by just remounting `/etc` read-write in the read-only sysroot
case.

Eventually, I think we should rework this to set everything up the way
we want from the initramfs (ostreedev#2115). This would also eliminate the window
during which `/etc` is read-only while `ostree-remount` runs.
jlebon added a commit to jlebon/ostree that referenced this issue Aug 28, 2020
When we remount `/sysroot` as read-only, we also make `/etc` read-only.
This is usually OK because we then remount `/var` read-write, which also
flips `/etc` back to read-write... unless `/var` is a separate
filesystem and not a bind-mount to the stateroot `/var`.

Fix this by just remounting `/etc` read-write in the read-only sysroot
case.

Eventually, I think we should rework this to set everything up the way
we want from the initramfs (ostreedev#2115). This would also eliminate the window
during which `/etc` is read-only while `ostree-remount` runs.
cgwalters added a commit to cgwalters/ostree that referenced this issue Aug 31, 2020
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
cgwalters added a commit to cgwalters/ostree that referenced this issue Mar 26, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
cgwalters added a commit to cgwalters/ostree that referenced this issue Mar 26, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
jlebon added a commit to jlebon/ostree that referenced this issue Jun 22, 2021
The `systemd-rfkill.*` service falls in the category of things that need
write access to `/etc` and `/var`, so we need to make sure we run
before or it might hit the read-only sysroot.

The long-term fix for this is
ostreedev#2115.

Closes: coreos/fedora-coreos-tracker#746
jlebon added a commit to jlebon/ostree that referenced this issue Jun 22, 2021
The `systemd-rfkill.*` service falls in the category of early things
that need write access to `/var`, so we need to make sure we run before
or it might hit the read-only sysroot.

The long-term fix for this is
ostreedev#2115.

Closes: coreos/fedora-coreos-tracker#746
@lucab
Copy link
Member

lucab commented Oct 4, 2021

This is still affecting users due to several possible races, so it would be good to push it a bit forward.
I saw @cgwalters attempted moving it to initrd at #2187, but I do wonder whether that one as well may prone to races with the rest of initrd services.

I started considering placing this into a systemd-generator instead, and put together a quick bash generator to explore that path. It's up at coreos/fedora-coreos-config#1257 and passes FCOS CI.

These are my findings so far:

  • /run/ostree-sysroot-ro.stamp is used both as a stamp file and as a boolean flag for RO/RW. This breaks upon re-execution, because we unlink in the first run and then check its presence in the second run. We should probably not use it as a stamp file, or have a persistent signaling file somewhere else (maybe we do already and I'm just not aware of it).
  • ostree-remount handles both /sysroot and /var remounts. The former can be safely done early on, in initrd or in a generator. The latter unfortunately has to be handled in a different way (later or with more complex logic) because there may be a var.mount unit in the real rootfs.
  • The logic handling /var is inherently subject to races, because the bind-mount operation cannot do the RO/RW switch at the same time. There is thus a window where the /var mount exists but may still need to be remounted.
  • I don't see a good way to avoid /var races with arbitrary services through systemd dependencies. I think a generator is the way to minimize the chances of races, as there aren't many things running at that point. Unfortunately that does require some logic to detect more complex cases (custom var.mount, live systems, etc.), which may be a bit fragile at least initially.

@cgwalters
Copy link
Member

I saw @cgwalters attempted moving it to initrd at #2187, but I do wonder whether that one as well may prone to races with the rest of initrd services.

I think the initramfs is a much more constrained environment and the entire goal of the initramfs is to prepare / with various hook points for that. So we should be able to do this in the initramfs.

Another argument for doing so is it's similar to what we do for the CoreOS live environment today: https://github.com/coreos/fedora-coreos-config/blob/08954efa915abd3e52240935d7e27235388f5533/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-live/live-generator#L173

I started considering placing this into a systemd-generator instead

But, yes this is a strictly more powerful way of doing things in the real root. We actually already have a generator to do exactly this type of stuff https://github.com/ostreedev/ostree/blob/main/src/switchroot/ostree-system-generator.c

So that generator could be enhanced to control the timing of ostree-remount.service too.

But it's not clear to me that this really helps versus moving it to the initramfs.

@jlebon
Copy link
Member Author

jlebon commented Oct 4, 2021

This is still affecting users due to several possible races, so it would be good to push it a bit forward. I saw @cgwalters attempted moving it to initrd at #2187, but I do wonder whether that one as well may prone to races with the rest of initrd services.

Can you expand on this? If we set up /var from ostree-prepare-root.service as part of setting up the sysroot itself, there shouldn't be any race possible, no?

Conceptually, I think this would be equivalent to upstreaming ignition-ostree-mount-var.service and .sh (but making it part of ostree-prepare-root.service instead) and just leaking the /var bind-mount into switchroot. I guess the simplest way to deal with a potential other var.mount is to just let it overmount the OSTree bind-mount, like we currently do in the initrd. Though it does make the mount hierarchy uglier.

lucab pushed a commit to cgwalters/ostree that referenced this issue Oct 12, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
@lucab
Copy link
Member

lucab commented Oct 14, 2021

Yes, the idea was to redistribute the logic across ostree-system-generator and ostree-remount.service, possibly dropping the latter. Not planning to introduce a new generator, the bash script was just a quick hack.

@lucab
Copy link
Member

lucab commented Oct 14, 2021

I did explore the idea of mounting and leaking /var from the initrd. I did amend #2187 to make it work, and together with coreos/fedora-coreos-config#1270 it made into a properly booting FCOS image.

One problem I discovered though is that systemd cannot be tricked into performing stacked mounts (possibly because they aren't really sane to handle anyway, and they can't be modeled through mount units). So even with a config fragment like this:

systemd:
  units:
    - name: var.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount local ramdisk
        Before=local-fs.target
        [Mount]
        What=none
        Where=/var
        Type=tmpfs
        Options=size=2G
        [Install]
        WantedBy=local-fs.target

Once the system is fully booted the mount which is active on /var is the leaked bind-mount (with a corresponding synthetic unit from mountinfo), and not the one from the custom mount unit:

# findmnt /var
TARGET SOURCE                                      FSTYPE OPTIONS
/var   /dev/vda4[/ostree/deploy/fedora-coreos/var] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
# systemctl status var.mount | grep Loaded
     Loaded: loaded (/proc/self/mountinfo; enabled; vendor preset: enabled)

@jlebon
Copy link
Member Author

jlebon commented Oct 14, 2021

Instead of leaking /sysroot/var, would it work to have ostree-prepare-root make /ostree/deploy/fedora-coreos/var a bind-mount onto itself that's rw? So then the default var.mount bind-mount should (I think) still retain the rw property since it's from a bind-mount and not directly the ro rootfs.

@cgwalters
Copy link
Member

Another half-baked thought: What if we had the initramfs mount $stateroot/var (i.e. the "default /var") at /run/ostree-var, and then in the real root, the generator does:

  • Did something else (e.g. /etc/fstab) specify /var? If so, then just umount /run/ostree-var
  • Otherwise, mount --move /run/ostree-var /var

or so?

@jlebon
Copy link
Member Author

jlebon commented Oct 14, 2021

Another half-baked thought: What if we had the initramfs mount $stateroot/var (i.e. the "default /var") at /run/ostree-var, and then in the real root, the generator does:

  • Did something else (e.g. /etc/fstab) specify /var? If so, then just umount /run/ostree-var
  • Otherwise, mount --move /run/ostree-var /var

or so?

Heh, I was thinking about --move too and was checking if systemd mount units somehow supported it. But it didn't seem like it.

I think the issue with doing it from the generator is that it'd be pretty hard to be sure we've detected all the possible ways a var.mount could be provided (including if it's generated by a generator which runs after us).

lucab pushed a commit to cgwalters/ostree that referenced this issue Oct 18, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
@lucab
Copy link
Member

lucab commented Oct 18, 2021

Indeed I didn't see a systemd-friendly path for the mount-moving approach.

Instead, the idea of making stateroot /var rw sounds promising.
I quickly explored that option and it seems to be feasible.
The resulting mounts in the real root look like this:

|-/sysroot                                   /dev/vda4                                               xfs        ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
| `-/sysroot/ostree/deploy/fedora-coreos/var /dev/vda4[/ostree/deploy/fedora-coreos/var]             xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
|-/etc                                       /dev/vda4[/ostree/deploy/fedora-coreos/deploy/dd93ca7c64f6dc1cd1c91b6ba24866af9a22f06270bcd1cb007fe89e9fd58d6e.0/etc]
|                                                                                                    xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
|-/usr                                       /dev/vda4[/ostree/deploy/fedora-coreos/deploy/dd93ca7c64f6dc1cd1c91b6ba24866af9a22f06270bcd1cb007fe89e9fd58d6e.0/usr]
|                                                                                                    xfs        ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
|-/var                                       /dev/vda4[/ostree/deploy/fedora-coreos/var]             xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota

I pushed a further commit on top of #2187.
CI is mostly happy about it, other than this destructive test which I think it's hackish/broken anyway and needs fixing.

@cgwalters
Copy link
Member

Yeah, what that test is doing is clearly a hack we don't need to support.

But the idea of what it's testing is clearly valuable too. Hmm. I think we can make it a kola test since external tests support provisioning additional disks.

@lucab
Copy link
Member

lucab commented Oct 19, 2021

I opened #2468 to separately tweak the test and make sure it passes both before and after landing the prepare-root rework.
For reference, we also have some additional external tests covering custom var.mount units in Fedora CoreOS CI.

lucab pushed a commit to lucab/ostree that referenced this issue Oct 20, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
lucab pushed a commit to cgwalters/ostree that referenced this issue Oct 20, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
@lucab
Copy link
Member

lucab commented Oct 20, 2021

Today I somehow lost the sane path when rebasing #2187 one more time, and accidentally ended up discovering that we can have a working system where both / and /sysroot are actually read-only:

# findmnt /                   
TARGET SOURCE                                                                                                            FSTYPE OPTIONS
/      /dev/vda4[/ostree/deploy/fedora-coreos/deploy/f4a03f9e4713c4eb4983098dacea6309a59a38e6cf5bd4deed268cd14cedbd5c.0] xfs    ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
# findmnt /sysroot
TARGET   SOURCE    FSTYPE OPTIONS
/sysroot /dev/vda4 xfs    ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota

Chatting a bit more about that, it works mainly because /tmp is a tmpfs mount now. See coreos/rpm-ostree#820 (comment) on that topic.

lucab pushed a commit to cgwalters/ostree that referenced this issue Oct 25, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
lucab pushed a commit to cgwalters/ostree that referenced this issue Nov 4, 2021
Let's ensure things are right from the start in the initramfs;
this closes off various race conditions.  Followup to
ostreedev@3564225

Closes: ostreedev#2115
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants