Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport systemd fixes for udevd skipping events (and others) #2999

Merged
merged 3 commits into from
Apr 11, 2023

Conversation

etungsten
Copy link
Contributor

@etungsten etungsten commented Apr 10, 2023

Issue number:

Resolves #2980 (comment)

Description of changes:

    packages: backport systemd fixes for udevd skipping events (and others)
    
    Backports the following, with some minor edits:
     * https://github.com/systemd/systemd/commit/2d40f02ee4317233365f53c85234be3af6b000a6
     * https://github.com/systemd/systemd/pull/22717
     * https://github.com/systemd/systemd/commit/400e3d21f8cae53a8ba9f9567f244fbf6f3e076c
     * https://github.com/systemd/systemd/commit/4f294ffdf18ab9f187400dbbab593a980e60be89
     * https://github.com/systemd/systemd/commit/c02fb80479b23e70f4ad6f7717eec5c9444aa7f4
    
    This fixes an issue with kernel uevents getting skipped by udev when the
    data partition block device is locked by 'systemd-makefs' or
    'systemd-repart'.

    release: stop repart units if we've successfully created local fs

    We explicitly stop any repart unit that is waiting on a non-existent
    data partition after 'makefs' finishes successfully. This prevents
    awkward time-out log messages from being printed to the console.
    release: create data filesystem in local-fs.target
    
    The local filesystem is expected to be ready before local-fs.target
    is reached.

    Signed-off-by: Ben Cressey <bcressey@amazon.com>

    We don't need to wait for the 'repart' units since this cause boot to
    hang until the repart units timeout on waiting for a potentiaily
    non-existent data partition.

    'systemd-repart' and 'systemd-makefs' both lock on the block device
    before operating on it, so repart is guaranteed to finish before makefs
    can create the filesystem.

    Co-authored-by: Erikson Tung <etung@amazon.com>

Testing done:

Built an aws-k8s-1.22 arm64 AMI, then launched and cycled through 2000 a1.medium instances.
All boots were successful and nodes joined the cluster successfully.

Launched and cycled through 400 m6g.medium instances.
All boots were successful and nodes joined the cluster successfully.

Launched and cycled through 400 m6g.large instances.
All boots were successful and nodes joined the cluster successfully.

Built an aws-k8s-1.22 x86_64 AMI, then launched and cycled through 100 m1.small instances.
All boots were successful and nodes joined the cluster successfully.

Launched and cycled through 400 m5.large instances.
All boots were successful and nodes joined the cluster successfully.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@stmcginnis
Copy link
Contributor

This doesn't appear to be a backport?

Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine and testing looks good.

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reviewing on my phone, but this looks pretty good as far as I can tell.

The other thing to check is to verify that there are no fuzz warnings from rpm during the patch process, since that can indicate that some of the context didn't quite match.

Just exit 1 in %prep after patches are applied and inspect for warnings.

Comment on lines +85 to +87
+ /* If this is a block device and the device is locked currently via the BSD advisory locks,
+ * someone else is using it exclusively. We don't run our udev rules now to not interfere.
+ * Instead of processing the event, we requeue the event and will try again after a delay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to confirm via testing that this doesn't interfere with the previous fix to the fallback logic, since that relies on continued udev processing to generate the special block-device-typed UUID symlinks.

My recollection is that all the symlinks are created at the end, so we shouldn't see the case where the fallback unit starts execution as soon as the partuuid link exists, but before the typed link exists.

@etungsten
Copy link
Contributor Author

The other thing to check is to verify that there are no fuzz warnings from rpm during the patch process, since that can indicate that some of the context didn't quite match.

No warnings.

  #18 0.733 + cd systemd-stable-250.11                                                                                                                                                                         
  #18 0.733 + /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .                                                                                                                                                            
  #18 0.782 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0001-errno-util-add-ERRNO_IS_DEVICE_ABSENT-macro.patch                                                                                 
  #18 0.782 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.796 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0002-udev-drop-unnecessary-clone-of-received-sd-device-ob.patch                                                                        
  #18 0.796 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.808 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0003-udev-introduce-device_broadcast-helper-function.patch                                                                             
  #18 0.808 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.820 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0004-udev-assume-there-is-no-blocker-when-failed-to-check.patch                                                                        
  #18 0.820 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.832 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0005-udev-store-action-in-struct-Event.patch                                                                                           
  #18 0.832 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.844 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0006-udev-requeue-event-when-the-corresponding-block-devi.patch                                                                        
  #18 0.845 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.857 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0007-udev-only-ignore-ENOENT-or-friends-which-suggest-the.patch                                                                        
  #18 0.857 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.869 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0008-udev-split-worker_lock_block_device-into-two.patch                                                                                
  #18 0.869 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.881 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0009-udev-assume-block-device-is-not-locked-when-a-new-ev.patch                                                                        
  #18 0.881 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.893 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0010-udev-initialize-list-pointers-in-event_queue_assume_.patch                                                                        
  #18 0.893 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.905 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0011-udev-fix-inversed-inequality-for-timeout-of-retrying.patch                                                                        
  #18 0.905 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.917 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0012-udev-certainly-restart-event-for-previously-locked-d.patch                                                                        
  #18 0.917 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.929 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/0013-udev-try-to-reload-selinux-label-database-less-frequ.patch                                                                        
  #18 0.929 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.941 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9001-use-absolute-path-for-var-run-symlink.patch                                                                                       
  #18 0.941 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.953 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9002-core-add-separate-timeout-for-system-shutdown.patch                                                                               
  #18 0.953 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.966 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9003-machine-id-setup-generate-stable-ID-under-Xen-and-VM.patch                                                                        
  #18 0.966 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.978 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9004-units-mount-tmp-with-noexec.patch                                                                                                 
  #18 0.978 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 0.989 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9005-mount-setup-apply-noexec-to-more-mounts.patch                                                                                     
  #18 0.990 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.001 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9006-mount-setup-mount-etc-with-specific-label.patch                                                                                   
  #18 1.001 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.013 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9007-journal-disable-keyed-hashes-for-compatibility.patch                                                                              
  #18 1.013 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.025 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9008-pkg-config-stop-hardcoding-prefix-to-usr.patch                                                                                    
  #18 1.025 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.036 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9009-sysctl-do-not-set-rp_filter-via-wildcard.patch                                                                                    
  #18 1.037 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.048 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9010-sysusers-set-root-shell-to-sbin-nologin.patch                                                                                     
  #18 1.048 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.060 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9011-units-keep-modprobe-service-units-running.patch                                                                                   
  #18 1.060 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.072 + /usr/lib/rpm/rpmuncompress /home/builder/rpmbuild/SOURCES/9012-tmpfiles-Split-networkd-entries-into-a-separate-file.patch                                                                        
  #18 1.072 + /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f                                                                                                                                        
  #18 1.083 + exit 1                                                                                                                                                                                           

@etungsten
Copy link
Contributor Author

Push above merges the custom 0010-udev-initialize-list-pointers-in-event_queue_assume patch with the patch it was trying to remediate.

@etungsten
Copy link
Contributor Author

Push above adds comments to indicate what version of systemd the patches are backported from.

Copy link
Contributor

@yeazelm yeazelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work!

Patch0007: 0007-udev-only-ignore-ENOENT-or-friends-which-suggest-the.patch
Patch0008: 0008-udev-split-worker_lock_block_device-into-two.patch
Patch0009: 0009-udev-assume-block-device-is-not-locked-when-a-new-ev.patch
# From v252:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@zmrow zmrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@etungsten
Copy link
Contributor Author

etungsten commented Apr 11, 2023

Push above addresses an issue where boot hangs for 90 seconds before proceeding when the fallback data partition is used and the preferred data partition is absent. This was due to prepare-local-fs having a dependency on both repart-data-* services.

I removed the dependency and also added measures to ensure repart-data services are stopped after systemd-makefs succeeds in making the /local filesystem. I explain why this is safe in the commit messages.

Testing

Built aws-k8s-1.22 aarch64 AMI.

Launched and cycled through 1000 m6g.medium instances. All boots were successful and nodes join the cluster.

Launched and cycled through 900 a1.medium instances. All boots were successful and nodes join the cluster.

Built aws-k8s-1.22 x86_64 AMI.

Launched and cycled through 1000 m5.large instances. All boots were successful and nodes join the cluster.

Launched and cycled through 100 m1.small instances. All boots were successful and nodes join the cluster.

Launched 500 m5.large instances with the preferred data partition missing, and only the fallback partition present with enough disk space to accommodate the local filesystem. All boots were prompt without any hanging and all nodes join the cluster

The local filesystem is expected to be ready before local-fs.target
is reached.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

We don't need to wait for the 'repart' units since this cause boot to
hang until the repart units timeout on waiting for a potentiaily
non-existent data partition.

'systemd-repart' and 'systemd-makefs' both lock on the block device
before operating on it, so repart is guaranteed to finish before makefs
can create the filesystem.

Co-authored-by: Erikson Tung <etung@amazon.com>
@etungsten
Copy link
Contributor Author

Push above fixes up the commit messages to be more precise.

Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good to me.

Copy link
Contributor

@zmrow zmrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💸

We explicitly stop and mask any repart-data service that has a start job
waiting on a non-existent data partition after 'makefs' finishes
successfully. This prevents awkward start job timeout messages from
being printed to the console.
Backports the following, with some minor edits:
 * systemd/systemd@2d40f02
 * systemd/systemd#22717
 * systemd/systemd@400e3d2
 * systemd/systemd@4f294ff
 * systemd/systemd@c02fb80

This fixes an issue with kernel uevents getting skipped by udev when the
data partition block device is locked by 'systemd-makefs' or
'systemd-repart'.
@etungsten
Copy link
Contributor Author

Push above adds masking in addition to stopping the repart-data-* services in prepare-local-fs to prevent any potential re-triggering of starting the repart-data-* services.

@etungsten etungsten requested a review from yeazelm April 11, 2023 18:58
@etungsten etungsten merged commit 8c7776d into bottlerocket-os:develop Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BottleRocket sometimes fails to start on Ec2 instances
6 participants