Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: LVM2 module uevent processing optimization #814

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

tbzatek
Copy link
Member

@tbzatek tbzatek commented Oct 27, 2020

This is a set of improvements to LVM2 module uevent processing. The primary goal was to rate-limit number of lvm commands spawned (using the libblockdev lvm-command based plugin) to prevent filling the sequential udisks daemon uevent queue. Uevent timestamp checking has been introduced and requests to the LVM updates are omitted when another update is running or when there was no new uevent received since the last update.

However there are couple of conceptual problems that are out of scope of this pull request. The major problem is the big global PVs/VGs/LVs update done in the udisks lvm2 module. The whole system picture is updated always when an uevent is received on a block device carrying DM/LVM signature. The big update then touches multiple D-Bus objects, updating object paths - cross references between objects, etc.

By skipping calls to this global update may however uncover unexpected issues that are not obvious on a first sight. I've changed the order of uevent processing to run callbacks to modules as last. That's just a workaround and proper solution should be implemented (thus marking this PR as WIP).

The proper way would be to introduce pre- and post- hooks to block device uevent processing chain and split the current big update to smaller chunks, reacting on particular block devices only. I.e. attaching extra D-Bus interfaces to existing block objects should be separated and proper module API used, meta module objects should be handled in the current big update routine and cross-object references should be updated in a post- hook. This is a non-trivial amount of work and subsequent debugging and verification.

Related: #811

Also, a simple load test to create and destroy 1000 LVs/snapshots is introduced to produce some load and prepare "heavy" testing environment.

Supposed to be used for heavy load environment preparation or
for running load tests in general. Should not be included in
regular test runs.
This creates 1000 LVM snapshots over a persistent image file
attached as loop device. Splitting the test case into "setup"
and "teardown" parts allows debugging on such heavy environment.
Just like a primary coldplug that is run upon daemon startup this secondary
coldplug indicates modules a secondary initialization over existing drive
and block objects. Upon activating a module it's desirable to attach extra
module interfaces over existing objects, perform additional probing and
expose all available information in a synchronous fashion so that upon
retuning from the org.freedesktop.UDisks2.Manager.EnableModule() method call
all exported objects and their properties are up-to-date.
It is possible to distinguish a coldplug phase now so remove
the dirty workaround. This will change timing of object properties
validity and availability of additional objects at the expense
of longer coldplug phase.
As long as GUdevDevice doesn't carry timestamp of the reported uevent,
let's use system monotonic time and bind a timestamp to it as soon as possible.
Betting on natural atomicity of the 64-bit value, may need explicit
measures to be added around (TODO).
Updates should be done sync while coldplugging and async updates should
be rate limited. This is done by comparing timestamps of (filtered) uevents,
keeping the update job limited to a single instance, optionally
queueing another update once the previous one finished.

We rely on uevents being generated for any change to lvm2 block devices,
otherwise this wouldn't work.
For consistency reasons we would like to run updates sync, however
with large number of LVs in the system this can take considerable
amount of time, blocking the serialized daemon uevent processing.

As a workaround this change introduces an arbitrary threshold
that will switch updates back to async as was historically observed.
There's a known bug somewhere between lvm2 and udev, keeping
/dev entries around while the block devices are actually gone.
This causes problems on subsequent test runs while it seems
nothing else breaks down.
This is a major change and will need to be followed up to make
things right. This essientially reorders the moment the uevent
is propagated to modules and may affect availability of information
on exposed D-Bus objects.

This is all about dependencies, i.e. in case a module gathers
information from existing block objects, it needs to know whether
that block object has already finished processing the particular
uevent or not. This commit defines that order and it appears to be
sensible to guarantee the "core" daemon objects and its interfaces
are updated by the time udisks_module_new_object() is called. That
happens as a last step of uevent processing chain which includes
block object specific module callbacks for exposing extra D-Bus
interfaces (run prior to udisks_module_new_object()).

However it may not suit all cases and would be perhaps wise to provide
pre- and post- udisks_module_new_object() callbacks for finer
granularity and precise definition of the moment of uevent processing.

For the moment let's run udisks_module_new_object() as the last
in the chain and make modules aware of that.
@tbzatek tbzatek marked this pull request as draft October 29, 2020 13:37
@tbzatek
Copy link
Member Author

tbzatek commented Oct 29, 2020

@vojtechtrefny, @vpodzime: please let me know what do you think of the 'scalable mode' for the moment.

The arbitrary threshold of 100 LVs in the system is just a wild shot, no real performance numbers behind it.

The thing is that running all updates sync as part of uevent processing brings benefits to scripts and tests included. However it also delays processing of the sequential uevent queue and any sync updates should be treated with caution (until sid materializes and allows parallel uevent queue processing).

Also the loadtest LVM2 tests are perhaps a bit hacky but proved to be useful for creating a heavy environment for debugging.

@tbzatek
Copy link
Member Author

tbzatek commented Nov 2, 2020

Jenkins, test this please.

1 similar comment
@tbzatek
Copy link
Member Author

tbzatek commented Nov 5, 2020

Jenkins, test this please.

This might trigger some critical warnings from the daemon as
the VG/LV might not exist. Mostly harmless as until now
the return code of open() wasn't actually checked.

Subject to further fixes in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant