-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow instance_reconfigure to resize a stopped instance #6774
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Greg Colombo <greg@oxidecomputer.com>
// Set vCPUs and memory size. | ||
self.instance_set_size_on_conn( | ||
&conn, | ||
&err, | ||
authz_instance, | ||
ncpus, | ||
memory, | ||
) | ||
.await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling instance_set_size_on_conn
will unconditionally check that the instance is in a state in which it can be resized. This means that an InstanceUpdate
that doesn't change the number of vCPUs or the amount of memory will always check whether the instance is in a resizeable state, and fail if it is not --- meaning that we can no longer set auto-restart policies for running instances. I think we need to check whether the instance's memory/vCPUs are already set to the requested values, and not call this if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the test that attempts to change the auto-restart policy for a running instance will fail due to this, BTW. So there's already a way to check if you're doing the right thing. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yes, that's absolutely right. i've certainly broken that test in the process, even.
let maybe_old_sizes = instance_dsl::instance | ||
.filter(instance_dsl::id.eq(authz_instance.id())) | ||
.filter(instance_dsl::time_deleted.is_null()) | ||
.select((instance_dsl::ncpus, instance_dsl::memory)) | ||
.first_async::<(InstanceCpuCount, ByteCount)>(conn) | ||
.await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, we are now fetching fields from the instance record in a bunch of different places in the instance_reconfigure
method...perhaps it would be nicer if we just did one instance_fetch
query and then passed an &Instance
into both this function and instance_set_boot_disk
to perform the validation? Not sure how big a deal this is, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, it occurs to me that we could just fold this into the update query, by adding a
.filter(instance_dsl::ncpus.ne(ncpus).or(instance_dsl::memory.ne(memory))
clause to make the update conditional on there actually being a change?
We would then have to handle that case in NotUpdatedButExists
as well as the "instance is in invalid state" case, but I think that's probably fine and might be a bit more efficient than doing a query, checking the results in Nexus, and then doing another query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, i'd thought about that and written it off because i'd also been thinking about the caller checking instance state as well, but Instance
has .state
as well, so the state check can stay in these helpers while &Instance
-to-be-updated comes from the caller. and i wasn't enthused about loading the instance again in sic_set_boot_disk
, but we already have that in the saga parameters!
i'll give that a try, especially because it means we don't even have to do these prepatory queries over and over. adding .filter
for the fields not matching would be nice but, yeah, that's extra queries.
using the Instance
as retrieved from the saga is out of data as soon as sic_set_boot_disk
runs. so, either sic_set_boot_disk
updates the instance record (shadow copy of a database row.. not great) or sic_set_boot_disk_undo
tries to run from an instance whose boot_disk
is None (from before it was assigned), fails to detach a boot disk if one was set, and gets wedged. surmountable, i think, but really gross to do. so, more careful .filter()
it is!
even though update is probably going to be relatively infrequent, i'd like to make this not awful to read :) i'll see how it goes.
.filter()
trip report: you might think the boot disk filter stanza would look something like:
.filter(instance_dsl::boot_disk_id.ne(boot_disk_id))
but NULL
compares false
with everything, so this skips updating the boot disk if either of the prior or new boot_disk_id
are None. okay. so the real precise filter would be more like:
.filter(
instance_dsl::boot_disk_id.ne(boot_disk_id)
.or(instance_dsl::boot_disk_id.is_null().and(boot_disk_id.is_some()))
.or(instance_dsl::boot_disk_id.is_not_null().and(boot_disk_id.is_none()))
)
but this starts not working for "the types are not what diesel wants to see" reasons.
a third thing we can do, and the commit i'm actually going to push does this, is just not .filter
for the boot_disk_id
case. then we're still going to end up in NotUpdatedButExists
if we have not updated the record, so it does all actually work.
unfortunately that's extra work if you're not actually intending to update boot_disk_id
. it just seems... least bad...?
Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅 |
mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated
InstanceAndActiveVmm
. same size validation as we do for a new instance, tests checking all that as well.Also fixes #3769.