-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PCI passthrough support for cloud-hypervisor platform #3331
Conversation
Add options parameter for modprobe load function. This is usefull when we need to pass options while loading the module Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
1b8f941
to
930f9ae
Compare
@@ -396,7 +398,7 @@ def _configure_nodes(self, environment: Environment, log: Logger) -> None: | |||
vm_name_prefix, | |||
) | |||
|
|||
def _configure_node( | |||
def _configure_node( # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What warning does this suppress and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of lot of if/else in this function, it will throw too complex function error
which is C901 for flak8. There is no better way to handle this as those if/else are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we can refactor the new code to a new function for example _configure_node_passthrough_devices
and call this fn from _config_node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still one if condition we have to add like below snippet, and this would cause the same error of C901
# Set device passthrough params for node context
if node_runbook.device_passthrough:
self._configure_device_passthrough(self, node_runbook, node_context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it still throw the same error if we move the condition to _configure_device_passthrough
itself ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we should skip the redundant call to function. Anyway, i made the change please check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no better way to handle this as those if/else are needed
If we make the function smaller by extracting bits out into functions, this error will go away, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no better way to handle this as those if/else are needed
If we make the function smaller by extracting bits out into functions, this error will go away, right?
Even if i would put one if condition, it was coming. I wanted to call fn only when needed and avoid redundant call even when device pass through is not required.
I removed that condition also now. Please check the latest push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change. Looks like we got rid of the flake8 error now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was because of IF we added. Now we removed it and calling fn every time
930f9ae
to
e49bbef
Compare
@@ -396,7 +398,7 @@ def _configure_nodes(self, environment: Environment, log: Logger) -> None: | |||
vm_name_prefix, | |||
) | |||
|
|||
def _configure_node( | |||
def _configure_node( # noqa: C901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we can refactor the new code to a new function for example _configure_node_passthrough_devices
and call this fn from _config_node
952c7b7
to
e727dcb
Compare
Add hostdev support for libvirt passthrough. This is only for PCI device passthrough libvirt. Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
e727dcb
to
1f06075
Compare
Could you please post a sample of how a runbook with passthrough configuration would look? |
This would remain same for QEMU/CLH. As of now we are focusing on CLH only. QEMU might require some additional change. |
# Configuration options for device-passthrough for the VM. | ||
@dataclass_json() | ||
@dataclass | ||
class LibvirtHostPciDeviceSchema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the comments, Smit.
I have another comment in general about the approach.
As I understand it, device_passthrough
is under node requirements. While this approach works with deploying of a single node, this fails with cases where multiple nodes with passthrough is needed. For example, all network tests will create multiple nodes. And by the current code logic, the deployment of second will fail since all pci devices provided in runbook would be assigned to first node itself.
PCI devices are associated with the host node. And in our context, host node is synonymous to platform. So, we should associate the host pci devices to platform than to node requirement.
Here is what I'm thinking in the form of a sample schema. Introduce 'device_pools' at platform level and define all the host devices there. Pools can be of different types like 'pci_net', 'pci_gpu' etc. All the devices that we want to be used in provisioning guests need to be defined here. And for a node/VM to use a passthrough device, the pool type and devices count has to be mentioned in the node requirement. Platform needs to handle assigning the devices accordingly from the available devices of the pool. Platform needs to maintain the state of the devices (which vm a device is assigned to or not, etc).
platform:
- type: cloud-hypervisor
cloud-hypervisor:
network_boot_timeout: $(boot_timeout)
hosts:
- address: $(host_address)
username: $(host_username)
private_key_file: $(admin_private_key_file)
capture_libvirt_debug_logs: true
device_pools:
- type: "pci_net"
devices:
- domain: "1"
bus: "1"
slot: "1"
function: "1"
- domain: "2"
bus: "2"
slot: "2"
function: "2"
- type: "pci_gpu"
devices:
- domain: "3"
bus: "3"
slot: "3"
function: "3"
- domain: "4"
bus: "4"
slot: "4"
function: "4"
requirement:
core_count: $(guest_vcpu)
memory_mb: $(guest_memory)
cloud-hypervisor:
disk_img: $(qcow2)
disk_img_format: "qcow2"
cloud_init:
extra_user_data: $(extra_user_data)
device_passthrough:
- host_device_pool: "pci_net"
managed: "yes"
count: 2
I haven't put too much thought into schema. What I listed above is the rough idea. I've also discussed this with @anirudhrb . We can brainstorm more on this offline and finalize the schema.
Add PCI passthrough support for cloud-hypervisor platform
Add parameter to pass options while loading module using modprobe