This repository has been archived by the owner on May 12, 2021. It is now read-only.

Need to add support for passing generic pci devices with VFIO #155

Closed
egernst opened this issue Apr 2, 2018 · 4 comments
Labels
enhancement (Improvement to an existing feature), needs-help (Request for extra help: technical, resource, etc.)

Comments

@egernst
Member

egernst commented Apr 2, 2018

From @amshinde on November 18, 2017 19:33

Refer to clearcontainers/runtime#821
We currently support passing VFIO device groups with --device; the user is expected to bind the device to vfio-pci beforehand.
We need to add support for the runtime to check whether a device passed in is a PCI device and, if so, pass it to the container VM using PCI passthrough (VFIO). We need to make sure that the device is the only device in its IOMMU group. The runtime would then unbind the device from its current kernel driver and bind it to vfio-pci, passing it to the VM with PCI passthrough. When the container exits, the runtime would need to bind the device back to its host driver.
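
The flow described above could be sketched roughly as follows. This is only a minimal sketch, not the runtime's implementation: the function names and the `sysfs_root` parameter are hypothetical, and it assumes the usual sysfs layout (`iommu_group/devices`, `driver/unbind`, `driver_override`, `drivers_probe`). The bind and restore steps only return the list of sysfs writes rather than performing them, since actually doing so requires root.

```python
import os


def iommu_group_devices(bdf, sysfs_root="/sys"):
    """Return the PCI addresses that share an IOMMU group with `bdf`."""
    group_dir = os.path.join(
        sysfs_root, "bus/pci/devices", bdf, "iommu_group", "devices"
    )
    return sorted(os.listdir(group_dir))


def is_alone_in_iommu_group(bdf, sysfs_root="/sys"):
    """True if `bdf` is the only device in its IOMMU group."""
    return iommu_group_devices(bdf, sysfs_root) == [bdf]


def vfio_bind_writes(bdf, sysfs_root="/sys"):
    """List the (path, value) writes that would move `bdf` to vfio-pci."""
    dev = os.path.join(sysfs_root, "bus/pci/devices", bdf)
    return [
        # 1. Detach the device from its current host driver.
        (os.path.join(dev, "driver", "unbind"), bdf),
        # 2. Ask the PCI core to prefer vfio-pci for this device.
        (os.path.join(dev, "driver_override"), "vfio-pci"),
        # 3. Trigger a re-probe so vfio-pci picks the device up.
        (os.path.join(sysfs_root, "bus/pci/drivers_probe"), bdf),
    ]


def host_restore_writes(bdf, sysfs_root="/sys"):
    """List the (path, value) writes that would hand `bdf` back to the host."""
    dev = os.path.join(sysfs_root, "bus/pci/devices", bdf)
    return [
        (os.path.join(dev, "driver", "unbind"), bdf),
        # Writing an empty line clears the override.
        (os.path.join(dev, "driver_override"), "\n"),
        (os.path.join(sysfs_root, "bus/pci/drivers_probe"), bdf),
    ]
```

A caller would presumably check `is_alone_in_iommu_group` first and refuse the passthrough (or report an error) when it returns false.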

Copied from original issue: containers/virtcontainers#489

@egernst
Member Author

egernst commented Apr 2, 2018

From @amshinde on November 18, 2017 19:39

There does not seem to be a straightforward way to do this, as the device node and sysfs tree layout differ from device to device.
We would likely need to define a separate method for each kind of device.

For example, this is how the audio and graphics cards look on a machine that I was testing:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)

For the graphics device, the device nodes appear as:

$ ls -la /dev/dri/
 total 0
 crw-rw----+  1 root video 226,   0 Nov  5 22:27 card0
 crw-rw----+  1 root video 226, 128 Nov  5 22:27 renderD128

Navigating sysfs based on major and minor number gives:

$ ls -la /sys/dev/char/226\:0/device
lrwxrwxrwx 1 root root 0 Nov  6 11:23 /sys/dev/char/226:0/device -> ../../../0000:00:02.0

$ ls -la /sys/dev/char/226\:128/device
lrwxrwxrwx 1 root root 0 Nov  5 22:27 /sys/dev/char/226:128/device -> ../../../0000:00:02.0
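
The major/minor lookup used above could be sketched like this. It is a minimal sketch under the assumption that the device is a character device; the function names are made up for illustration.

```python
import os
import stat


def sysfs_char_path(rdev):
    """Build the /sys/dev/char entry for a raw device number."""
    return "/sys/dev/char/{}:{}".format(os.major(rdev), os.minor(rdev))


def sysfs_path_for_node(dev_node):
    """Map a character device node such as /dev/dri/card0 to its sysfs entry."""
    st = os.stat(dev_node)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError("{} is not a character device".format(dev_node))
    return sysfs_char_path(st.st_rdev)
```

On the machine shown here, `sysfs_path_for_node("/dev/dri/card0")` would be expected to give `/sys/dev/char/226:0`, matching the path listed above.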

For the audio device, things were a bit different:

$ ls -la /dev/snd
total 0
drwxr-xr-x   2 root root       80 Nov  6 23:25 by-path
crw-rw----+  1 root audio 116,  7 Nov  6 23:25 controlC0
crw-rw----+  1 root audio 116,  2 Nov  5 22:27 controlC1
crw-rw----+  1 root audio 116, 11 Nov  6 23:25 hwC0D0
crw-rw----+  1 root audio 116,  6 Nov  5 22:27 hwC1D2
crw-rw----+  1 root audio 116,  8 Nov  6 23:25 pcmC0D3p
crw-rw----+  1 root audio 116,  9 Nov  6 23:25 pcmC0D7p
crw-rw----+  1 root audio 116, 10 Nov  6 23:25 pcmC0D8p
crw-rw----+  1 root audio 116,  4 Nov  5 22:27 pcmC1D0c
crw-rw----+  1 root audio 116,  3 Nov  5 22:27 pcmC1D0p
crw-rw----+  1 root audio 116,  5 Nov  5 22:27 pcmC1D2c
crw-rw----+  1 root audio 116,  1 Nov  5 22:27 seq
crw-rw----+  1 root audio 116, 33 Nov  5 22:27 timer

In this case, the device symlink under sysfs does not give the PCI device:

$ ls -la /sys/dev/char/116\:6/device
lrwxrwxrwx 1 root root 0 Nov  7 05:54 /sys/dev/char/116:6/device -> ../../card1

Essentially one more traversal yielded the pci device information:

$ readlink /sys/dev/char/116\:6/device/device
../../../0000:00:1b.0
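
The extra traversal could be generalised by following `device` links until a directory named like a PCI address is reached. The sketch below is an assumption about how one might do that, not something the runtime does today; `follow_device_links` and its `max_hops` limit are hypothetical.

```python
import os
import re

# Matches a full PCI address such as 0000:00:1b.0 (domain:bus:slot.function).
PCI_BDF = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def follow_device_links(start, max_hops=4):
    """Follow nested `device` links from `start` until a PCI device is found.

    Returns the PCI address, or None if no PCI device is reached
    within `max_hops` traversals.
    """
    current = start
    for _ in range(max_hops):
        link = os.path.join(current, "device")
        if not os.path.exists(link):
            return None
        current = os.path.realpath(link)
        name = os.path.basename(current)
        if PCI_BDF.match(name):
            return name
    return None
```

For the graphics node this would stop after one hop, and for the audio node after two, as in the listings above.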

In summary, the structure of the device nodes varies from device to device.

However, instead of relying on the device symlinks under /sys/dev/char/$major:$minor/, I realized that the symlink target path itself contains the PCI information:

$ ls -la /sys/dev/char/
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:1 -> ../../devices/virtual/sound/seq
lrwxrwxrwx 1 root root 0 Nov  7 05:45 116:10 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D8p
lrwxrwxrwx 1 root root 0 Nov  7 05:45 116:11 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/hwC0D0
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:2 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/controlC1
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:3 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D0p
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:33 -> ../../devices/virtual/sound/timer
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:4 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D0c
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:5 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/pcmC1D2c
lrwxrwxrwx 1 root root 0 Nov  6 11:50 116:6 -> ../../devices/pci0000:00/0000:00:1b.0/sound/card1/hwC1D2
lrwxrwxrwx 1 root root 0 Nov  7 05:45 116:7 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/controlC0
lrwxrwxrwx 1 root root 0 Nov  7 05:45 116:8 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D3p
lrwxrwxrwx 1 root root 0 Nov  7 05:45 116:9 -> ../../devices/pci0000:00/0000:00:03.0/sound/card0/pcmC0D7p

graphics:

lrwxrwxrwx 1 root root 0 Nov  6 11:50 226:0 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0
lrwxrwxrwx 1 root root 0 Nov  6 11:50 226:128 -> ../../devices/pci0000:00/0000:00:02.0/drm/renderD128

We can consider looking at the pci information in the above symlinks to decide if a device is pci and can be passed through VFIO.
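
One possible way to extract the PCI information from these symlink targets is sketched below. It is only an illustration: the function name is made up, and picking the last PCI address in the path is an assumption meant to cover devices that sit behind bridges, where several addresses can appear.

```python
import re

# A full PCI address: domain:bus:slot.function, e.g. 0000:00:1b.0
PCI_BDF = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def pci_address_from_sysfs_target(target):
    """Return the PCI address embedded in a /sys/dev/char symlink target.

    Returns None for targets with no PCI component, such as
    ../../devices/virtual/sound/seq.
    """
    matches = [part for part in target.split("/") if PCI_BDF.fullmatch(part)]
    # The last match is the device closest to the node itself.
    return matches[-1] if matches else None
```

With the targets listed above, the `116:6` entry would yield `0000:00:1b.0`, while the virtual `116:1` entry would yield `None` and so would not be a passthrough candidate.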

@egernst
Member Author

egernst commented Apr 2, 2018

@amshinde @mcastelino: in the initial comment for this, you talk about needing to verify that it's a PCI device and that it is in its own IOMMU group. Do we really need to do this checking? The existing hypervisor should fail if it isn't, and we could just make sure that the errors are propagated back appropriately?

In general, is there more action required on this issue wrt Kata?

@egernst
Member Author

egernst commented Apr 2, 2018

From @sboeuf on March 9, 2018 19:46

After some discussions with @amshinde, we need to find a way to identify the device inside the VM. We need a single identifier that lets us reliably find the device we passed through. This way, we can make it show up at the expected path /dev/mydev when the user passes --device /dev/vfio/16:/dev/mydev.
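
To make the --device example concrete, here is a rough sketch of splitting such a value into its host and container paths and pulling out the VFIO group number. It assumes only the `host[:container[:permissions]]` form and a default of `rwm` when no permissions are given; both helper names are hypothetical, and it does not address the harder problem of locating the device inside the guest.

```python
def parse_device_spec(spec):
    """Split a --device value into (host_path, container_path, permissions)."""
    parts = spec.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError("unsupported --device value: {!r}".format(spec))
    host = parts[0]
    # Without an explicit container path, reuse the host path.
    container = parts[1] if len(parts) > 1 and parts[1] else host
    perms = parts[2] if len(parts) > 2 else "rwm"
    return host, container, perms


def vfio_group_number(host_path):
    """Return the VFIO group number for /dev/vfio/<N>, else None."""
    prefix = "/dev/vfio/"
    name = host_path[len(prefix):] if host_path.startswith(prefix) else ""
    return int(name) if name.isdigit() else None
```

For the example above, the host path is `/dev/vfio/16` (group 16) and the path the user expects inside the container is `/dev/mydev`.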

@devimc added the enhancement and needs-help labels on Jul 17, 2019

@dgibson
Contributor

dgibson commented Sep 9, 2020

@egernst, it's not clear to me if you're talking about rebinding drivers in the host, or the guest.

If the guest, then I have draft code for this (working, but needs polish before merging). See #2938 for tracking.

If the host, then this seems out of scope for an OCI runtime. Its input is the OCI runtime spec, which gives /dev nodes rather than PCI devices as such, implying they have already been rebound on the surrounding host.
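
As an illustration of what the runtime actually receives, the sketch below reads the `linux.devices` entries of an OCI runtime config and maps each character device to its host sysfs entry by major and minor number. The function name is hypothetical and this is only a sketch of the idea; note that `path` in these entries is the path inside the container, so it cannot be used on its own to recognise a VFIO group.

```python
def char_devices_from_oci(oci_config):
    """Return (container_path, host_sysfs_entry) for each character device.

    `oci_config` is the parsed config.json of an OCI bundle.
    """
    devices = oci_config.get("linux", {}).get("devices", [])
    result = []
    for dev in devices:
        # "c" marks a character device in the OCI runtime spec.
        if dev.get("type") != "c":
            continue
        sysfs_entry = "/sys/dev/char/{}:{}".format(dev["major"], dev["minor"])
        result.append((dev["path"], sysfs_entry))
    return result
```

At this point the spec carries only device nodes, which is consistent with the view that any host-side rebinding has to happen before the runtime is invoked.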


5 participants