# Avoid amdgpu attaching to one specific card? GPU Passthrough

## austinramsay

Hey everyone.

I've been battling this Windows 10 VM setup with GPU passthrough on my Gentoo host for the past couple days. I'm at the point where I know it will work if I can correctly bind the VFIO driver at boot rather than letting the amdgpu driver attach to my guest's graphics card but can't figure out the proper method. I am using TWO Navi 10 graphics cards between my Gentoo host and Windows guest. A 5600XT and 5700XT respectively. So, because I am using a card on my host that requires the amdgpu driver I can't just blacklist it like what is recommended in many posts online. I'm using KVM and Virt-manager to handle my VMs.

Here's a little background on the setup:

MSI X570 Ace motherboard (excellent IOMMU group support, everything is basically in its own group so no issues here)

1x Radeon 5600XT (Gentoo card, PCIe slot 1)

1x Radeon 5700XT (Windows card, PCIe slot 2)

64GB RAM

Virt-manager/KVM handling VM setup and management

Up to this point I've been trying to manually unbind the guest's card from the amdgpu driver by echoing the card's PCI address into the driver's 'unbind' file and then echoing its vendor and device IDs to the vfio-pci driver. Here is the script I was using:

```
#!/bin/bash

modprobe vfio-pci

for dev in "$@"; do
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
done
```

The PCI addresses of the Windows guest's graphics card, the 5700XT, are 32:00.0 and 32:00.1 (the card and its audio device). Both of these are passed through to the VM. Addresses 2f:00.0 and 2f:00.1 are the 5600XT for my Gentoo host.

```
$ lspci -nn

32:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ff)

32:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38] (rev ff)

2f:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ca)

2f:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

```

I would call that script above with these IDs as arguments. The script would complete but in 'dmesg' I would get:

```
$ dmesg

[   24.217402] VFIO - User Level meta-driver version: 0.3

[   24.219567] vfio_pci: add [1002:731f[ffffffff:ffffffff]] class 0x000000/00000000

[   24.219572] vfio_pci: add [1002:ab38[ffffffff:ffffffff]] class 0x000000/00000000

**** [   24.221441] [drm:amdgpu_pci_remove [amdgpu]] *ERROR* Hotplug removal is not supported ****

[   24.221851] [drm] amdgpu: finishing device.

[   24.237020] [drm] free PSP TMR buffer

[   24.297696] ------------[ cut here ]------------

[   24.297739] WARNING: CPU: 6 PID: 2652 at drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3191 amdgpu_vm_manager_fini+0x19/0x50 [amdgpu]

[   24.297740] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio kvm_amd amdgpu iwlmvm mfd_core gpu_sched ttm iwlwifi efivarfs

[   24.297747] CPU: 6 PID: 2652 Comm: vfio-bind Not tainted 5.8.9-gentoo #7

[   24.297748] Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 ACE (MS-7C35), BIOS 1.80 01/16/2020

[   24.297787] RIP: 0010:amdgpu_vm_manager_fini+0x19/0x50 [amdgpu]

[   24.297788] Code: a0 4e 00 00 02 00 00 00 eb a3 0f 0b eb e4 0f 1f 00 55 48 89 fd 48 81 c7 a8 4e 00 00 48 83 ec 08 48 83 bd b0 4e 00 00 00 74 14 <0f> 0b e8 e0 16 27 d0 48 83 c4 08 48 89 ef 5d e9 53 a4 00 00 31 f6

[   24.297789] RSP: 0018:ffff9ed088727da8 EFLAGS: 00010282

[   24.297791] RAX: ffffffffc0862be0 RBX: ffff8a6462020010 RCX: ffff8a647600af80

[   24.297791] RDX: 0000000000000001 RSI: ffff8a647600a780 RDI: ffff8a6462024ea8

[   24.297792] RBP: ffff8a6462020000 R08: 0000000000000001 R09: ffffffffc0308300

[   24.297793] R10: ffff8a646b95c400 R11: 0000000000000000 R12: 0000000000000001

[   24.297793] R13: ffff8a6462036940 R14: ffff9ed088727f10 R15: ffff8a647793c7a0

[   24.297794] FS:  00007fa30c6d1b80(0000) GS:ffff8a647e980000(0000) knlGS:0000000000000000

[   24.297795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[   24.297796] CR2: 00007f107be03448 CR3: 0000000fd6396000 CR4: 0000000000340ee0

[   24.297797] Call Trace:

[   24.297841]  gmc_v10_0_sw_fini+0x9/0x30 [amdgpu]

[   24.297884]  amdgpu_device_fini+0x270/0x439 [amdgpu]

[   24.297920]  amdgpu_driver_unload_kms+0x44/0x80 [amdgpu]

[   24.297955]  amdgpu_pci_remove+0x3c/0x70 [amdgpu]

[   24.297958]  pci_device_remove+0x44/0xb0

[   24.297961]  device_release_driver_internal+0xdf/0x1c0

[   24.297962]  unbind_store+0xfa/0x130

[   24.297965]  kernfs_fop_write+0xd3/0x1c0

[   24.297968]  vfs_write+0xf0/0x220

[   24.297970]  ksys_write+0x6b/0x100

[   24.297973]  do_syscall_64+0x3e/0x70

[   24.297975]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   24.297977] RIP: 0033:0x7fa30c804133

[   24.297979] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18

[   24.297979] RSP: 002b:00007fff35468a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001

[   24.297980] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fa30c804133

[   24.297981] RDX: 000000000000000d RSI: 0000558a2be2d580 RDI: 0000000000000001

[   24.297982] RBP: 0000558a2be2d580 R08: 000000000000000a R09: 000000000000000c

[   24.297982] R10: 0000558a2b3adbc2 R11: 0000000000000246 R12: 000000000000000d

[   24.297983] R13: 00007fa30c8cf6a0 R14: 000000000000000d R15: 00007fa30c8ca8a0

[   24.297984] ---[ end trace dc5177fd4508bc7d ]---

[   24.297989] ------------[ cut here ]------------

[   24.297990] Still active user space clients!

[   24.298048] WARNING: CPU: 6 PID: 2652 at drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:105 amdgpu_gem_force_release+0xf9/0x110 [amdgpu]

[   24.298049] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio kvm_amd amdgpu iwlmvm mfd_core gpu_sched ttm iwlwifi efivarfs

[   24.298058] CPU: 6 PID: 2652 Comm: vfio-bind Tainted: G        W         5.8.9-gentoo #7

[   24.298060] Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 ACE (MS-7C35), BIOS 1.80 01/16/2020

[   24.298099] RIP: 0010:amdgpu_gem_force_release+0xf9/0x110 [amdgpu]

[   24.298102] Code: 48 c7 c7 5d 87 b4 c0 c6 05 70 f0 3c 00 01 e8 6e 79 e9 cf 0f 0b eb 8a 48 c7 c7 68 d5 b1 c0 c6 05 5a f0 3c 00 01 e8 57 79 e9 cf <0f> 0b e9 50 ff ff ff e8 3b 7b a4 d0 66 66 2e 0f 1f 84 00 00 00 00

[   24.298105] RSP: 0018:ffff9ed088727d80 EFLAGS: 00010282

[   24.298108] RAX: 0000000000000000 RBX: ffff8a6475968400 RCX: 0000000000000027

[   24.298111] RDX: 0000000000000027 RSI: 0000000000000092 RDI: ffff8a647e997368

[   24.298113] RBP: ffff8a6462020000 R08: ffff8a647e997360 R09: 000000000000052d

[   24.298114] R10: ffffffff922df938 R11: 0000000000000001 R12: 0000000000000001

[   24.298115] R13: ffff8a646bad70f0 R14: ffff8a646bad70d0 R15: ffff8a647793c7a0

[   24.298116] FS:  00007fa30c6d1b80(0000) GS:ffff8a647e980000(0000) knlGS:0000000000000000

[   24.298117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[   24.298117] CR2: 00007f107be03448 CR3: 0000000fd6396000 CR4: 0000000000340ee0

[   24.298118] Call Trace:

[   24.298159]  gmc_v10_0_sw_fini+0x21/0x30 [amdgpu]

```

In other words, the unbind wasn't completely successful because "Hotplug removal is not supported", so this is not going to work. I can still see the card in Windows, but it doesn't work properly and the drivers won't completely recognize it. Device Manager shows that the card had a failure (error 43).

The guest card does show the 'vfio-pci' driver in use after running the script:

```
$ lspci -nnk

32:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ff)

        Kernel driver in use: vfio-pci

        Kernel modules: amdgpu

32:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38] (rev ff)

        Kernel driver in use: vfio-pci

```
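For use inside scripts, the same check can be done without parsing `lspci` output. This is only a minimal sketch: the function name `bound_driver` and the `SYSFS_PCI` variable are made up for illustration, and it assumes the usual sysfs layout where each device directory has a `driver` symlink while a driver is bound.

```shell
#!/bin/bash
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_PCI is overridable only so the function can be tried on a fake tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

bound_driver() {
    local link="$SYSFS_PCI/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Hypothetical usage:
#   bound_driver 0000:32:00.0
```

Note that a device reporting vfio-pci here only means the bind succeeded; as the dmesg output shows, the card itself can still be left in a bad state.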

When I boot the VM after attempting to unbind the guest card's devices, I get this in 'dmesg', along with an 800x600 display in the VM and an error 43 for the card in Device Manager under Windows saying "this device is not working properly".

```
[  180.703990] vfio-pci 0000:32:00.1: refused to change power state from D0 to D3hot

[  183.856590] ata6.00: Enabling discard_zeroes_data

[  183.859625]  sda: sda1 sda2 sda3 sda4

[  184.107989] vfio-pci 0000:32:00.1: refused to change power state from D0 to D3hot

[  185.298576] vfio-pci 0000:32:00.1: can't change power state from D0 to D3hot (config space inaccessible)

[  185.298580] vfio-pci 0000:32:00.0: can't change power state from D0 to D3hot (config space inaccessible)

[  185.426489] AMD-Vi: Completion-Wait loop timed out

[  185.550584] AMD-Vi: Completion-Wait loop timed out

[  185.674637] AMD-Vi: Completion-Wait loop timed out

[  185.798941] AMD-Vi: Completion-Wait loop timed out

[  186.302335] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff8581540]

[  186.680865] ata6.00: Enabling discard_zeroes_data

[  186.683853]  sda: sda1 sda2 sda3 sda4

[  187.302774] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff8581580]

[  187.302776] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff85815a0]

```

If I could get the amdgpu driver to avoid attaching to this specific card at boot, I should be fine. Any suggestions on what approach to take? I've been reading dozens of posts online, of course, but some of them are over 8 years old and confusing since the methods are not the same as today. Any help is appreciated. Thanks!

Last edited by austinramsay on Tue Sep 22, 2020 3:31 pm; edited 3 times in total

----------

## DespLock

Hi austinramsay,

Some important information is missing from your post. First of all: which virtualization are you using?

Virt-manager can be used, e.g., for KVM or Xen.

Making an educated guess, I'd say KVM.

In that case, please read and check

https://wiki.gentoo.org/wiki/GPU_passthrough_with_libvirt_qemu_kvm

----------

## Anon-E-moose

If both the video and the audio portions have been captured by amdgpu and the sound driver respectively, then you need to unbind both before you can do anything with the card.

I don't have the messages handy, but I always got the hotplug message when I unbound the second card, and it worked anyway.

What are the IDs for each of your video cards? I see 1002:731f (video) and 1002:ab38 (audio) on one of them; what's the other?

Depending on the answer, you have to do things a certain way.

----------

## austinramsay

 *DespLock wrote:*   

> Hi austinramsay,
> 
> there are some important information which your post is missing. First of all: which virtualization are you using?
> 
> VirtManager can be used p. e. for KVM or Xen. 
> ...

 

Sorry about that, somehow left that out in the original post. I'm using KVM. I have read that guide multiple times over the past few days which has helped me set it up to where I am now but doesn't help with this specific issue.

 *Anon-E-moose wrote:*   

> if both the video and the audio portion have been captured by amdgpu and the sound driver respectively then you need to unbind both before you can do anything with the card.
> 
> I don't have the messages handy, but I always got the hotplug message when I unbound the second card but it worked anyway.
> 
> What are the id's for each of your video card, I see 1002:731f (video) and 1002:ab38 (audio) on one of them, what's the other.
> ...

 

Here is my Gentoo host's 5600XT which I also updated the original post with:

```
$ lspci -nn

2f:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ca)

2f:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

```

And just for reference again, here is my Windows guest's 5700XT:

```
$ lspci -nn

32:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ff)

32:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38] (rev ff)

```

I did try to unbind both 32:00.0 and 32:00.1 prior to running the VM, but am still getting errors from dmesg and the card is not working properly under Windows. Or are you saying both the host card and guest card need to be unbound before I can do anything? 

I get these errors in dmesg when booting the Windows VM after attempting the unbinding of the 32:00.0 and 32:00.1 devices:

```
 $ dmesg

[  180.703990] vfio-pci 0000:32:00.1: refused to change power state from D0 to D3hot

[  183.856590] ata6.00: Enabling discard_zeroes_data

[  183.859625]  sda: sda1 sda2 sda3 sda4

[  184.107989] vfio-pci 0000:32:00.1: refused to change power state from D0 to D3hot

[  185.298576] vfio-pci 0000:32:00.1: can't change power state from D0 to D3hot (config space inaccessible)

[  185.298580] vfio-pci 0000:32:00.0: can't change power state from D0 to D3hot (config space inaccessible)

[  185.426489] AMD-Vi: Completion-Wait loop timed out

[  185.550584] AMD-Vi: Completion-Wait loop timed out

[  185.674637] AMD-Vi: Completion-Wait loop timed out

[  185.798941] AMD-Vi: Completion-Wait loop timed out

[  186.302335] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff8581540]

[  186.680865] ata6.00: Enabling discard_zeroes_data

[  186.683853]  sda: sda1 sda2 sda3 sda4

[  187.302774] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff8581580]

[  187.302776] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=32:00.0 address=0xff85815a0]

```

Even after powering off the host completely and powering back on, I get the same message regardless.

----------

## Anon-E-moose

I was running 2 amd cards at one time (rx 550 and 560) and they both shared the same vendor:product id, seems lots of vendors do that. Anyway....

What I did was to let amdgpu start up (binding both cards) and blacklist my hdmi sound audio module, snd_hda_intel in my case

Then in /etc/local.d/baselayout1.start I had this

```
modprobe kvm-amd
modprobe vfio_pci

# amd video
echo 0000:0b:00.0 > /sys/bus/pci/drivers/amdgpu/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:0b:00.0/driver_override
echo 0000:0b:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
echo > /sys/bus/pci/devices/0000:0b:00.0/driver_override

# audio hdmi
echo vfio-pci > /sys/bus/pci/devices/0000:0b:00.1/driver_override
echo 0000:0b:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
echo > /sys/bus/pci/devices/0000:0b:00.1/driver_override

modprobe snd-hda-intel
```

You'll have to adjust the PCI addresses to match your system.

I found that trying to unbind the audio portion of the video card from the audio modules caused problems, never seemed to free itself properly, so I did it this way.

If the vendor:product ids were different it'd be easier.
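The per-device sequence above can be wrapped in a small function so that it only needs the PCI address. This is a sketch under assumptions, not a tested fix: the name `rebind_to_vfio` and the `PCI_ROOT` variable are invented for illustration, and unlike the script above it also unbinds whatever driver currently holds the device (the audio function was deliberately kept unbound there by blacklisting its module). It does nothing about the card's own reset behaviour.

```shell
#!/bin/bash
# Rebind one PCI function to vfio-pci by address.
# PCI_ROOT is overridable only so the logic can be tried against a fake tree.
PCI_ROOT="${PCI_ROOT:-/sys/bus/pci}"

rebind_to_vfio() {
    local bdf="$1" dev="$PCI_ROOT/devices/$1"
    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }
    # Detach from whatever driver currently owns the device, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # Restrict matching for this one device to vfio-pci, then bind it.
    echo vfio-pci > "$dev/driver_override"
    echo "$bdf" > "$PCI_ROOT/drivers/vfio-pci/bind"
    # Clear the override again, as the script above does.
    echo > "$dev/driver_override"
}

# Hypothetical usage (as root, after "modprobe vfio-pci"):
#   rebind_to_vfio 0000:32:00.0 && rebind_to_vfio 0000:32:00.1
```

Because `driver_override` is set per device, a second card with the same vendor:product ID is not touched.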

----------

## austinramsay

 *Anon-E-moose wrote:*   

> I was running 2 amd cards at one time (rx 550 and 560) and they both shared the same vendor:product id, seems lots of vendors do that. Anyway....
> 
> What I did was to let amdgpu start up (binding both cards) and blacklist my hdmi sound audio module, snd_hda_intel in my case
> 
> Then in /etc/local.d/baselayout1.start I had this
> ...

 

I gave that a try and unfortunately ended up with the same problem, along with the same dmesg errors shown in the output I posted earlier. Perhaps the RX 500 series cards handle the reset a little better? Of course, Navi has the extremely annoying reset bug, which these error messages are consistent with, although I can't remember whether the 500 series has the same issue.

Any other ideas? Could I use some kind of initramfs system to help out with this before the amdgpu module loads?

----------

## Anon-E-moose

I did a little googling on Navi cards, Linux and QEMU; there is some problem going on with them.

Also, what kernel version are you running?

Edit to remove silly questions that were answered in the first post.

Edit to add: If it works other than the error 43 in Windows, then this will address that: https://forum.level1techs.com/t/navi-reset-kernel-patch/147547

Note: this is a kernel patch; I'm not sure which kernel versions it's designed for, or whether it matters.

----------

## DespLock

I'm concerned about this line:

 *Quote:*   

> [  185.298580] vfio-pci 0000:32:00.0: can't change power state from D0 to D3hot (config space inaccessible)

 

Here is a great tutorial which might help you further:

https://heiko-sieger.info/running-windows-10-on-linux-using-kvm-with-vga-passthrough/

----------

## x90e

I am using kernel 5.4.60 at the moment with an ACS override patch, which I used to try to bypass the Navi reset bug, but it still happens. I will attach the patch I used at the bottom.

I've had to deal with this (5700xt and r9 270x, both using amdgpu), but since both of your cards have the same vendor and product IDs, you're going to have to bind by the group / PCI ID, because those should be different. You want to add kernel command line arguments to stop the boot GPU from attaching to anything. In /etc/default/grub use

```
GRUB_CMDLINE_LINUX_DEFAULT="video=efifb:off iommu=pt amd_iommu=on kvm_amd=on pcie_acs_override=downstream,multifunction quiet default_hugepagesz=1G hugepagesz=1G hugepages=16"
```

video=efifb:off iommu=pt amd_iommu=on kvm_amd=on are the important switches. Obviously you don't have to do the hugepages stuff, that's just optimization, and you don't have to use pcie_acs_override if you have great grouping. I use it to separate a couple of IOMMU groups on my b450-f so that I can pass through more items to my VM.

(Remember to mount /boot and run grub-mkconfig -o /boot/grub/grub.cfg after editing your default grub config, and also make sure that your user is in the kvm, qemu, and libvirt groups.)
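After rebooting, it can be worth confirming that the new command line actually took effect before touching the VM. A minimal sketch follows; the helper names `iommu_active` and `has_param` are invented here, and it assumes the kernel exposes IOMMU groups under /sys/kernel/iommu_groups when the IOMMU is in use.

```shell
#!/bin/bash
# Succeeds if the running kernel has created at least one IOMMU group.
# The directory is a parameter only so the check can be tried on a fake tree.
iommu_active() {
    local root="${1:-/sys/kernel/iommu_groups}"
    [ -d "$root" ] && [ -n "$(ls -A "$root" 2>/dev/null)" ]
}

# Succeeds if an exact parameter appears in a kernel command line string.
has_param() {
    local cmdline="$1" param="$2"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage after rebooting:
#   iommu_active && echo "IOMMU groups present"
#   has_param "$(cat /proc/cmdline)" iommu=pt && echo "iommu=pt is set"
```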

Then you need to tell Xorg which card to use for the host. Just for ease of use I've set xdm to start on boot; you can use whatever display manager you like, but on Gentoo with OpenRC:

```
# rc-update add xdm default
```

and my ~/.xinitrc is set up for KDE Plasma on X11:

```
#!/bin/sh

exec dbus-launch --exit-with-session startplasma-x11
```

my /etc/X11/xorg.conf:

```
Section "Device"
    Identifier      "amdgpu"
    Driver          "amdgpu"
    BusId           "PCI:10:0:0"
EndSection

#pci bus 0x000a cardnum 0x00 function 0x00: vendor 0x1002 device 0x6810
# Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
#For multiple graphics cards specify the BusID in the form PCI:bus:device:function provided by scanpci.
```

That's me telling Xorg to use my r9 270x for host graphics while the 5700xt is passed through to vfio-pci. You can see below that the r9 270x is at 0a:00.0 and 0a:00.1; 0a is hex for 10, so I put 10 in the bus part of the BusID in xorg.conf.
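The hex-to-decimal conversion is easy to get wrong by hand, so here is a small sketch that does it. The function name `bdf_to_busid` is made up for illustration; it accepts an lspci-style address with or without the leading PCI domain.

```shell
#!/bin/bash
# Convert an lspci-style address (bus:device.function, hex) into the
# decimal PCI:bus:device:function form that xorg.conf expects.
bdf_to_busid() {
    local re='^([[:xdigit:]]{4}:)?([[:xdigit:]]{2}):([[:xdigit:]]{2})\.([[:xdigit:]])$'
    [[ $1 =~ $re ]] || { echo "bad address: $1" >&2; return 1; }
    # printf %d accepts 0x-prefixed hex and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x${BASH_REMATCH[2]}" "0x${BASH_REMATCH[3]}" "0x${BASH_REMATCH[4]}"
}

# Hypothetical usage:
#   bdf_to_busid 0a:00.0        # the r9 270x above
#   bdf_to_busid 0000:2f:00.0   # the 5600XT from the first post
```

For the host card in the first post, 2f is hex for 47, so the BusID would be PCI:47:0:0.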

The BusID is the important part. When I run iommu.sh (which is just a script you have already found, I believe), I get this:

```
  ** condensed for the important parts **

IOMMU Group 15 01:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)

IOMMU Group 17 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)

IOMMU Group 29 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)

IOMMU Group 30 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

IOMMU Group 31 0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1002:6810]

IOMMU Group 32 0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1002:aab0]

IOMMU Group 33 0b:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)

IOMMU Group 39 0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]

```

We're looking at the number after "IOMMU Group", the PCI address, and the vendor/product ID at the end.
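The iommu.sh script itself is not shown in this thread. A listing in the same format can be produced with something like the sketch below, which is an assumption about what such a script does rather than the poster's actual file; the function name `list_iommu_groups` is invented for illustration.

```shell
#!/bin/bash
# List every PCI device together with its IOMMU group, one per line.
# The root directory is a parameter only so it can be tried on a fake tree.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" dev group bdf desc
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue            # no groups: the glob did not match
        group="${dev%/devices/*}"; group="${group##*/}"
        bdf="${dev##*/}"
        # lspci adds the device name and [vendor:device] IDs when it is available.
        desc="$(lspci -nns "$bdf" 2>/dev/null || true)"
        echo "IOMMU Group $group ${desc:-$bdf}"
    done
}

# Hypothetical usage:
#   list_iommu_groups | sort -V
```

If `lspci` is missing or returns nothing for a device, the line falls back to the bare PCI address.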

This is my /etc/modprobe.d/amdgpu.conf:

```
softdep amdgpu pre: vfio-pci
softdep amd_iommu_v2 pre: vfio-pci

```

This tells those drivers not to load until after vfio-pci (because both cards use amdgpu, we want to bind the passthrough card before loading the host's graphics driver).

My /etc/modprobe.d/vfio-pci.conf (you could probably combine these two .conf files, but I keep a separate one for each module):

```
options vfio-pci ids=1002:731f,1002:ab38,1cc1:8201,1022:43c8
#,1cc1:8201 nvme
#,1022:43c8 sata

```

1002:731f and 1002:ab38 are the vendor/product IDs of my 5700xt video and audio (you have to pass through video and audio together, but you don't have to pass through the downstream/upstream PCIe ports that come with the card). The other two are one of my NVMe drives and my SATA AHCI controller.
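With two cards that share 1002:731f, as in the first post, an `ids=` match would likely claim the host card as well. One commonly described alternative is to select by PCI address instead: a modprobe `install` hook sets `driver_override` on the guest card's functions before vfio-pci loads. The sketch below is hypothetical and untested on this hardware. The script path, the `apply` argument and the variable names are all made up, and it assumes amdgpu is built as a module and that a line such as `install vfio-pci /usr/local/sbin/vfio-override.sh apply` sits in /etc/modprobe.d/ next to the `softdep amdgpu pre: vfio-pci` line (and inside the initramfs too, if amdgpu is loaded from there).

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/vfio-override.sh
# Addresses of the guest card from the first post (5700XT video + HDMI audio).
DEVS="${DEVS:-0000:32:00.0 0000:32:00.1}"
PCI_DEVICES="${PCI_DEVICES:-/sys/bus/pci/devices}"   # overridable for dry runs

set_overrides() {
    for dev in $DEVS; do
        # From now on only vfio-pci may claim this device; other cards with
        # the same vendor:product ID are left alone.
        if [ -e "$PCI_DEVICES/$dev/driver_override" ]; then
            echo vfio-pci > "$PCI_DEVICES/$dev/driver_override"
        fi
    done
}

# "apply" is what the modprobe hook passes; without it the file only
# defines the function.
if [ "${1:-}" = "apply" ]; then
    set_overrides
    # -i ignores the install hook so this does not call itself again.
    modprobe -i vfio-pci
fi
```

With the override in place, amdgpu should skip 32:00.0 when it loads and bind only the host card, so no unbind is needed afterwards.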

My /etc/conf.d/modules tells the system which modules to load on boot. It's important to have your kernel configured with these options as loadable modules. I actually accidentally built vfio and amd_iommu *into* the kernel rather than as modules, so it gripes at me that they are already present when I try to load them, but I leave it that way because it works lol

```

modules="vfio vfio-pci vfio_iommu_type1 vfio_virqfd amd_iommu_v2 amdgpu snd_hda_intel"

```

I didn't run into error 43, but I did run into the Navi reset bug, so I dumped my own card's Navi 10.rom on a Windows install with GPU-Z and also downloaded the one from https://www.techpowerup.com/vgabios/212169/sapphire-rx5700xt-8192-190616, as I have a Sapphire Pulse (you can find the one for your own card). I pass that ROM through to my Windows 10 VM. I don't think it works for me, though, as I still sometimes run into the reset bug after shutting down my VM and have to either suspend the system to RAM or, usually, fully reboot to fix it. Otherwise my VM works great and has the latest AMD drivers and everything.

So with this setup, after I boot up the host and log in, I run this script to unbind and rebind one NVMe drive, my SATA controller, and the HD audio controller. It's simply a wrapper around a second script, to which it passes all the IDs I want bound to vfio-pci; you can compare the IDs with the listings above to see where the addresses and vendor/product IDs come from.

You don't need to do this if you don't want to pass through anything other than your video card. You should probably first just set it up with the virtio drivers and worry about passing through other stuff after you've got it working, unless you want to, then go ahead.

```

#!/bin/sh

# first nvme drive (both nvme drives have the same pci ids, so I'm running into your problem with these instead, but I got it working like this)
/usr/sbin/vfio-pci-bind 0000:01:00.0 1cc1:8201
/usr/sbin/vfio-pci-bind 0000:02:00.1 1022:43c8  #sata controller
/usr/sbin/vfio-pci-bind 0000:0d:00.3 1022:1457  #hd audio controller for the mobo

```

I found this script and saved it as /usr/sbin/vfio-pci-bind; the wrapper script above just calls it with the correct IDs.

You could silence the output of this script and make it run automatically if you wanted to, but I left it like this because I wanted to be able to access these devices in the host and/or the guest.

```
#!/usr/bin/env bash

# -*- coding: utf-8 -*-

#

# =============================================================================

#

# The MIT License (MIT)

#

# Copyright (c) 2015 Andre Richter

#

# Permission is hereby granted, free of charge, to any person obtaining a copy

# of this software and associated documentation files (the "Software"), to deal

# in the Software without restriction, including without limitation the rights

# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell

# copies of the Software, and to permit persons to whom the Software is

# furnished to do so, subject to the following conditions:

#

# The above copyright notice and this permission notice shall be included in all

# copies or substantial portions of the Software.

#

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE

# SOFTWARE.

#

# =============================================================================

#

# Author(s):

#   Andre Richter, <andre.o.richter @t gmail_com>

#

# =============================================================================

#

# This script takes one or two parameters in any order:

#   <Vendor:Device> i.e. vvvv:dddd

#   <Domain:Bus:Device.Function> i.e. dddd:bb:dd.f

# and then:

#

#  (1) If both <Vendor:Device> and <Domain:Bus:Device.Function> were provided,

#      validate that the requested <Vendor:Device> exists at <Domain:Bus:Device.Function>

#

#      If only <Vendor:Device> was provided, determine the current 

#      <Domain:Bus:Device.Function> for that device.

#

#      If only <Domain:Bus:Device.Function> was provided, use it.

#

#  (2) Unbinds all devices that are in the same iommu group as the supplied

#      device from their current driver (except PCIe bridges).

#

#  (3) Binds to vfio-pci:

#    (3.1) The supplied device.

#    (3.2) All devices that are in the same iommu group.

#

#  (4) Transfers ownership of the respective iommu group inside /dev/vfio

#      to $SUDO_USER

#

# Script must be executed via sudo

BDF_REGEX="^[[:xdigit:]]{2}:[[:xdigit:]]{2}.[[:xdigit:]]$"

DBDF_REGEX="^[[:xdigit:]]{4}:[[:xdigit:]]{2}:[[:xdigit:]]{2}.[[:xdigit:]]$"

VD_REGEX="^[[:xdigit:]]{4}:[[:xdigit:]]{4}$"

if [[ $EUID -ne 0 ]]; then

    echo "Error: This script must be run as root" 1>&2

    exit 1

fi

if [[ -z "$@" ]]; then

    echo "Error: Please provide Domain:Bus:Device.Function (dddd:bb:dd.f) and/or Vendor:Device (vvvv:dddd)" 1>&2

    exit 1

fi

unset VD BDF

for arg in "$@"

do

    if [[ $arg =~ $VD_REGEX ]]; then

        VD=$arg

    elif [[ $arg =~ $DBDF_REGEX ]]; then

        BDF=$arg

    elif [[ $arg =~ $BDF_REGEX ]]; then

        BDF="0000:${arg}"

        echo "Warning: You did not supply a PCI domain, assuming ${BDF}" 1>&2

    else

        echo "Error: Please provide Vendor:Device (vvvv:dddd) and/or Domain:Bus:Device.Function (dddd:bb:dd.f)" 1>&2

        exit 1

    fi

done

# BDF not provided, find BDF for Vendor:Device

if [[ -z $BDF ]]; then

    COUNT=$(lspci -n -d ${VD} 2>/dev/null | wc -l)

    if [[ $COUNT -eq 0 ]]; then

        echo "Error: Vendor:Device ${VD} not found" 1>&2

        exit 1

    elif [[ $COUNT -gt 1 ]]; then

        echo "Error: Multiple results for Vendor:Device ${VD}, please provide Domain:Bus:Device.Function (dddd:bb:dd.f) as well" 1>&2

        exit 1

    fi

    BDF=$(lspci -n -d ${VD} 2>/dev/null | cut -d " " -f1)

    if [[ $BDF =~ $BDF_REGEX ]]; then

        BDF="0000:${BDF}"

    elif [[ ! $BDF =~ $DBDF_REGEX ]]; then

        echo "Error: Unable to find Domain:Bus:Device.Function for Vendor:Device ${VD}" 1>&2

        exit 1

    fi

fi

TARGET_DEV_SYSFS_PATH="/sys/bus/pci/devices/$BDF"

if [[ ! -d $TARGET_DEV_SYSFS_PATH ]]; then

    echo "Error: Device ${BDF} does not exist, unable to bind device" 1>&2

    exit 1

fi

if [[ ! -d "$TARGET_DEV_SYSFS_PATH/iommu/" ]]; then

    echo "Error: No signs of an IOMMU. Check your hardware and/or linux cmdline parameters. Use intel_iommu=on or iommu=pt iommu=1" 1>&2

    exit 1

fi

# validate that the correct Vendor:Device was found for this BDF

if [[ ! -z $VD ]]; then

    if [[ $(lspci -n -s ${BDF} -d ${VD} 2>/dev/null | wc -l) -eq 0 ]]; then

        echo "Error: Vendor:Device ${VD} not found at ${BDF}, unable to bind device" 1>&2

        exit 1

    else

        echo "Vendor:Device ${VD} found at ${BDF}"

    fi

else

    echo "Warning: You did not specify a Vendor:Device (vvvv:dddd), unable to validate ${BDF}" 1>&2

fi

unset dev_sysfs_paths

for dsp in $TARGET_DEV_SYSFS_PATH/iommu_group/devices/*

do

    dbdf=${dsp##*/}

    if [[ $(( 0x$(setpci -s $dbdf 0e.b) & 0x7f )) -eq 0 ]]; then

        dev_sysfs_paths+=( $dsp )

    fi

done

printf "\nIOMMU group members (sans bridges):\n"

for dsp in ${dev_sysfs_paths[@]}; do echo $dsp; done

modprobe -i vfio-pci

if [[ $? -ne 0 ]]; then

    echo "Error: Error probing vfio-pci" 1>&2

    exit 1

fi

printf "\nBinding...\n"

for dsp in ${dev_sysfs_paths[@]}

do

    dpath="$dsp/driver"

    dbdf=${dsp##*/}

    echo "vfio-pci" > "$dsp/driver_override"

    if [[ -d $dpath ]]; then

        curr_driver=$(readlink $dpath)

        curr_driver=${curr_driver##*/}

        if [[ "$curr_driver" == "vfio-pci" ]]; then

            echo "$dbdf already bound to vfio-pci"

            continue

        else

            echo $dbdf > "$dpath/unbind"

            echo "Unbound $dbdf from $curr_driver"

        fi

    fi

    echo $dbdf > /sys/bus/pci/drivers_probe

done

printf "\n"

# Adjust group ownership

iommu_group=$(readlink $TARGET_DEV_SYSFS_PATH/iommu_group)

iommu_group=${iommu_group##*/}

chown $SUDO_UID:$SUDO_GID "/dev/vfio/$iommu_group"

if [[ $? -ne 0 ]]; then

    echo "Error: unable to adjust group ownership of /dev/vfio/${iommu_group}" 1>&2

    exit 1

fi

printf "success...\n\n"

echo "Device ${VD} at ${BDF} bound to vfio-pci"

echo 'Devices listed in /sys/bus/pci/drivers/vfio-pci:'

ls -l /sys/bus/pci/drivers/vfio-pci | grep -E '[[:xdigit:]]{4}:'

printf "\nls -l /dev/vfio/\n"

ls -l /dev/vfio/
```

Just for reference, here is my lspci -nnk *before* I run that script:

```
 **condensed to the important parts**

01:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)
        Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201]
        Kernel driver in use: nvme
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
        Kernel driver in use: ahci
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
        Subsystem: Sapphire Technology Limited Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1da2:e411]
        Kernel driver in use: vfio-pci
09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
        Kernel driver in use: vfio-pci
0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1002:6810]
        Subsystem: Gigabyte Technology Co., Ltd Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1458:227d]
        Kernel driver in use: amdgpu
0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1002:aab0]
        Subsystem: Gigabyte Technology Co., Ltd Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1458:aab0]
        Kernel driver in use: snd_hda_intel
0b:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)
        Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201]
        Kernel driver in use: nvme
0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) HD Audio Controller [1043:8723]
        Kernel driver in use: snd_hda_intel
```

You can see that so far only the 5700 XT's video and audio functions are attached to vfio-pci. After I run the script:

```
  ** condensed to the important parts **

01:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)
        Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201]
        Kernel driver in use: vfio-pci
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
        Kernel driver in use: vfio-pci
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
        Subsystem: Sapphire Technology Limited Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1da2:e411]
        Kernel driver in use: vfio-pci
09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
        Kernel driver in use: vfio-pci
0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1002:6810]
        Subsystem: Gigabyte Technology Co., Ltd Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1458:227d]
        Kernel driver in use: amdgpu
0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1002:aab0]
        Subsystem: Gigabyte Technology Co., Ltd Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series] [1458:aab0]
        Kernel driver in use: snd_hda_intel
0b:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201] (rev 03)
        Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive [1cc1:8201]
        Kernel driver in use: nvme
0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
        Subsystem: ASUSTeK Computer Inc. Family 17h (Models 00h-0fh) HD Audio Controller [1043:8723]
        Kernel driver in use: vfio-pci
```

Now you can see that one of the NVMe drives, the HD audio controller, and the SATA controller are attached to vfio-pci. The only thing left to do is configure QEMU/KVM through virt-manager by adding these as PCI host devices. You should follow one of the guides that tells you to install Windows (assuming that's what you're doing) with the Windows CD .iso and the Red Hat virtio drivers .iso. I first had to set up my drives with one as SATA and one as SCSI, install Windows, and then either tell it about the drivers or reboot into safe mode after the install so that it would accept the SCSI drivers, then reboot back to normal before they could be used. After that you can set the drives to virtio and/or do PCI passthrough of NVMe drives and SATA controllers like I'm doing.
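If you'd rather not click through virt-manager, the same PCI host devices can be described in libvirt XML and attached with `virsh`. This is only a rough sketch, assuming the devices are already bound to vfio-pci as shown above. The function name `hostdev_xml`, the file name `gpu.xml`, and the domain name `win10` are made up for illustration. It uses `managed='no'` on the assumption that the binding is done by hand rather than by libvirt.

```bash
#!/bin/bash
# Sketch: print a libvirt <hostdev> element for one PCI device.
# Argument: full PCI address in sysfs form, e.g. 0000:09:00.0
# hostdev_xml is a hypothetical helper name, not part of libvirt.
hostdev_xml() {
    local bdf=$1
    # domain:bus:slot.function, all hexadecimal
    local re='^([[:xdigit:]]{4}):([[:xdigit:]]{2}):([[:xdigit:]]{2})\.([0-7])$'

    if [[ ! $bdf =~ $re ]]; then
        echo "Error: expected an address like 0000:09:00.0, got '$bdf'" >&2
        return 1
    fi

    # managed='no': the device is assumed to be bound to vfio-pci already,
    # so libvirt is not asked to detach or reattach it.
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x${BASH_REMATCH[1]}' bus='0x${BASH_REMATCH[2]}' slot='0x${BASH_REMATCH[3]}' function='0x${BASH_REMATCH[4]}'/>
  </source>
</hostdev>
EOF
}
```

For example, `hostdev_xml 0000:09:00.0 > gpu.xml` followed by `virsh attach-device win10 gpu.xml --config` should add the GPU's video function to the (hypothetical) `win10` domain. Repeat for each function you pass through, such as `0000:09:00.1` for the HDMI audio.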

I've noticed that I can just pass through an NVMe drive that already has Windows installed on it and it will work, but it will be a little buggy. It's better to install a fresh copy of Windows onto a drive and use the virtio drivers, hugepages, CPU pinning, whatever you want to do.

I also have two monitors, both have multiple inputs so I have both of them hooked up to both GPUs and I can select between inputs to see the VM or the host at the same time. I also pass through a USB mouse and keyboard.

I tried to be as clear as I could. If you have any questions, I can try to answer them.

-----

Override patch: place this in /etc/portage/patches/sys-kernel/gentoo-sources-5.4.60/override_acs_caps.patch, adjusting the path for your kernel source and version. I would definitely only use this if necessary; you might not need it on your board.

```
--- a/drivers/pci/quirks.c      2020-05-12 20:48:40.153152132 +0300
+++ b/drivers/pci/quirks.c      2020-05-13 15:59:18.477619314 +0300
@@ -4695,6 +4695,111 @@
                PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
 }
 
+/*
+* PCIe ACS Override
+*/
+static bool acs_on_downstream;
+static bool acs_on_multifunction;
+
+#define NUM_ACS_IDS 16
+struct acs_on_id {
+       unsigned short vendor;
+       unsigned short device;
+};
+static struct acs_on_id acs_on_ids[NUM_ACS_IDS];
+static u8 max_acs_id;
+
+static __init int pcie_acs_override_setup(char *p)
+{
+       if (!p)
+               return -EINVAL;
+
+       while (*p) {
+               if (!strncmp(p, "downstream", 10))
+                       acs_on_downstream = true;
+               if (!strncmp(p, "multifunction", 13))
+                       acs_on_multifunction = true;
+               if (!strncmp(p, "id:", 3)) {
+                       char opt[5];
+                       int ret;
+                       long val;
+
+                       if (max_acs_id >= NUM_ACS_IDS - 1) {
+                               pr_warn("Out of PCIe ACS override slots (%d)\n",
+                                       NUM_ACS_IDS);
+                               goto next;
+                       }
+
+                       p += 3;
+                       snprintf(opt, 5, "%s", p);
+                       ret = kstrtol(opt, 16, &val);
+                       if (ret) {
+                               pr_warn("PCIe ACS ID parse error %d\n", ret);
+                               goto next;
+                       }
+                       acs_on_ids[max_acs_id].vendor = val;
+
+                       p += strcspn(p, ":");
+                       if (*p != ':') {
+                               pr_warn("PCIe ACS invalid ID\n");
+                               goto next;
+                       }
+
+                       p++;
+                       snprintf(opt, 5, "%s", p);
+                       ret = kstrtol(opt, 16, &val);
+                       if (ret) {
+                               pr_warn("PCIe ACS ID parse error %d\n", ret);
+                               goto next;
+                       }
+                       acs_on_ids[max_acs_id].device = val;
+                       max_acs_id++;
+               }
+next:
+               p += strcspn(p, ",");
+               if (*p == ',')
+                       p++;
+       }
+
+       if (acs_on_downstream || acs_on_multifunction || max_acs_id)
+               pr_warn("Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA\n");
+
+       return 0;
+}
+
+early_param("pcie_acs_override", pcie_acs_override_setup);
+
+static int pcie_acs_overrides(struct pci_dev *dev, u16 acs_flags)
+{
+       int i;
+
+       /* Never override ACS for legacy devices or devices with ACS caps */
+       if (!pci_is_pcie(dev) ||
+           pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS))
+               return -ENOTTY;
+
+       for (i = 0; i < max_acs_id; i++)
+               if (acs_on_ids[i].vendor == dev->vendor &&
+                   acs_on_ids[i].device == dev->device)
+                       return 1;
+
+       switch (pci_pcie_type(dev)) {
+       case PCI_EXP_TYPE_DOWNSTREAM:
+       case PCI_EXP_TYPE_ROOT_PORT:
+               if (acs_on_downstream)
+                       return 1;
+               break;
+       case PCI_EXP_TYPE_ENDPOINT:
+       case PCI_EXP_TYPE_UPSTREAM:
+       case PCI_EXP_TYPE_LEG_END:
+       case PCI_EXP_TYPE_RC_END:
+               if (acs_on_multifunction && dev->multifunction)
+                       return 1;
+       }
+
+       return -ENOTTY;
+}
+
 static const struct pci_dev_acs_enabled {
        u16 vendor;
        u16 device;
@@ -4797,6 +4902,8 @@
        { PCI_VENDOR_ID_ZHAOXIN, 0x9083, pci_quirk_mf_endpoint_acs },
        /* Zhaoxin Root/Downstream Ports */
        { PCI_VENDOR_ID_ZHAOXIN, PCI_ANY_ID, pci_quirk_zhaoxin_pcie_ports_acs },
+       /* IOMMU ACS override patch */
+       { PCI_ANY_ID, PCI_ANY_ID, pcie_acs_overrides },
        { 0 }
 };

```

Then you can just emerge the kernel sources (Portage applies patches from /etc/portage/patches at that point) and compile the kernel as usual.
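Going by the parsing code in the patch, it registers a `pcie_acs_override` kernel parameter that takes a comma-separated list of `downstream`, `multifunction`, and `id:vvvv:dddd` entries, so it does nothing until you add something like `pcie_acs_override=downstream,multifunction` to your kernel command line. To see whether the IOMMU groups actually got split up after a reboot, something along these lines should work. It's a sketch: the function name `list_iommu_groups` is my own, and the optional argument exists only so it can be pointed at a fake directory tree instead of the real sysfs path.

```bash
#!/bin/bash
# Sketch: print every IOMMU group and the PCI addresses inside it.
# list_iommu_groups is a hypothetical helper name.
# Optional argument: directory to scan (defaults to the real sysfs path).
list_iommu_groups() {
    local root=${1:-/sys/kernel/iommu_groups}
    local group dev

    # Note: the glob expands in lexical order, so group 10 sorts before 2.
    for group in "$root"/*; do
        # Skip anything that is not a group directory (including an
        # unmatched glob when the IOMMU is disabled).
        [[ -d $group/devices ]] || continue

        echo "IOMMU group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [[ -e $dev ]] || continue
            echo "    ${dev##*/}"
        done
    done
}
```

Run `list_iommu_groups` with no arguments on the host. Ideally the card you want to pass through (both its video and audio functions) ends up in a group with nothing else in it. `grep -o 'pcie_acs_override=[^ ]*' /proc/cmdline` shows whether the parameter actually made it onto the running kernel's command line.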

----------

## austinramsay

 *Anon-E-moose wrote:*   

> Did a little googling on navi's, linux and qemu, there is some problem going on with it.
> 
> Also what kernel version are you running.
> 
> Edit to remove silly questions that were answered in first post  
> ...

 

There sure is.. seems like I can't catch a break with this setup haha. This was with kernel 5.8.10. I applied the Navi reset kernel patch which did allow me to reboot the VM (despite some interesting artifacts at the spinning dots at Windows boot time), however I continued to get the error 43 unfortunately. I ended up just dual-booting considering how many hardware issues and software issues there were to get this working. Even if I got it working, I'd have to constantly deal with those issues and it wouldn't be worth it. I hope future updates and the next gen Radeon cards don't have these issues because I'd love to come back to this.

----------

## austinramsay

 *x90e wrote:*   

> ...

 

That's a brilliant response with tons of information. It won't be usable for me at the moment, since I decided to just dual-boot for now given this bad combination of hardware AND software issues, but I will definitely be coming back to this in the future with different hardware and with new updates that come along.

Also, just in general this is a great post to put out there for others looking for this info. You'll definitely make someone's day with that, if not mine soon lol.

----------

