# [solved] amdgpu fails to resume from suspend

## amaroc

There is an open bug at bugzilla.kernel.org that is related to this topic. However, it is starting to get annoying, as I don't see any effort anywhere to fix it.

Status: 

My Gentoo PC (Ryzen 1700, RX550, Polaris 12, gfx804) worked pretty much perfectly, including suspend/resume, until I installed kernel 5.2.x. Since then, resume sporadically ends with a non-working monitor and I have to SysRq the system. Somebody with an Intel CPU and a comparable graphics card (Polaris 10, gfx803) on Arch had the same problem and opened the bug on kernel.org. I chimed in there to second this.

A patch mentioned there went into 5.2.11, but it didn't help. My hope was on 5.3.0, but it failed again. So I reverted first to 5.1.x and now to LTS 4.19. However, this should only be a temporary solution.

The reason I post here: does anybody have an idea what else I can do to get back to the latest kernel tree? I recall from the past with i915 that unloading/reloading kernel modules around STR (suspend-to-RAM) helped, but that was probably before kernel DRM. I'm not sure whether this would have any success, and even so I would call it a workaround and not a solution.

All logs can be found in Bug 204241 mentioned above, but here is some info:

Failure message 5.2.x 

```
amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring gfx test failed (-110)
```

kernel 5.3.0

```
amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring sdma0 test failed (-110)
```

The result is the same. :(

emerge --info from current system (and working kernel)

```
Portage 2.3.69 (python 3.6.5-final-0, default/linux/amd64/17.1/desktop/plasma, gcc-8.3.0, glibc-2.29-r2, 4.19.74-gentoo x86_64)

=================================================================

System uname: Linux-4.19.74-gentoo-x86_64-AMD_Ryzen_7_1700_Eight-Core_Processor-with-gentoo-2.6

KiB Mem:    16329108 total,   2515668 free

KiB Swap:   12582908 total,  12109192 free

Timestamp of repository gentoo: Sun, 22 Sep 2019 14:15:01 +0000

Head commit of repository gentoo: e769cc5a9b4ed8d835cdec8a2391eb3ccf7d61e9

sh bash 4.4_p23-r1

ld GNU ld (Gentoo 2.32 p2) 2.32.0

app-shells/bash:          4.4_p23-r1::gentoo

dev-java/java-config:     2.2.0-r4::gentoo

dev-lang/perl:            5.28.2-r1::gentoo

dev-lang/python:          2.7.15::gentoo, 3.6.5::gentoo

dev-util/cmake:           3.14.6::gentoo

dev-util/pkgconfig:       0.29.2::gentoo

sys-apps/baselayout:      2.6-r1::gentoo

sys-apps/openrc:          0.41.2::gentoo

sys-apps/sandbox:         2.13::gentoo

sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo

sys-devel/automake:       1.16.1-r1::gentoo

sys-devel/binutils:       2.32-r1::gentoo

sys-devel/gcc:            8.3.0-r1::gentoo

sys-devel/gcc-config:     2.0::gentoo

sys-devel/libtool:        2.4.6-r3::gentoo

sys-devel/make:           4.2.1-r4::gentoo

sys-kernel/linux-headers: 4.19::gentoo (virtual/os-headers)

sys-libs/glibc:           2.29-r2::gentoo

Repositories:

gentoo

    location: /usr/portage

    sync-type: rsync

    sync-uri: rsync://rsync.de.gentoo.org/gentoo-portage

    priority: -1000

    sync-rsync-verify-metamanifest: yes

    sync-rsync-verify-max-age: 24

    sync-rsync-extra-opts: 

    sync-rsync-verify-jobs: 1

my_local_overlay

    location: /usr/local/portage

    masters: gentoo

    priority: 0

mv

    location: /var/lib/layman/mv

    masters: gentoo

    priority: 50

science

    location: /var/lib/layman/science

    masters: gentoo

    priority: 50

stefantalpalaru

    location: /var/lib/layman/stefantalpalaru

    masters: gentoo

    priority: 50

ACCEPT_KEYWORDS="amd64"

ACCEPT_LICENSE="*"

CBUILD="x86_64-pc-linux-gnu"

CFLAGS="-march=znver1 -O2 -pipe"

CHOST="x86_64-pc-linux-gnu"

CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/themes/oxygen-gtk/gtk-2.0"

CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/vmware-installer"

CXXFLAGS="-march=znver1 -O2 -pipe"

DISTDIR="/usr/portage/distfiles"

ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"

FCFLAGS="-O2 -pipe"

FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"

FFLAGS="-O2 -pipe"

GENTOO_MIRRORS="http://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ rsync://mirror.eu.oneandone.net/gentoo/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://mirror.netcologne.de/gentoo/ http://mirror.netcologne.de/gentoo/ rsync://mirror.netcologne.de/gentoo/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ http://ftp.halifax.rwth-aachen.de/gentoo/ ftp://ftp.halifax.rwth-aachen.de/gentoo/ rsync://ftp.halifax.rwth-aachen.de/gentoo/ http://ftp.fau.de/gentoo ftp://ftp.fau.de/gentoo rsync://ftp.fau.de/gentoo http://ftp-stud.hs-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp-stud.hs-esslingen.de/pub/Mirrors/gentoo/ rsync://ftp-stud.hs-esslingen.de/gentoo/"

LANG="de_DE.UTF-8"

LDFLAGS="-Wl,-O1 -Wl,--as-needed"

LINGUAS="de"

MAKEOPTS="-j9"

PKGDIR="/usr/portage/packages"

PORTAGE_CONFIGROOT="/"

PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"

PORTAGE_TMPDIR="/var/tmp"

USE="X a52 aac acl acpi activities aften alsa alsa-plugin amd64 berkdb bluetooth branding bundled-libs bzip2 cairo cdda cdparanoia cdr cli consolekit coverart crypt css cups cxx daap dbus declarative dri dts dvd dvdr emboss encode exif fam ffmpeg flac fortran gdbm gif gimp glamor gphoto2 gpm gtk iconv icu ipv6 javafx jpeg kde kdesu kipi kwallet lcms ldap libnotify libtirpc lm-sensors mad mng mp3 mp4 mpeg multilib multislot ncurses nls npp nptl nsplugin ogg opengl openmp pam pango pcre pdf phonon plasma pm-utils png policykit ppds pulseaudio qml qt5 quicktime readline sdl seccomp semantic-desktop spell split-usr sql sqlite ssl startup-notification svg tcpd tiff tk truetype udev udisks unicode upower usb video vnc vorbis vpx webkit widgets wmf wxwidgets x264 xattr xcb xcomposite xml xv xvfb xvid zlib" ABI_X86="64 32" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" L10N="de" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby24 ruby25" USERLAND="GNU" VIDEO_CARDS="radeon amdgpu radeonsi vesa fbdev dummy vga" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"

Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
```

Any help or advice on how to proceed would be appreciated.

Last edited by amaroc on Fri Dec 13, 2019 8:54 pm; edited 1 time in total

----------

## Ant P.

Polaris 11, same thing here. Usually suspend/resume works, but sometimes (1-2 times a month) I get a system where the screen just won't wake up, there are errors in dmesg, and every X11 program is locked up, forcing a hard reset. I haven't tried unloading the module since that'd require killing everything using the GPU, and at that point I might as well reboot anyway.

----------

## Goverp

I am the user that tested the fix that's supposed to be in 5.2.11; AFAIK that was specific to STONEY chipsets; its symptoms differ.

That problem was reported on bugs.freedesktop.org.  I've yet to install 5.2.11 to see if it really made its way to the real kernel; I'm about to install 5.3 and try it there.

None of that helps, of course. It would be worth looking at bugs.freedesktop.org, as that's where the kernel MAINTAINERS file directs problem reports for AMDGPU. There was no activity on my bug until someone (me) did a git-bisect to locate the failing commit and contacted the authors/maintainers via the mailing list; apparently they don't have time to read bugzilla. :(

----------

## amaroc

So it looks like it's not only me and that gives some hope that there will be a solution.

Many thanks for pointing me to bugs.freedesktop.org. I didn't know that amdgpu kernel issues are handled there as well. Bug 111848 has been opened there, and it looks *very* similar. I will keep watching it and report here if progress happens.

----------

## Ant P.

I just had this happen again, so I'm now certain it's the same issue. System responded to a controlled sysrq shutdown, so here's the /var/log/kernel lines:

```
Oct 04 18:15:41 [kernel] ACPI: Waking up from system sleep state S3

(snip non-gpu lines)

Oct 04 18:15:41 [kernel] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).

Oct 04 18:15:41 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring sdma0 test failed (-110)

Oct 04 18:15:41 [kernel] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <sdma_v3_0> failed -110

Oct 04 18:15:41 [kernel] [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).

Oct 04 18:15:41 [kernel] PM: dpm_run_callback(): pci_pm_resume+0x0/0xc0 returns -110

Oct 04 18:15:41 [kernel] PM: Device 0000:01:00.0 failed to resume async: error -110

(...)

Oct 04 18:15:41 [kernel] [drm] schedsdma0 is not ready, skipping

                - Last output repeated twice -

Oct 04 18:15:41 [kernel] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

Oct 04 18:15:42 [kernel] [drm] schedsdma0 is not ready, skipping

                - Last output repeated twice -

Oct 04 18:15:52 [kernel] [drm:amdgpu_job_timedout] *ERROR* ring sdma1 timeout, signaled seq=2194658, emitted seq=2194660

Oct 04 18:15:52 [kernel] [drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0

Oct 04 18:15:52 [kernel] amdgpu 0000:01:00.0: GPU reset begin!

Oct 04 18:15:52 [kernel] ------------[ cut here ]------------

Oct 04 18:15:52 [kernel] WARNING: CPU: 0 PID: 2355 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:891 dm_suspend.cold+0xc/0x13

Oct 04 18:15:52 [kernel] Modules linked in: snd_usb_audio snd_usbmidi_lib hid_sony snd_rawmidi snd_seq_dummy snd_seq snd_seq_device fuse hid_uclogic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core sr_mod hid_microsoft snd_pcm cdc_acm cdrom i2c_piix4

Oct 04 18:15:52 [kernel] CPU: 0 PID: 2355 Comm: kworker/0:3 Not tainted 5.2.16-zen-33525-g25665dec9 #41

Oct 04 18:15:52 [kernel] Hardware name: Gigabyte Technology Co., Ltd. GA-MA770-UD3/GA-MA770-UD3, BIOS FGc 11/19/2009

Oct 04 18:15:52 [kernel] Workqueue: events drm_sched_job_timedout

Oct 04 18:15:52 [kernel] RIP: 0010:dm_suspend.cold+0xc/0x13

Oct 04 18:15:52 [kernel] Code: f0 5b 20 82 e8 de 4e ed ff e9 73 ff ff ff 48 89 df e8 51 f5 ff ff 90 eb b5 e8 29 c0 9c ff 48 c7 c7 00 44 19 82 e8 94 c0 a0 ff <0f> 0b e9 e8 84 ff ff 48 c7 c7 00 44 19 82 e8 81 c0 a0 ff 0f 0b 5d

Oct 04 18:15:52 [kernel] RSP: 0018:ffffc90002227d28 EFLAGS: 00010246

Oct 04 18:15:52 [kernel] RAX: 0000000000000024 RBX: ffff888107ea0000 RCX: 0000000000000006

Oct 04 18:15:52 [kernel] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8881c9a117d0

Oct 04 18:15:52 [kernel] RBP: ffff888107eae968 R08: 0000000000000794 R09: 0000000000000001

Oct 04 18:15:52 [kernel] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107ea0000

Oct 04 18:15:52 [kernel] R13: 00000000000000b8 R14: 0000000000000000 R15: 0000000000000000

Oct 04 18:15:52 [kernel] FS:  0000000000000000(0000) GS:ffff8881c9a00000(0000) knlGS:0000000000000000

Oct 04 18:15:52 [kernel] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Oct 04 18:15:52 [kernel] CR2: 00007f020326c480 CR3: 00000001b90f3000 CR4: 00000000000006f0

Oct 04 18:15:52 [kernel] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000046cfc000

Oct 04 18:15:52 [kernel] DR3: 000000000000f000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Oct 04 18:15:52 [kernel] Call Trace:

Oct 04 18:15:52 [kernel]  amdgpu_device_ip_suspend_phase1+0x85/0xc0

Oct 04 18:15:52 [kernel]  amdgpu_device_ip_suspend+0x17/0x60

Oct 04 18:15:52 [kernel]  amdgpu_device_pre_asic_reset+0x1ef/0x204

Oct 04 18:15:52 [kernel]  amdgpu_device_gpu_recover+0x72/0x783

Oct 04 18:15:52 [kernel]  amdgpu_job_timedout+0xf2/0x120

Oct 04 18:15:52 [kernel]  drm_sched_job_timedout+0x35/0x60

Oct 04 18:15:52 [kernel]  process_one_work+0x1c8/0x400

Oct 04 18:15:52 [kernel]  worker_thread+0x45/0x3e0

Oct 04 18:15:52 [kernel]  kthread+0xf6/0x140

Oct 04 18:15:52 [kernel]  ? trace_event_raw_event_workqueue_execute_start+0xc0/0xc0

Oct 04 18:15:52 [kernel]  ? kthread_create_on_node+0x60/0x60

Oct 04 18:15:52 [kernel]  ret_from_fork+0x22/0x40

Oct 04 18:15:52 [kernel] ---[ end trace b54479be23f2beea ]---

Oct 04 18:15:52 [kernel] amdgpu: [powerplay] Trying to disable SCLK DPM when DPM is disabled

Oct 04 18:15:52 [kernel] amdgpu: [powerplay] Trying to disable voltage DPM when DPM is disabled

Oct 04 18:15:52 [kernel] amdgpu: [powerplay] Failed to force to switch arbf0!

Oct 04 18:15:52 [kernel] amdgpu: [powerplay] [disable_dpm_tasks] Failed to disable DPM!

Oct 04 18:15:52 [kernel] [drm:amdgpu_device_ip_suspend_phase2] *ERROR* suspend of IP block <powerplay> failed -22

Oct 04 18:15:52 [kernel] amdgpu 0000:01:00.0: GPU pci config reset

Oct 04 18:15:52 [kernel] amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume

Oct 04 18:15:52 [kernel] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).

Oct 04 18:15:52 [kernel] [drm] VRAM is lost due to GPU reset!

Oct 04 18:15:53 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.0.0 test failed (-110)

Oct 04 18:15:53 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.1.0 test failed (-110)

Oct 04 18:15:53 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.2.0 test failed (-110)

Oct 04 18:15:53 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.3.0 test failed (-110)

Oct 04 18:15:53 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.0.1 test failed (-110)

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.1.1 test failed (-110)

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.2.1 test failed (-110)

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring comp_1.3.1 test failed (-110)

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring sdma0 test failed (-110)

Oct 04 18:15:54 [kernel] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <sdma_v3_0> failed -110

Oct 04 18:15:54 [kernel] [drm] Skip scheduling IBs!

                - Last output repeated 106 times -

Oct 04 18:15:54 [kernel] [drm] schedsdma0 is not ready, skipping

Oct 04 18:15:54 [kernel] [drm] Skip scheduling IBs!

                - Last output repeated 11 times -

Oct 04 18:15:54 [kernel] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: GPU reset(1) failed

Oct 04 18:15:54 [kernel] amdgpu 0000:01:00.0: GPU reset end with ret = -110

Oct 04 18:15:54 [kernel] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!

                - Last output repeated 3 times -

Oct 04 18:16:04 [kernel] [drm:amdgpu_job_timedout] *ERROR* ring sdma1 timeout, signaled seq=2194660, emitted seq=2194662

Oct 04 18:16:04 [kernel] [drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0

Oct 04 18:16:04 [kernel] amdgpu 0000:01:00.0: GPU reset begin!
```

At that point, the screen still hasn't turned on so there's nothing left to do but force a reboot.
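For anyone comparing logs: the -110 in these messages is the Linux errno ETIMEDOUT (the ring test timed out), -22 is EINVAL and -125 is ECANCELED. Below is a minimal sketch, assuming the kernel log has been saved as plain text, that counts which rings fail their test. The function name `summarize_ring_failures` and the sample lines are only for illustration; this is not part of any amdgpu tooling.

```python
import re
from collections import Counter

# Linux errno values that appear in the logs in this thread. They are
# hardcoded so the sketch gives the same answer on any platform.
LINUX_ERRNO = {22: "EINVAL", 110: "ETIMEDOUT", 125: "ECANCELED"}

# Matches e.g. "*ERROR* ring sdma0 test failed (-110)".
RING_FAIL = re.compile(r"\*ERROR\* ring (\S+) test failed \((-?\d+)\)")


def summarize_ring_failures(log_text):
    """Count failed ring tests per (ring name, errno name)."""
    counts = Counter()
    for ring, code in RING_FAIL.findall(log_text):
        name = LINUX_ERRNO.get(abs(int(code)), "unknown")
        counts[(ring, name)] += 1
    return counts


if __name__ == "__main__":
    # Two lines taken from the first post of this thread.
    sample = (
        "amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper] "
        "*ERROR* ring gfx test failed (-110)\n"
        "amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper] "
        "*ERROR* ring sdma0 test failed (-110)\n"
    )
    for (ring, err), n in sorted(summarize_ring_failures(sample).items()):
        print(f"{ring}: {err} x{n}")
```

Feeding it the text of a saved kernel log (or the output of `dmesg`) should make it easier to see whether the first failing ring is gfx, sdma0 or something else when comparing reports across kernels.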

----------

## amaroc

There seems to be progress. Please see the original Bug 204241. A patch has been provided there that seems to work OK. It might be a bit early to state that the issue is gone, as the failure only appeared sporadically after some days of usage.

Anyway, some more testing on different hardware might help.

PATCH

My configuration:

kernel: 5.3.2-gentoo

x11-drivers/xf86-video-amdgpu:  19.0.1

media-libs/mesa: 19.1.7

x11-base/xorg-server: 1.20.5

----------

## mirekm

I had a similar problem with an NVMe drive.

The culprit was the kernel setting of

```
CONFIG_PCIEASPM_*
```

I had the Performance setting, and it led to problems after resume.

I found that changing this parameter to

```
CONFIG_PCIEASPM_DEFAULT
```

solved the problem.
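To check which ASPM default a kernel was built with, something like the following might help. It is only a sketch: it assumes you pass in the text of the kernel `.config` yourself (where that file lives depends on your setup), and the function name `aspm_default_policy` is made up for illustration.

```python
import re

# The compile-time choices for the default PCIe ASPM policy.
# Exactly one of them is normally set to "y" when CONFIG_PCIEASPM is enabled.
ASPM_CHOICE = re.compile(
    r"^CONFIG_PCIEASPM_(DEFAULT|POWERSAVE|POWER_SUPERSAVE|PERFORMANCE)=y$",
    re.MULTILINE,
)


def aspm_default_policy(config_text):
    """Return the selected ASPM policy name, or None if none is set."""
    match = ASPM_CHOICE.search(config_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical .config fragment resembling the problematic setup.
    sample = (
        "CONFIG_PCIEASPM=y\n"
        "# CONFIG_PCIEASPM_DEFAULT is not set\n"
        "CONFIG_PCIEASPM_PERFORMANCE=y\n"
    )
    print(aspm_default_policy(sample))
```

On a running system with ASPM support built in, the active policy can also be read from `/sys/module/pcie_aspm/parameters/policy`, which may be quicker than digging out the config.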

----------

## amaroc

Update: Bug 204241 still sees some conversation and reports of issues on different hardware. However, for me the issue is gone with the mentioned patch for uvd6. This patch has been part of the kernel since 5.3.8.
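If you want to check quickly whether a given kernel release is at or past 5.3.8, a small comparison like the one below would do. It is a sketch under assumptions: it only compares version numbers, it knows nothing about distribution backports, and it says nothing useful about older series such as 4.19, which never showed the problem here. The names `kernel_tuple` and `has_uvd6_fix` are made up for illustration.

```python
import re

# First release mentioned in this thread as containing the uvd6 patch.
FIXED_IN = (5, 3, 8)


def kernel_tuple(release):
    """Turn a release string like '5.3.2-gentoo' into (5, 3, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '5.4') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_uvd6_fix(release):
    """True if the release number is 5.3.8 or newer."""
    return kernel_tuple(release) >= FIXED_IN


if __name__ == "__main__":
    # Release strings that appear in this thread, plus the fixed version.
    for rel in ("5.2.16-zen-33525-g25665dec9", "5.3.2-gentoo", "5.3.8-gentoo"):
        print(rel, has_uvd6_fix(rel))
```

To test the running system, pass in the output of `uname -r` (or `platform.release()` from Python).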

----------

