# Random hangups, ACPI warnings, cannot read voltage sensors

## Kyeno

Hello,

After a few days of trying to nail the problem down myself (with a bit of Google's support), I believe I've finally hit the wall, and thus I'm condemned to ask for help.

I've recently upgraded the hardware in my home server (router, firewall, masq, samba, webdev+experiments) a little, using some "garage parts". It currently runs:

- Intel DP35DP Motherboard with Intel P35 chipset (ICH9R)

- Intel Core2 Quad Q8200S 4x2.33GHz low TDP

- 3GB DDR2 667MHz Dual-Channel Asymmetric (2x1GB + 1GB)

- 3Com 3c905B 100mbit/s network card for Internet (eth0)

- Broadcom NetXtreme BCM5782 1gbit/s network card for LAN (eth1)

- S3 ViRGE PCI for console display and virtual framebuffer for headless browser experiments

- 2x500GB HDD3.5" SATA drives with 3 mdadm-based RAID1 mirror arrays

- Pretty much random 400WATT PSU with passive PFC

I've updated the BIOS to the most recent one:

```
[    0.000000] NX (Execute Disable) protection: active

[    0.000000] SMBIOS 2.4 present.

[    0.000000] DMI:                  /DP35DP, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
```

All integrated peripherals and almost everything else in the BIOS are disabled (except the NX bit for safety and USB Legacy for the keyboard in grub). UEFI boot is disabled. SATA is driven in Native+AHCI mode.

The system itself was born in mid-2008 or 2009, I think, and still shares some parts from that time. ;)

It's a 32bit Gentoo (x86) with hardened sources (used to run hardened profile while it still was available) currently running Linux 4.8.17-hardened-r2 with some (but not all) PaX/grsec restrictions enabled.

It's an OpenRC-based machine with sys-apps/systemd masked by hand. >sys-boot/grub-0.98 is masked by hand too, so it's pure grub-legacy. Let's say I like it oldschool to a certain degree. :)

And now for the problems. I'm putting them all in one post, since I believe they might be interconnected.

Problem #1

The system just freezes every now and then with no warnings, no errors in dmesg or syslog, nothing. I can't tell *precisely* how to reproduce it or what exactly causes it. Sometimes it's able to recompile the world without an issue; another day it dies within an hour. Often I wake up to find my home internet down and the machine unresponsive. My theories were RAM consumption... or excessive hard drive usage. I even replaced one hard drive (which reported no SMART errors at all, but just sounded strange); the mdadm resync took ~3 hangups there too and was only able to complete when I passed `memtest=17 nofb acpi_enforce_resources=lax` to the kernel params. While still passing them I noticed a *significant* performance drop across the entire system, and not only during boot.

I tried to memtest the machine, but surprisingly booting sys-apps/memtest86-4.3.7 causes an instant reboot. The same goes for booting sys-apps/memtest86+-5.01-r2. Only sys-apps/memtest86+-4.20-r1 was able to boot without an issue, and I could run the entire test (actually almost 2 passes) without a single problem with the memory.

*edit*

And yep, I tried all gcc versions I have; both in standard (hardened) and in vanilla mode for building all those memtests.

Problem #2

ACPI itself yields a few warnings at boot time. I was trying really hard to get some more readings than just "coretemp" (which I believe is lying anyway; it's almost impossible for this CPU to idle at around 20°C on a stock Intel cooler) and likely messed up the SMBus modules a bit. I can't really find what I should do for ICH9R to read e.g. voltages. To be precise, some weird dmesg messages:

```

[    0.011076] Enabling APIC mode:  Flat.  Using 1 I/O APICs

[    0.012000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1

[    0.022000] APIC calibration not consistent with PM-Timer: 139ms instead of 100ms

[    0.022000] APIC delta adjusted to PM-Timer: 2083091 (2916301)

[    0.022000] smpboot: CPU0: Intel(R) Core(TM)2 Quad CPU    Q8200  @ 2.33GHz (family: 0x6, model: 0x17, stepping: 0xa)

[    0.022000] Performance Events: PEBS fmt0+, Core2 events, 4-deep LBR, Intel PMU driver.

...cut...

[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)

[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)

[    0.000000] ACPI: IRQ0 used by override.

[    0.000000] ACPI: IRQ9 used by override.

...cut...

[    0.230393] pci 0000:00:1f.0: quirk: [io  0x0400-0x047f] claimed by ICH6 ACPI/GPIO/TCO

[    0.230397] pci 0000:00:1f.0: quirk: [io  0x0500-0x053f] claimed by ICH6 GPIO

[    0.230399] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0680 (mask 007f)

[    0.230402] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 2 PIO at 0810 (mask 007f)

...cut...

[    0.232105] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 *9 10 11 12)

[    0.232185] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 10 *11 12)

[    0.232264] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 *10 11 12)

[    0.232342] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 *10 11 12)

[    0.232420] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.

[    0.232499] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 9 10 *11 12)

[    0.232579] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 9 10 *11 12)

[    0.232657] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 *9 10 11 12)

[    0.233273] ACPI: Enabled 3 GPEs in block 00 to 3F

...cut...

[    0.240233] pnp: PnP ACPI: found 5 devices

[    0.286329] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns

[    0.286342] pci 0000:07:00.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window

[    0.286344] pci 0000:07:01.0: can't claim BAR 6 [mem 0xfffe0000-0xffffffff pref]: no compatible bridge window

[    0.286346] pci 0000:07:02.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window

...cut...

[    6.160428] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x000000000000050C-0x000000000000050F (\IGPO) (20160422/utaddress-255)

[    6.160435] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

[    6.251512] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt

...cut...

[    7.051596] i2c i2c-1: unable to read EDID block.

[    7.176564] i2c i2c-1: unable to read EDID block.

[    7.301111] i2c i2c-1: unable to read EDID block.

```

After reading some threads about ACPI here, on StackExchange, etc., I didn't find anyone with precisely my problem. As mentioned before, my entire adventure with it began when I wanted to read some voltages from this PSU and couldn't really do it with lm_sensors. sensors-detect only comes up with coretemp.

*edit*

Ok, I figured out (from sensors-detect) that i2c i2c-1 is the S3 ViRGE card; I guess reading EDID block from it isn't that important for stability, hopefully.

*edit2*

I have also slightly updated the kernel to support GPIO, more smbus things and some missing drivers. The lists below are also updated:

```
# sensors

coretemp-isa-0000

Adapter: ISA adapter

Core 0:       +21.0 C  (high = +59.0 C, crit = +85.0 C)

Core 1:       +20.0 C  (high = +59.0 C, crit = +85.0 C)

Core 2:       +28.0 C  (high = +59.0 C, crit = +85.0 C)

Core 3:       +28.0 C  (high = +59.0 C, crit = +85.0 C)

# lsmod

Module                  Size  Used by

xt_nat                  1517  4

ipt_MASQUERADE           945  1

iptable_nat             1234  1

s3fb                   20181  0

cfbfillrect             3238  1 s3fb

cfbimgblt               2183  1 s3fb

vgastate                8025  1 s3fb

cfbcopyarea             3062  1 s3fb

i2c_algo_bit            4564  1 s3fb

fb_ddc                  1225  1 s3fb

svgalib                 7341  1 s3fb

fb                     37174  1 s3fb

mousedev                9256  0

fbdev                    706  1 fb

gpio_ich                3707  0

i2c_i801               10976  0

lpc_ich                13068  0

mfd_core                4319  1 lpc_ich

i2c_smbus               2477  1 i2c_i801

evdev                  13447  0

coretemp                3558  0

# cat /etc/conf.d/lm_sensors

LOADMODULES=yes

INITSENSORS=yes

HWMON_MODULES="coretemp i2c-dev i2c-i801"

MODULE_0=coretemp

MODULE_1=i2c-dev

MODULE_2=i2c-i801

# lspci -k

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)

00:03.0 Communication controller: Intel Corporation 82G33/G31/P35/P31 Express MEI Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ehci-pci

00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)

00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 02)

00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 02)

00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 02)

00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)

00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ehci-pci

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)

00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: lpc_ich

   Kernel modules: lpc_ich

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: i801_smbus

   Kernel modules: i2c_i801

03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101/6102 single-port PATA133 interface (rev b2)

   Subsystem: Marvell Technology Group Ltd. 88SE6101/6102 single-port PATA133 interface

   Kernel driver in use: pata_marvell

07:00.0 VGA compatible controller: S3 Graphics Ltd. 86c325 [ViRGE] (rev 06)

   Kernel driver in use: s3fb

   Kernel modules: s3fb

07:01.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 64)

   Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100

   Kernel driver in use: 3c59x

07:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5782 Gigabit Ethernet (rev 03)

   Subsystem: Hewlett-Packard Company NetXtreme BCM5782 Gigabit Ethernet

   Kernel driver in use: tg3

```

At the moment I'm experimenting with the processor.nocst=1 kernel parameter, but I really feel like I'm hitting the wall hard here.

Please let me know if you need any extra information and thanks in advance!

----------

## Kyeno

UPDATE:

I have compiled in most of the hardware drivers for this machine. Even when I don't need them - just added them in so the IRQ mapper "knows what it's doing" (at least those were my thoughts).

```
# lspci -k

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)

   Kernel driver in use: pcieport

00:03.0 Communication controller: Intel Corporation 82G33/G31/P35/P31 Express MEI Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ehci-pci

00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)

   Kernel driver in use: pcieport

00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 02)

   Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 02)

   Kernel driver in use: pcieport

00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 02)

   Kernel driver in use: pcieport

00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)

   Kernel driver in use: pcieport

00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: uhci_hcd

00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ehci-pci

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)

00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: lpc_ich

   Kernel modules: lpc_ich

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)

   Subsystem: Intel Corporation Desktop Board DP35DP

   Kernel driver in use: i801_smbus

   Kernel modules: i2c_i801

03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101/6102 single-port PATA133 interface (rev b2)

   Subsystem: Marvell Technology Group Ltd. 88SE6101/6102 single-port PATA133 interface

   Kernel driver in use: pata_marvell

07:00.0 VGA compatible controller: S3 Graphics Ltd. 86c325 [ViRGE] (rev 06)

   Kernel driver in use: s3fb

   Kernel modules: s3fb

07:01.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 64)

   Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100

   Kernel driver in use: 3c59x

07:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5782 Gigabit Ethernet (rev 03)

   Subsystem: Hewlett-Packard Company NetXtreme BCM5782 Gigabit Ethernet

   Kernel driver in use: tg3
```

Still can't find any module for "00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)". Is there any? Do you think I'm missing some more?

I still have those warnings in dmesg:

```
[    6.264275] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x000000000000050C-0x000000000000050F (\IGPO) (20160422/utaddress-255)

[    6.264282] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
```

When I boot with "acpi_enforce_resources=lax", this warning also gives a 3rd message saying this might cause system instability (I will paste exact text once I reboot with this).
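What the kernel is reporting in that warning is plain interval overlap: the SystemIO range claimed by the native driver and the ACPI OpRegion share a few I/O ports. A minimal sketch that pulls both ranges out of the warning line and computes the shared ports. The function names are made up for illustration, and the regular expression assumes the message format exactly as it appears in the dmesg output above.

```python
import re

# Matches "SystemIO range 0x..-0x.. conflicts with OpRegion 0x..-0x.. (\NAME)"
WARNING_RE = re.compile(
    r"SystemIO range 0x([0-9A-Fa-f]+)-0x([0-9A-Fa-f]+) "
    r"conflicts with OpRegion 0x([0-9A-Fa-f]+)-0x([0-9A-Fa-f]+) \((\\\w+)\)"
)


def parse_conflict(line):
    """Return ((io_start, io_end), (op_start, op_end), region_name) or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    io_start, io_end, op_start, op_end = (int(g, 16) for g in match.groups()[:4])
    return (io_start, io_end), (op_start, op_end), match.group(5)


def overlap(a, b):
    """Inclusive intersection of two (start, end) ranges, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


if __name__ == "__main__":
    line = (
        "ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F "
        "conflicts with OpRegion 0x000000000000050C-0x000000000000050F "
        r"(\IGPO) (20160422/utaddress-255)"
    )
    io_range, op_range, name = parse_conflict(line)
    shared = overlap(io_range, op_range)
    print(f"{name}: ports 0x{shared[0]:04X}-0x{shared[1]:04X} are claimed twice")
```

For the line in this thread the shared span is the whole OpRegion, i.e. the four ports of `\IGPO` sit inside the 0x500-0x52F block.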

Also I still cannot get any voltage, fan or MCH+ICH readings; just coretemp. I even went "the desperate way" and compiled every possible Hardware Monitoring and pwmbus driver as module, hoping it will detect something...

```
# yes | sensors-detect

...cut...

Driver `coretemp':

  * Chip `Intel digital thermal sensor' (confidence: 9)

Do you want to overwrite '/etc/modules-load.d/lm_sensors.conf'? (yes/NO):

...cut...
```

```
# pwmconfig

...cut...

/usr/sbin/pwmconfig: There are no pwm-capable sensor modules installed
```

```
# acpi -V

No support for device type: power_supply

No support for device type: power_supply

Cooling 0: Processor 0 of 7

Cooling 1: Processor 0 of 7

Cooling 2: Processor 0 of 7

Cooling 3: Processor 0 of 7
```

```
# ls /sys/class/power_supply/

#

# ls /sys/class/hwmon/

hwmon0

# cat /sys/class/hwmon/hwmon0/power/runtime_status

unsupported

# gzip -dc /proc/config.gz|grep -i pwm

# CONFIG_INPUT_PWM_BEEPER is not set

CONFIG_PWM=y

CONFIG_PWM_SYSFS=y

CONFIG_PWM_LPSS=m

CONFIG_PWM_LPSS_PCI=m

CONFIG_PWM_LPSS_PLATFORM=m

# CONFIG_PWM_PCA9685 is not set

# gzip -dc /proc/config.gz|grep -i acpi

# Power management and ACPI options

CONFIG_ACPI=y

CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y

CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y

CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y

# CONFIG_ACPI_DEBUGGER is not set

CONFIG_ACPI_PROCFS_POWER=y

# CONFIG_ACPI_REV_OVERRIDE_POSSIBLE is not set

# CONFIG_ACPI_EC_DEBUGFS is not set

CONFIG_ACPI_AC=y

# CONFIG_ACPI_BATTERY is not set

CONFIG_ACPI_BUTTON=y

CONFIG_ACPI_FAN=y

# CONFIG_ACPI_DOCK is not set

CONFIG_ACPI_CPU_FREQ_PSS=y

CONFIG_ACPI_PROCESSOR_CSTATE=y

CONFIG_ACPI_PROCESSOR_IDLE=y

CONFIG_ACPI_PROCESSOR=y

CONFIG_ACPI_PROCESSOR_AGGREGATOR=y

CONFIG_ACPI_THERMAL=y

# CONFIG_ACPI_CUSTOM_DSDT is not set

CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y

# CONFIG_ACPI_DEBUG is not set

# CONFIG_ACPI_PCI_SLOT is not set

# CONFIG_ACPI_CONTAINER is not set

CONFIG_ACPI_HOTPLUG_IOAPIC=y

# CONFIG_ACPI_SBS is not set

# CONFIG_ACPI_HED is not set

# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set

# CONFIG_ACPI_NFIT is not set

CONFIG_HAVE_ACPI_APEI=y

CONFIG_HAVE_ACPI_APEI_NMI=y

# CONFIG_ACPI_APEI is not set

# CONFIG_ACPI_EXTLOG is not set

# CONFIG_ACPI_CONFIGFS is not set

CONFIG_PNPACPI=y

CONFIG_ATA_ACPI=y

# CONFIG_PATA_ACPI is not set

CONFIG_ACPI_I2C_OPREGION=y

# ACPI drivers

CONFIG_GPIO_ACPI=y

# ACPI drivers

CONFIG_SENSORS_ACPI_POWER=m

# ACPI INT340X thermal drivers

# CONFIG_MFD_INTEL_LPSS_ACPI is not set

CONFIG_DMA_ACPI=y

```

According to motherboard technical specs (http://download.viglen.co.uk/files/Motherboards/DP35DP/Manual/DP35DP_Manual.pdf page 27 and 28) I should be getting quite a lot of readings (as I do in BIOS or "Everest" on Windows as I gave it a brief try)...

Also, according to P35 technical datasheets (http://pdf1.alldatasheet.com/datasheet-pdf/view/394714/INTEL/82P35.html page 368-370) I should be getting quite some readings from it alone...

Again, as proof that my I2C bus actually works (I started thinking that maybe grsec+PaX are blocking it):

```
# modprobe eeprom

# decode-dimms

# decode-dimms version 6231 (2014-02-20 10:54:34 +0100)

Memory Serial Presence Detect Decoder

By Philip Edelbrock, Christian Zuckschwerdt, Burkart Lingner,

Jean Delvare, Trent Piepho and others

Decoding EEPROM: /sys/bus/i2c/drivers/eeprom/0-0050

Guessing DIMM is in                             bank 1

---=== SPD EEPROM Information ===---

EEPROM Checksum of bytes 0-62                   OK (0x76)

# of bytes written to SDRAM EEPROM              128

Total number of bytes in EEPROM                 256

Fundamental Memory type                         DDR2 SDRAM

SPD Revision                                    1.2

---=== Memory Characteristics ===---

Maximum module speed                            666 MHz (PC2-5300)

Size                                            1024 MB

Banks x Rows x Columns x Bits                   4 x 14 x 10 x 64

Ranks                                           2

SDRAM Device Width                              8 bits

Module Height                                   30.0 mm

Module Type                                     UDIMM (133.25 mm)

DRAM Package                                    Planar

Voltage Interface Level                         SSTL 1.8V

Module Configuration Type                       No Parity

Refresh Rate                                    Reduced (7.8 us) - Self Refresh

Supported Burst Lengths                         4, 8

tCL-tRCD-tRP-tRAS                               5-5-5-15

Supported CAS Latencies (tCL)                   5T, 4T, 3T

Minimum Cycle Time                              3.00 ns at CAS 5 (tCK min)

                                                3.75 ns at CAS 4

                                                5.00 ns at CAS 3

Maximum Access Time                             0.45 ns at CAS 5 (tAC)

                                                0.50 ns at CAS 4

                                                0.60 ns at CAS 3

Maximum Cycle Time (tCK max)                    8.00 ns

---=== Timing Parameters ===---

Address/Command Setup Time Before Clock (tIS)   0.20 ns

Address/Command Hold Time After Clock (tIH)     0.27 ns

Data Input Setup Time Before Strobe (tDS)       0.10 ns

Data Input Hold Time After Strobe (tDH)         0.17 ns

Minimum Row Precharge Delay (tRP)               15.00 ns

Minimum Row Active to Row Active Delay (tRRD)   7.50 ns

Minimum RAS# to CAS# Delay (tRCD)               15.00 ns

Minimum RAS# Pulse Width (tRAS)                 45.00 ns

Write Recovery Time (tWR)                       15.00 ns

Minimum Write to Read CMD Delay (tWTR)          7.50 ns

Minimum Read to Pre-charge CMD Delay (tRTP)     7.50 ns

Minimum Active to Auto-refresh Delay (tRC)      60.00 ns

Minimum Recovery Delay (tRFC)                   105.00 ns

Maximum DQS to DQ Skew (tDQSQ)                  0.24 ns

Maximum Read Data Hold Skew (tQHS)              0.34 ns

---=== Manufacturing Information ===---

Manufacturer                                    SK Hynix (former Hyundai Electronics)

Manufacturing Location Code                     0x01

Part Number                                     HYMP512U64CP8-Y5

Manufacturing Date                              2008-W26

Assembly Serial Number                          0x0000300B

...cut...
```

Help, please... I'm lost.

----------

## cboldt

When I've had random failures with no trace in logs, I was able to cure them with memory timing (BIOS), CPU clock timing, and voltages.  Something is being pushed "just a little too hard."   Not a thermal issue either, had good cooling, nothing "hot" inside.

Doesn't help any with your sensors issue, but might get you past the lockups.  I have no more specific suggestions on CPU clock, RAM timing, etc., been so long since I had to play there.  I do recall that any CPU clock or voltage moves I made, were modest.  A few percent.  I had an A7NX8 and was pushing (overclocking) CPU, with pretty good success.  I think I ended up with 25% improvement in speed - but I'd started and ran for years with the thing underclocked.

----------

## Kyeno

Hey - thanks for the tips, @cboldt! Actually everything here runs completely stock, even with slightly under-timed mem (5-5-5-15, while those chips are capable of better timings).
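For reference, the 5-5-5-15 figure follows from the SPD values in the decode-dimms output earlier in the thread: each delay in nanoseconds is divided by the cycle time and rounded up to whole clock cycles. A small sketch of that arithmetic; the helper names are invented for illustration and the input numbers are copied from that dump.

```python
import math


def cycles(delay_ns, tck_ns):
    """Clock cycles needed to cover a delay, rounded up to a whole cycle."""
    # The small epsilon guards against floating-point noise in the division.
    return math.ceil(delay_ns / tck_ns - 1e-9)


def timings(tck_ns, cl, trcd_ns, trp_ns, tras_ns):
    """Return the usual tCL-tRCD-tRP-tRAS string for a given cycle time."""
    parts = [cl] + [cycles(t, tck_ns) for t in (trcd_ns, trp_ns, tras_ns)]
    return "-".join(str(p) for p in parts)


if __name__ == "__main__":
    tck = 3.00                    # ns at CAS 5, from the SPD dump
    clock_mhz = 1000.0 / tck      # memory clock
    transfers = 2 * clock_mhz     # DDR: two transfers per clock
    print(f"clock {clock_mhz:.0f} MHz, {transfers:.0f} MT/s")
    # tRCD = 15 ns, tRP = 15 ns, tRAS = 45 ns, also from the SPD dump
    print(timings(tck, 5, trcd_ns=15.0, trp_ns=15.0, tras_ns=45.0))
```

Feeding in the slower cycle times from the same dump (3.75 ns at CAS 4, 5.00 ns at CAS 3) gives the cycle counts the modules would accept at lower clocks.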

Considering your suggestions I have just ordered Kingston Value Ram 2x2GB DualChannel kit so I can just skip using DualChannel Asymmetric - maybe that config is a bit too harsh for the northbridge indeed; despite the clock.

For the sensors I'm still pretty much lost. I think I'll try Lindows (Ubuntu LiveCD) on this machine and see if it detects anything; then if yes - I'll try to debug it from there. Otherwise I'm not sure anymore where to look or who to contact for support here. Intel support consultant/forums? Kernel support forums?

I have a feeling I have missed something very simple and obvious in the kernel config though. ISA Bus support? (still required? I remember times when I2C actually wanted this; now the kernel says to enable it *only* if you have real ISA slots)

----------

## cboldt

Systemrescuecd boots pretty fast, and is amazingly good at finding hardware.

This forum is as good as most for feedback, but if you want to poke around, a method I've used with some success is google or bing (or whatever) the MoBo plus "sensors" and maybe plus "kernel" or "linux."  Let the search engine do the heavy lifting.  You'll find out which forums are best in that very narrow line of inquiry.

You say the parts are "garage parts," I don't think you meant that literally, but you may have a "loose connection" somewhere.  Take the MoBo apart, RAM and CPU, and put it back together.  It doesn't take much of "the right thing" for a system to lockup.  You may have done that already (reseat separable parts).

Could be too, power quality issues from MoBo part aging (caps), or (less likely) P/S.  All fixable, but not without melting some solder ;)

----------

## C5ace

My 8 year old desktop occasionally crashed X11. No cause shown in logs. Final cure was to pull video card, clean the connector. Now all OK.

----------

## Kyeno

@cboldt, @C5ace - thanks again for your tips. :)

And well.. I *built* it from garage parts, not just installed them.

Meaning I have inspected all MoBo and PCI card capacitors for swelling (at least "using the eye and fingers"; I didn't unsolder them for precise measurements). I have cleaned, polished and replaced all heatsinks, adding new, pretty good quality thermal grease (silver-based). I have cleaned all connector pins of the RAM modules and extension cards using acetylene based liquids, sprayed compressed air into all MoBo extension and RAM sockets, cleaned all fans (including the PSU's) and applied a drop of new rotor grease to each, etc.

As for Googling around - I literally spent about 2 days doing that before even writing the 1st post here and bothering you guys. Surprisingly, I find very little info about Intel DP35DP or even P35/82P35 sensors on Linux. Try googling for "DP35DP linux sensors" and you'll likely find this very thread on the 1st page of results. :)

*edit*

Actually some Japanese site looks quite promising http://www.bondoffamily-net.com/~kinta-chan/ LOL!

*edit2*

http://cateee.net/lkddb/web-lkddb/INTEL_MEI.html getting closer... ;)

----------

## cboldt

Good job on the build, just tossing out possible sources of random failure.

For finding sensors, I'm of a mind that MoBo and chipset searches don't necessarily correspond with sensors.

Found this old article http://lm-sensors.lm-sensors.narkive.com/DntGFWsS/intel-dp35dp-p35-sensor-support that suggests module "coretemp," and some remarks about getting info from the isa bus.

More at http://marc.info/?t=119241816300001&r=1&w=2

----------

## Jaglover

I'd use a real voltmeter to check all voltages on ATX connector.

----------

## Kyeno

@cboldt Thanks a lot and I really appreciate it! I've seen the thread you posted before, no luck with it though.

@Jaglover Did this and they're fine, but that's random ad-hoc testing. My idea was scripting something that would output the voltages (along with most other sensor reads) every n minutes to some log file, so I'd have a chance of getting a better idea of what's going on with the hardware when the system hangs.
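A minimal sketch of such a periodic logger, under a few assumptions: the log path, the 5-minute interval and the command list are placeholders picked for illustration, and `sensors` is simply the one tool known to work at this point in the thread. Each entry is flushed and fsynced so that the last readings before a hard freeze have a chance of reaching the disk.

```python
import os
import subprocess
import time
from datetime import datetime

# Assumptions for illustration only: log path, interval and command list.
LOG_PATH = "/var/log/hw-poll.log"
INTERVAL_SECONDS = 300
COMMANDS = [["sensors"]]


def run(command):
    """Run one command; return its stdout, or a one-line note if it fails."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{' '.join(command)}: failed ({exc})\n"
    return result.stdout


def format_entry(stamp, outputs):
    """Build one log entry: a timestamp header followed by each output."""
    body = "".join(out if out.endswith("\n") else out + "\n" for out in outputs)
    return f"=== {stamp} ===\n{body}"


def append_entry(path, entry):
    """Append the entry and force it to disk before returning."""
    with open(path, "a") as log:
        log.write(entry)
        log.flush()
        os.fsync(log.fileno())


def poll_forever(path=LOG_PATH, interval=INTERVAL_SECONDS, commands=COMMANDS):
    """Log one entry per interval until the process is stopped."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        append_entry(path, format_entry(stamp, [run(c) for c in commands]))
        time.sleep(interval)
```

Calling `poll_forever()` from a small script started at boot would do; any other tool that prints readings can be added to `COMMANDS` as another argument list.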

----------

## Kyeno

Victory! (at least for one thing) :)

```
# gzip -dc /proc/config.gz |grep -i mei

CONFIG_INTEL_MEI=m

CONFIG_INTEL_MEI_ME=m

CONFIG_INTEL_MEI_TXE=m
```
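The same check can be scripted instead of grepping by hand. A hedged sketch: `option_states` and `read_running_config` are names invented here, and reading `/proc/config.gz` only works when the kernel was built with CONFIG_IKCONFIG_PROC, as it evidently was on this machine.

```python
import gzip


def option_states(config_text, needle):
    """Map CONFIG_* names containing `needle` to their value, or 'n' if unset."""
    states = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> name CONFIG_FOO, value n
            name = line[2:-len(" is not set")]
            value = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
        else:
            continue
        if needle.upper() in name:
            states[name] = value
    return states


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config as text."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

`option_states(read_running_config(), "mei")` should then return the three options listed above with the value `m`.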

```
# uname -a

Linux home 4.8.17-hardened-r2 #14 SMP Thu Mar 23 14:06:30 CET 2017 i686 Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz GenuineIntel GNU/Linux

# cat /etc/gentoo-release

Gentoo Base System release 2.3

# lsmod

Module                  Size  Used by

xt_nat                  1517  4

ipt_MASQUERADE           945  1

iptable_nat             1266  1

gpio_ich                3719  0

s3fb                   19792  0

cfbfillrect             3210  1 s3fb

cfbimgblt               2183  1 s3fb

vgastate                7873  1 s3fb

cfbcopyarea             3078  1 s3fb

i2c_algo_bit            4592  1 s3fb

fb_ddc                  1225  1 s3fb

svgalib                 7285  1 s3fb

fb                     31498  1 s3fb

i2c_i801               11416  0

fbdev                    706  1 fb

i2c_smbus               2445  1 i2c_i801

lpc_ich                13228  0

mousedev                9064  0

mfd_core                3311  1 lpc_ich

mei_me                 10403  0

mei                    41037  1 mei_me

evdev                  12383  0

coretemp                3558  0
```

```
# wget http://www.bondoffamily-net.com/~kinta-chan/techknow/Linux/hardmon/src/heci-qst2/heci-qst.c

# gcc -D MEI -o heci-qst heci-qst.c

# sensors && ./heci-qst

coretemp-isa-0000

Adapter: ISA adapter

Core 0:       +20.0 C  (high = +59.0 C, crit = +85.0 C)

Core 1:       +20.0 C  (high = +59.0 C, crit = +85.0 C)

Core 2:       +28.0 C  (high = +59.0 C, crit = +85.0 C)

Core 3:       +28.0 C  (high = +59.0 C, crit = +85.0 C)

**heci-qst(Intel MEI Driver Version)**

Try device name [/dev/mei]

Try device name [/dev/mei0]

CPU_Temp 43.00

MB_Temp 31.31

ICH_Temp 48.28

MCH_Temp 66.70

CPU_Fan 1160 RPM

+12_Volts 12.287 V

+5_Volts 5.010 V

+3.3_Volts 3.340 V

MCH_Vccp 1.242 V

CPU1_Vccp 1.106 V
```

A garage-built solution for a garage-built computer. ;)

Now I'll just clean up my kernel (only 14 recompiles so far haha) from useless/experimental stuff and write this periodical logger.
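For the logger, the heci-qst output is easy to turn into numbers. A rough sketch, assuming the output keeps the "label value [unit]" layout shown above; the function names are invented here, and the ±5% window is the tolerance commonly quoted for the positive ATX rails, used as an assumption and not as something taken from this board's documentation.

```python
import re

# One reading per line: a label, a number, and an optional unit (RPM, V).
READING_RE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)(?:\s+(\S+))?\s*$")

# Nominal rail voltages keyed by the labels heci-qst prints.
NOMINAL_VOLTS = {"+12_Volts": 12.0, "+5_Volts": 5.0, "+3.3_Volts": 3.3}
TOLERANCE = 0.05  # assumed +/-5 % window for the positive rails


def parse_readings(text):
    """Return {label: (value, unit)} for every line that looks like a reading."""
    readings = {}
    for line in text.splitlines():
        match = READING_RE.match(line.strip())
        if match:
            label, value, unit = match.groups()
            readings[label] = (float(value), unit or "")
    return readings


def out_of_tolerance(readings):
    """List the rail labels whose reading is outside the tolerance window."""
    bad = []
    for label, nominal in NOMINAL_VOLTS.items():
        if label in readings:
            value = readings[label][0]
            if abs(value - nominal) > nominal * TOLERANCE:
                bad.append(label)
    return bad
```

Banner lines such as "Try device name [/dev/mei0]" do not match the pattern and are skipped, so the combined `sensors && ./heci-qst` output can be fed in as is; the coretemp lines are ignored too.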

@Gentoo:

By the way - if this solution works on a modern Gentoo-hardened kernel and the Intel DP35DP board (and works well: according to the Intel specs I pasted a few posts above, the MCH is *supposed* to report 66°C even if it's below that), maybe it makes sense to package this Japanese guy's code and write a Gentoo Wiki article about getting sensor reads *directly* from the Intel chipset?

I can do some more tests for you. I have another computer with Intel DP45SG Extreme (P45) motherboard with C2Extreme QX9650 (multiboot with Gentoo Linux 32bit + KDE Desktop profile) and yet another one with Intel DP55KG Extreme (P55) motherboard with i7 870 (no Gentoo there, but I can get it; maybe test 64bit kernels?).

----------

## cboldt

Just that simple -- Heheheh.  I have a couple of thinkpads, both have modules "mei" and "mei_me" loaded, and I try to pare down my kernels and modules to the bare minimum.  I watch very little sensors-wise, CPU temp and frequency, and that's it.

----------

## Kyeno

I hear you - most of the time I do the same; yet in case of very strange hardware behavior it's probably wise to monitor a bit more. :)

And again I hear you about the kernels; I went on a path against my beliefs trying to get those sensors done here.

I'm kind of wondering though (just for the sake of pure experimentation) if that heci-qst solution would read your thinkpad's sensors since you got the modules already there?

Btw, so far the machine seems stable. Maybe I actually added something useful/previously missing to the kernel. I'll run an overnight stress test.

----------

## cboldt

I usually build or try to build so I can see everything, voltages, timings, etc.  Once the challenge is met, I'm done with the adventure ;)  So done, that I forget how to view the stuff.  New adventure with each new MoBo/platform.  About 5 adventures in the last 10 years.

----------

## Kyeno

Haha :)

Btw, I gave it a brief try and somehow cannot get those readings on the Intel DP45SG computer with the 3.7 kernel series. I wonder if it's the kernel alone, or the chipset.

```
# inxi -MSC

System:    Host: blue Kernel: 3.7.10-gentoo i686 (32 bit) Console: tty 2 Distro: Gentoo Base System release 2.2

Machine:   Device: desktop Mobo: Intel model: DP45SG v: AAE27733-405 serial: BTSG93800137

           BIOS: Intel v: SGP4510H.86A.0118.2009.0622.1056 date: 06/22/2009

CPU:       Quad core Intel Core2 Extreme X9650 (-MCP-) cache: 6144 KB

           clock speeds: max: 3330 MHz 1: 3330 MHz 2: 3330 MHz 3: 3330 MHz 4: 3330 MHz
```

That system is quite old and rarely ever updated (I use Gentoo on this machine only as an "experimentation computer" every now and then); still, I cannot get /dev/mei* from the kernel module. I'll try the 4.X branch for the sake of science lol. :)
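A quick way to tell "module not loaded" apart from "module loaded but no device node" is to look at both places. The sketch below assumes the usual `/dev/mei*` node names (the ones heci-qst tries above) and the standard `/proc/modules` layout; both function names are made up for this example.

```python
import glob
import os


def find_mei_nodes(dev_dir="/dev"):
    """Return MEI device node paths under dev_dir, e.g. /dev/mei or /dev/mei0."""
    return sorted(glob.glob(os.path.join(dev_dir, "mei*")))


def mei_modules_loaded(proc_modules="/proc/modules"):
    """Return loaded module names starting with 'mei'.

    Drivers compiled into the kernel are not listed in /proc/modules,
    so an empty result does not prove that MEI support is missing.
    """
    try:
        with open(proc_modules) as handle:
            return sorted(line.split()[0] for line in handle if line.startswith("mei"))
    except OSError:
        return []


if __name__ == "__main__":
    print("device nodes:", find_mei_nodes())
    print("modules:", mei_modules_loaded())
```

If the modules show up but no node appears, the driver probably did not bind to the device on that chipset or kernel version, which is the question left open here.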

----------

## Kyeno

...and I can pretty much confirm the DP35DP machine hangs under heavy hard drive usage. It just hung on `perl-cleaner all`.

----------

## Kyeno

Hello again.

Just posting here to let you know that I think I managed to regain the system stability and thus this issue might be marked as resolved (even though I still have some warnings in dmesg).

```
# uptime

 12:20:50 up 7 days, 13:30,  9 users,  load average: 0.84, 0.45, 0.18
```

What exactly helped - I don't really know. I can only list a few things I did in general to get to this point:

1. Swapped the RAM entirely: it's now running 2 identical 2GB Kingston KVR800D2N6K2/4G 800MHz CL6 DIMMs sold as a dual-channel kit, at entirely stock (XMP?) settings, so the "Dual-Channel Asymmetric" config is gone.

```
# modprobe eeprom

# decode-dimms

...cut...

Decoding EEPROM: /sys/bus/i2c/drivers/eeprom/0-0050

Guessing DIMM is in                             bank 1

---=== SPD EEPROM Information ===---

EEPROM Checksum of bytes 0-62                   OK (0x7D)

# of bytes written to SDRAM EEPROM              128

Total number of bytes in EEPROM                 256

Fundamental Memory type                         DDR2 SDRAM

SPD Revision                                    1.2

---=== Memory Characteristics ===---

Maximum module speed                            800 MHz (PC2-6400)

Size                                            2048 MB

Banks x Rows x Columns x Bits                   8 x 14 x 10 x 64

Ranks                                           2

SDRAM Device Width                              8 bits

Module Height                                   < 25.4 mm

Module Type                                     UDIMM (133.25 mm)

DRAM Package                                    Planar

Voltage Interface Level                         SSTL 1.8V

Module Configuration Type                       No Parity

Refresh Rate                                    Reduced (7.8 us) - Self Refresh

Supported Burst Lengths                         4, 8

tCL-tRCD-tRP-tRAS                               6-6-6-18

Supported CAS Latencies (tCL)                   6T, 5T, 4T

Minimum Cycle Time                              2.50 ns at CAS 6 (tCK min)

                                                3.00 ns at CAS 5

                                                3.75 ns at CAS 4

Maximum Access Time                             0.40 ns at CAS 6 (tAC)

                                                0.45 ns at CAS 5

                                                0.50 ns at CAS 4

Maximum Cycle Time (tCK max)                    8.00 ns

---=== Timing Parameters ===---

Address/Command Setup Time Before Clock (tIS)   0.17 ns

Address/Command Hold Time After Clock (tIH)     0.25 ns

Data Input Setup Time Before Strobe (tDS)       0.05 ns

Data Input Hold Time After Strobe (tDH)         0.12 ns

Minimum Row Precharge Delay (tRP)               15.00 ns

Minimum Row Active to Row Active Delay (tRRD)   7.50 ns

Minimum RAS# to CAS# Delay (tRCD)               15.00 ns

Minimum RAS# Pulse Width (tRAS)                 45.00 ns

Write Recovery Time (tWR)                       15.00 ns

Minimum Write to Read CMD Delay (tWTR)          7.50 ns

Minimum Read to Pre-charge CMD Delay (tRTP)     7.50 ns

Minimum Active to Auto-refresh Delay (tRC)      60.00 ns

Minimum Recovery Delay (tRFC)                   127.50 ns

Maximum DQS to DQ Skew (tDQSQ)                  0.20 ns

Maximum Read Data Hold Skew (tQHS)              0.30 ns

---=== Manufacturing Information ===---

Manufacturer                                    Kingston

Manufacturing Location Code                     0x06

Part Number

Manufacturing Date                              2009-W08

Assembly Serial Number                          0x76CC72A8

...cut... (and one more identical)
```

2. Changed the graphics card to a low-end, low-power PCI-Express one so I could get rid of the potentially "too old for this system" S3 ViRGE PCI, though that was more of a lucky guess. I'm now running a passively cooled GeForce 7300GS, which is said to draw only ~15W at 2D/console (http://www.tomshardware.com/reviews/geforce-radeon-power,2122-6.html). Probably still more than the ViRGE, but again - it was my lucky guess. At least the BIOS bootup looks sane now, since rendering that fancy fullscreen Intel logo on the ViRGE didn't quite hit the mark. ;)

```
01:00.0 VGA compatible controller: NVIDIA Corporation G72 [GeForce 7300 GS] (rev a1)

   Subsystem: Gigabyte Technology Co., Ltd G72 [GeForce 7300 GS]
```

3. Disabled framebuffer entirely. No drivers, no support, nothing. No DRM drivers either, though I left DRM support itself available in case I wanted to run X with the NVidia proprietary driver for lols once or twice (nouveau does not seem to support this GPU at all; I played around with it a bit before).

4. Disabled HPET (High Precision Event Timer) in the BIOS and in the kernel, and compiled in almost every possible chipset and motherboard driver (including smbus and lpc) to make sure the clocksource can be properly read from TSC/PIT:

```
[    0.000000] tsc: Fast TSC calibration using PIT

[    0.000000] tsc: Detected 2333.169 MHz processor

[    0.008002] Calibrating delay loop (skipped), value calculated using timer frequency.. 4666.33 BogoMIPS (lpj=9332676)
```

...and it seems to work: it survives both artificial and real-world stress tests, and has stayed alive for a week now without hangups or reboots. I'm still getting voltages and precise temps via MEI; I gave up on the chip-specific hardware drivers (as mentioned a few posts above).
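To double-check which clocksource the kernel actually ended up with after disabling HPET in step 4, the sysfs files under `/sys/devices/system/clocksource/clocksource0/` can be read directly. This is only a sketch; the function name is made up, and the `base` parameter exists so the reader can be pointed at a test directory.

```python
import os

# Standard sysfs location of the kernel's clocksource selection.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available): the active clocksource and the list of candidates."""

    def read_words(name):
        with open(os.path.join(base, name)) as handle:
            return handle.read().split()

    current = read_words("current_clocksource")
    available = read_words("available_clocksource")
    return (current[0] if current else None), available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("available:", ", ".join(available))
```

With HPET disabled in both places, `hpet` should no longer appear in the available list. If the current source later falls back from `tsc` to something else, dmesg normally says why.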

Strangely, I still get really weird stuff in dmesg (below), but this does not seem to affect stability anymore.

```
[    0.012000] Initializing CPU#1

[    0.142020]  #2

[    0.012000] Initializing CPU#2

[    0.012000] calibrate_delay_direct() dropping max bogoMips estimate 3 = 12028618

[    0.012000]  #3

[    0.012000] Initializing CPU#3

...cut...
```

```
[    0.332025] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])

[    0.332169] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments]

[    0.332313] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM

[    0.332455] acpi PNP0A03:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-7f] only partially covers this bridge

[    0.333584] PCI host bridge to bus 0000:00

...cut...
```

```
[    0.421152] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x000000000000050C-0x000000000000050F (\IGPO) (20160422/utaddress-255)

[    0.421577] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

[    0.422056] gpio_ich: GPIO from 451 to 511 on gpio_ich

...cut...

```

The above one *really* puzzles me, but I cannot seem to find anything about it anywhere.
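For what it's worth, the two address ranges in that warning can be compared by hand. The sketch below only does the interval arithmetic on the numbers printed in the log; it says nothing about which driver requested the SystemIO range, and the helper name is made up.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the inclusive overlap of two inclusive ranges, or None if they are disjoint."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start <= end else None


# Values copied from the ACPI warning above.
system_io = (0x500, 0x52F)  # I/O port range named as "SystemIO range"
op_region = (0x50C, 0x50F)  # ACPI OpRegion \IGPO

shared = overlap(*system_io, *op_region)
if shared:
    size = shared[1] - shared[0] + 1
    print(f"overlap: 0x{shared[0]:X}-0x{shared[1]:X} ({size} bytes)")
```

The OpRegion lies entirely inside the SystemIO range, so the overlap is the whole 4-byte OpRegion at 0x50C-0x50F. In other words, the warning means the firmware's ACPI code and a native kernel driver could both touch those four I/O ports.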

Then there are a few interesting dmesg messages generated over time while the machine is running:

```
[ 2027.103049] mei_me 0000:00:03.0: less data available than length=00000001.

[ 4013.470666] mei_me 0000:00:03.0: less data available than length=00000001.

[83102.777465] perf: interrupt took too long (2517 > 2500), lowering kernel.perf_event_max_sample_rate to 79250

[140107.402756] perf: interrupt took too long (3273 > 3146), lowering kernel.perf_event_max_sample_rate to 61000

...cut...

[269567.130212] perf: interrupt took too long (4095 > 4091), lowering kernel.perf_event_max_sample_rate to 48750

[329062.583526] perf: interrupt took too long (5123 > 5118), lowering kernel.perf_event_max_sample_rate to 39000

[333504.153031] mei_me 0000:00:03.0: less data available than length=00000000.

[337379.148027] mei_me 0000:00:03.0: less data available than length=00000000.

[337596.704447] mei_me 0000:00:03.0: less data available than length=00000000.
```

But again, that does not seem to affect stability. Still, if you guys have any hints on those (before closing this topic), they would be appreciated.
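As far as I understand, the `perf:` lines are the kernel throttling its own sampling rate when its measured interrupt time exceeds a limit, so they are informational rather than errors. A small sketch for pulling them out of a dmesg dump to see how far the rate has dropped over time (the function name is made up, and the sample text is copied from the log above):

```python
import re

# Sample lines copied from the dmesg excerpt above.
SAMPLE = """\
[ 2027.103049] mei_me 0000:00:03.0: less data available than length=00000001.
[83102.777465] perf: interrupt took too long (2517 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[140107.402756] perf: interrupt took too long (3273 > 3146), lowering kernel.perf_event_max_sample_rate to 61000
[269567.130212] perf: interrupt took too long (4095 > 4091), lowering kernel.perf_event_max_sample_rate to 48750
[329062.583526] perf: interrupt took too long (5123 > 5118), lowering kernel.perf_event_max_sample_rate to 39000
"""

PERF = re.compile(
    r"\[\s*(\d+\.\d+)\] perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)


def perf_throttle_events(dmesg_text):
    """Return (uptime_seconds, measured, limit, new_rate) for each perf throttle message."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in PERF.finditer(dmesg_text)
    ]


for uptime, measured, limit, rate in perf_throttle_events(SAMPLE):
    print(f"{uptime / 3600:7.1f} h: {measured} > {limit}, rate now {rate}")
```

The current value can be compared against `sysctl kernel.perf_event_max_sample_rate` on the running system.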

Also, I didn't really have time to experiment with MEI on the DP45SG or the DP55KG, since they require *massive* system updates for that (very old Gentoo, ~2 years without updates; since then the profiles have changed, baselayout has changed, portage has changed, updating world throws 1k collisions, etc. I would need a day or two to solve it and will do so, but a bit later...).

Thanks a lot for all of your support here!

Keep up the good work, Gentoo!

----------

## steveL

 *Kyeno wrote:*   

> ```
> [    0.421152] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x000000000000050C-0x000000000000050F (\IGPO) (20160422/utaddress-255)
>
> ...
> ```

 

The OpRegion part is ACPI_I2C_OPREGION: 

```
$ grep -F REGION /usr/src/linux/.config

# CONFIG_PMIC_OPREGION is not set

# CONFIG_ACPI_I2C_OPREGION is not set
```

 AFAIR, gpio_ich is the backend equivalent of i2c-algobit (bit-banging; expose ports and allow us to drive them how we choose.)

I'd say it's working correctly, in that it's selecting which pins are actually available, and don't conflict.

I didn't set ACPI_I2C_OPREGION, as it mentioned running BIOS/firmware code, and I don't like the sound of that.

If it turns out to be needed, I will (or kconfig will do it for me.)
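The same check as the `grep` above can be scripted when several options need to be compared across kernels. This is a minimal sketch that only understands the two line shapes a `.config` uses (`CONFIG_X=value` and `# CONFIG_X is not set`); the function name is made up.

```python
# Sample text copied from the grep output above.
SAMPLE = """\
# CONFIG_PMIC_OPREGION is not set
# CONFIG_ACPI_I2C_OPREGION is not set
"""


def config_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if it is not mentioned."""
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "not set"
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


print(config_state(SAMPLE, "ACPI_I2C_OPREGION"))
```

To check a real tree, pass in the contents of `/usr/src/linux/.config` instead of `SAMPLE`.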

----------

