# SATA unstable

## Geizeskrank

Hello,

I have two SATA hard disks that hang when the I/O load stays high for a longer time.

The symptom is irregular: sometimes I can write 80 GB in one go, and another time it hangs when I copy 30 GB.

When the fault occurs, "top" shows "wa" above 90%.

The hard disks can then no longer be accessed, so I power off and reboot, and after that the disks run normally again.

So, how can I test where the fault is?

I get the error with both the sata_sil and the siimage module.

The controller is a sil3112 PC-Card (PCMCIA).

System: Powerbook G4
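
For anyone who wants to log the "wa" figure from top over time, a rough sketch in Python follows. It assumes the usual Linux layout of the aggregate `cpu` line in `/proc/stat` (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for this illustration.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    # Only the first eight counters are summed (user .. steal); the guest
    # counters that may follow are already included in user/nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    # iowait is the fifth counter (index 4).
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_sample(path="/proc/stat"):
    """Read one sample from the running system (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

To use it, call `read_cpu_sample()` twice a few seconds apart while a large copy is running and pass both results to `iowait_percent()`. A value that stays near 90% means the CPU is mostly idle and waiting for the disks.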

----------

## audiodef

Sysresccd might have some useful utils for this.

----------

## Geizeskrank

Hi,

Which one do you mean?

It is not that I can't work with the system: my root partition is on /dev/hda, and the hard disks I mean are /dev/sda and /dev/sdb.

----------

## NeddySeagoon

Geizeskrank,

 *Geizeskrank wrote:*   

> my root partition is on /dev/hda

  Ouch!

Several years ago there was a problem with some SATA drivers and very high kernel waits. That your system is on /dev/hda shows your kernel (and maybe more) is several years out of date. If this is true, you have a known issue and should update your system, if you still can.

What does 

```
emerge --info
```

show?

----------

## Geizeskrank

Hi,

That my root is on /dev/hda is OK; it is nothing more than an old IDE device.

```
ppc / # emerge --info

Portage 2.1.10.11 (default/linux/powerpc/ppc32/10.0/server, gcc-4.5.3, glibc-2.12.2-r0, 2.6.34-gentoo-r12-ppc32 ppc)

=================================================================

System uname: Linux-2.6.34-gentoo-r12-ppc32-ppc-7455,_altivec_supported-with-gentoo-2.0.3

Timestamp of tree: Fri, 18 Nov 2011 18:45:01 +0000

app-shells/bash:          4.1_p9

dev-lang/python:          2.7.1-r1, 3.1.3-r1

sys-apps/baselayout:      2.0.3

sys-apps/openrc:          0.8.3-r1

sys-apps/sandbox:         2.4

sys-devel/binutils:       2.20.1-r1

sys-devel/gcc:            4.5.3-r1

sys-devel/gcc-config:     1.4.1-r1

sys-devel/make:           3.82-r1

sys-kernel/linux-headers: 2.6.36.1 (virtual/os-headers)

sys-libs/glibc:           2.12.2

Repositories: gentoo

ACCEPT_KEYWORDS="ppc"

ACCEPT_LICENSE="* -@EULA"

CBUILD="powerpc-unknown-linux-gnu"

CFLAGS="-mcpu=7450 -O2 -pipe -maltivec -mabi=altivec -fno-strict-aliasing"

CHOST="powerpc-unknown-linux-gnu"

CONFIG_PROTECT="/etc"

CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"

CXXFLAGS="-mcpu=7450 -O2 -pipe -maltivec -mabi=altivec -fno-strict-aliasing"

DISTDIR="/usr/portage/distfiles"

FEATURES="assume-digests binpkg-logs distlocks ebuild-locks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"

FFLAGS=""

GENTOO_MIRRORS="ftp://de-mirror.org/gentoo/"

LDFLAGS="-Wl,-O1 -Wl,--as-needed"

PKGDIR="/usr/portage/packages"

PORTAGE_CONFIGROOT="/"

PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"

PORTAGE_TMPDIR="/var/tmp"

PORTDIR="/usr/portage"

PORTDIR_OVERLAY=""

SYNC="rsync://rsync1.de.gentoo.org/gentoo-portage"

USE="acl berkdb bzip2 cli cracklib crypt cups cxx dri fortran gdbm gpm iconv ipv6 modules mudflap ncurses nls nptl nptlonly openmp pam pcre ppc pppd readline session snmp ssl sysfs tcpd truetype unicode xml xorg zlib" ALSA_CARDS="aoa aoa-fabric-layout aoa-onyx aoa-soundbus aoa-soundbus-i2s aoa-tas aoa-toonie powermac usb-audio via82xx" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" PHP_TARGETS="php5-3" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="fbdev glint mach64 mga nv r128 radeon savage tdfx trident dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"

Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

```

```
/dev/hda2: LABEL="bootstrap" TYPE="hfs"

/dev/hda3: UUID="2257b1c6-2d61-42ee-bfc1-72bdd1cb73c7" TYPE="ext4"

/dev/sda1: UUID="410a1a29-b9ce-4f3e-9511-19e2bc312fc9" TYPE="ext4"

/dev/sdb1: UUID="b6f02ba5-abf0-439a-b9bd-ff67e3c363b2" TYPE="ext4"

```

```
lspci

0000:00:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 AGP

0000:00:10.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500]

0001:10:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 PCI

0001:10:17.0 Unassigned class [ff00]: Apple Computer Inc. KeyLargo Mac I/O (rev 03)

0001:10:18.0 USB Controller: Apple Computer Inc. KeyLargo USB

0001:10:19.0 USB Controller: Apple Computer Inc. KeyLargo USB

0001:10:1a.0 CardBus bridge: Texas Instruments PCI1410 PC card Cardbus Controller (rev 02)

0001:11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)

0002:24:0b.0 Host bridge: Apple Computer Inc. UniNorth 1.5 Internal PCI

0002:24:0e.0 FireWire (IEEE 1394): Agere Systems FW322/323

0002:24:0f.0 Ethernet controller: Apple Computer Inc. UniNorth GMAC (Sun GEM) (rev 01)

```

----------

## NeddySeagoon

Geizeskrank,

Hmm ... default/linux/powerpc/ppc32/10.0/server ... I wasn't expecting PPC.

Kernel 2.6.34 is still old. Try an update.

I used a SIL3112 chip in mdadm raid0 on my main x86 32-bit box for many years with no problems. I still use one in my SPARC U10 server with no issues, but there is no raid there.

----------

## krinn

I have seen a similar issue with a SATA green HDD. The HDD parks its heads after a few seconds. When copying, your root fills its buffer until it needs to dump the data to the "sleeping" disk (the kernel itself isn't aware the disk is off, as the disk decides to do that by itself without telling anyone). The disk is slow as hell to wake up, and the kernel gets stuck waiting for it to answer.

It could be that, if your disk is one too. I solved it by removing the disk and invoking God to give pain to these fucking green HDDs (but I refuse to endorse the raining in Thailand).

----------

## Geizeskrank

NeddySeagoon,

I am currently compiling kernel 2.6.39 <- the latest for ppc

krinn,

I hope you do not mean this one:

http://www.moraviapc.cz/fotky11146/fotos/_vyr_380WD10EARS.jpg

I know your description from Debian Linux.

When I copied a bigger file, the HDD got stuck every 30 seconds.

Because of this and the performance problem, I changed to Gentoo.

Debian (ca. 20 - 30 MB/s), Gentoo (ca. 60 - 70 MB/s)

----------

## NeddySeagoon

Geizeskrank,

I have five of those, well, its 2 TB big brother, in RAID5 in my media server with no ill effects.

I have heard horror stories like the one krinn recounts about these drives, both standalone and in raid.

Check your drives' firmware version and check for firmware updates on the WD site.

Applying firmware updates is not risk free, so make sure the changes will be worth having before you try any update.

How did you partition your drives?

These drives have always had 4 KiB physical sectors, but they lie about it and report 512 bytes to the operating system.

If you do not create your partitions on 4 KiB boundaries, the drive is very slow, as it must execute read-modify-write cycles to carry out unaligned writes.

Please post your partition tables. The fix is destructive of your data. It's backup, repartition and restore.

----------

## krinn

Mine is a WD5000AADS-00L4B1. It looks similar but is a smaller version, but yep, green and WD too.

----------

## Geizeskrank

Hi,

I have two 1 TB partitions. Both begin at 2 MiB and both are aligned  :Wink: 

If I put them in a RAID 0 or 1, they write very slowly, at 2 - 5 MB/s.

It seems that I forgot an option in the kernel: my boot stops at "returning from prom_init".

I'll fix that today and look at my disks again tomorrow.

----------

## Geizeskrank

Hi,

Enclosed are some facts:

```
fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes

255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x45390500

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1            4096  1953523711   976759808   83  Linux

Command (m for help):

```

```
fdisk /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Command (m for help): p

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes

255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xf94f0100

   Device Boot      Start         End      Blocks   Id  System

/dev/sdb1            4096  1953523711   976759808   83  Linux

Command (m for help):

```
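
As a quick cross-check of the alignment question, here is a small Python sketch that tests a start sector against a 4 KiB physical sector size. The 512-byte logical sector comes from the fdisk output above; the 4096-byte physical size is an assumption (the drive reports 512 bytes), and the function names are hypothetical.

```python
LOGICAL_SECTOR_BYTES = 512     # as reported by fdisk
PHYSICAL_SECTOR_BYTES = 4096   # assumed real sector size of the drive


def is_aligned(start_sector,
               logical=LOGICAL_SECTOR_BYTES,
               physical=PHYSICAL_SECTOR_BYTES):
    """True if the partition's first byte falls on a physical sector boundary."""
    return (start_sector * logical) % physical == 0


def start_offset_mib(start_sector, logical=LOGICAL_SECTOR_BYTES):
    """Byte offset of the partition start, expressed in MiB."""
    return start_sector * logical / (1024 * 1024)


print(is_aligned(4096), start_offset_mib(4096))
print(is_aligned(63))
```

A start sector of 4096 works out to a 2 MiB offset, which is a multiple of 4 KiB, while the old DOS default of sector 63 is not.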

----------

## Geizeskrank

Hello,

Today it happened again:

http://img828.imageshack.us/img828/8400/screenshotnf.jpg

I copied a .VOB over WLAN into my home folder, then I wanted to remap the audio and save the file to my video folder.

The result is that my hard disk stays unreachable until I delete the newly created file.  :Sad: 

```
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata2.00: BMDMA2 stat 0xc0009

ata2.00: failed command: READ DMA

ata2.00: cmd c8/00:08:a8:55:04/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:a8:55:04/00:00:00:00:00/e0 Emask 0x9 (media error)

ata2.00: status: { DRDY ERR }

ata2.00: error: { UNC }

ata2.00: configured for UDMA/100

sd 1:0:0:0: [sdb] Unhandled sense code

sd 1:0:0:0: [sdb]  Result: hostbyte=0x00 driverbyte=0x08

sd 1:0:0:0: [sdb]  Sense Key : 0x3 [current] [descriptor]

Descriptor sense data with sense descriptors (in hex):

        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00

        00 04 55 a8

sd 1:0:0:0: [sdb]  ASC=0x11 ASCQ=0x4

sd 1:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 00 04 55 a8 00 00 08 00

end_request: I/O error, dev sdb, sector 284072

ata2: EH complete

```

----------

## NeddySeagoon

Geizeskrank,

It looks like the disk has gone offline and the kernel is waiting forever. That's not like the kernel wait bug, more like hardware.

Maybe a poor-quality SATA data cable, or the data cable is not connected properly. Try a new/different data cable.

If that doesn't fix it, try a disk drive firmware upgrade.

----------

## Geizeskrank

Hello,

I think I'll try a new SATA cable.

I'm wondering, because it's so regular.

 *Quote:*   

> It looks like the disk has gone off line and the kernel is waiting forever.

 

What happened was that I had already copied some MB to this disk, and then the disk got stuck (is that what you mean, too?) until I deleted the new file.

So during this time I can (very, very slowly) access the folder to remove it.

----------

## NeddySeagoon

Geizeskrank

```
ata2.00: status: { DRDY ERR } 
```

In full, that means Drive Ready Error. That should never happen.

The drive becomes ready when the platters spin up and the heads unpark. It stays ready until shutdown.

This indicates a hardware issue of some sort. Investigation is really only possible by replacement, so start with the low-cost item.

When this error happens in normal use, the kernel can no longer talk to the hard drive.

Does the drive make any clicking noises when this happens?

----------

## energyman76b

 *NeddySeagoon wrote:*   

> Geizeskrank
> 
> ```
> ata2.00: status: { DRDY ERR } 
> ```
> ...

 

No, that means: Drive Ready, and there was an error.

Or, to use more words: there was an error. Error handling was started, the drive is now ready to accept more commands, and the following information is for debugging purposes.

The error is this:

         res 51/40:00:a8:55:04/00:00:00:00:00/e0 Emask 0x9 (media error) 

and

sd 1:0:0:0: [sdb] Unhandled sense code 

and this

end_request: I/O error, dev sdb, sector 284072 

The drive has a defective sector. Discard it and get a new one.
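
To make the reading of the `res` line concrete, here is a hedged Python sketch that decodes it. It assumes the layout libata prints for the result taskfile (status/error:nsect:lbal:lbam:lbah/five HOB bytes/device) and a 28-bit command such as READ DMA (0xc8), so the HOB bytes are ignored. The bit names follow the ATA status and error registers; `decode_res` is a made-up helper name.

```python
import re

STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def flags(value, table):
    """Names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]


def decode_res(line):
    """Decode status, error and LBA28 address from a libata 'res' line."""
    match = re.search(r"res ([0-9a-f/:]+)", line)
    if not match:
        raise ValueError("no 'res' taskfile found in line")
    groups = match.group(1).split("/")
    status = int(groups[0], 16)
    error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in groups[1].split(":"))
    device = int(groups[3], 16)
    # LBA28: the low nibble of the device register holds address bits 24-27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {"status": flags(status, STATUS_BITS),
            "error": flags(error, ERROR_BITS),
            "lba": lba}


print(decode_res("res 51/40:00:a8:55:04/00:00:00:00:00/e0 Emask 0x9 (media error)"))
```

For the line quoted above this gives DRDY and ERR as separate status flags, UNC (uncorrectable data) as the error, and LBA 284072, the same sector the kernel reports in the `end_request` line.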

----------

## Geizeskrank

Hello,

I have new cables, but nothing has changed:

```
ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80d3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:00:98:2e:15/00:00:00:00:00/e0 tag 0 dma 131072 in

         res 51/40:8f:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80d3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:00:98:2e:15/00:00:00:00:00/e0 tag 0 dma 131072 in

         res 51/40:8f:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

sd 0:0:0:0: [sda] Unhandled sense code

sd 0:0:0:0: [sda]  Result: hostbyte=0x00 driverbyte=0x08

sd 0:0:0:0: [sda]  Sense Key : 0x3 [current] [descriptor]

Descriptor sense data with sense descriptors (in hex):

        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00

        00 15 2f 02

sd 0:0:0:0: [sda]  ASC=0x11 ASCQ=0x4

sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 15 2e 98 00 01 00 00

end_request: I/O error, dev sda, sector 1388290

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

ata1: EH complete

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

ata1.00: BMDMA2 stat 0x80c3109

ata1.00: failed command: READ DMA

ata1.00: cmd c8/00:08:00:2f:15/00:00:00:00:00/e0 tag 0 dma 4096 in

         res 51/40:00:02:2f:15/00:00:48:00:00/e0 Emask 0x9 (media error)

ata1.00: status: { DRDY ERR }

ata1.00: error: { UNC }

ata1.00: configured for UDMA/100

sd 0:0:0:0: [sda] Unhandled sense code

sd 0:0:0:0: [sda]  Result: hostbyte=0x00 driverbyte=0x08

sd 0:0:0:0: [sda]  Sense Key : 0x3 [current] [descriptor]

Descriptor sense data with sense descriptors (in hex):

        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00

        00 15 2f 02

sd 0:0:0:0: [sda]  ASC=0x11 ASCQ=0x4

sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 15 2f 00 00 00 08 00

end_request: I/O error, dev sda, sector 1388290

ata1: EH complete

EXT4-fs error (device sda1): ext4_lookup:1044: inode #581640: comm smbd: deleted inode referenced: 581916

```
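
The hex dump in the log is SCSI descriptor-format sense data, and it can be decoded by hand. The Python sketch below is only an illustration: it assumes the standard layout (response code 0x72, sense key in byte 1, ASC/ASCQ in bytes 2 and 3, an Information descriptor of type 0x00 carrying the failing LBA), its lookup tables hold only the values seen in this log, and `decode_descriptor_sense` is a made-up name.

```python
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_descriptor_sense(hex_text):
    """Decode sense key, ASC/ASCQ and failing LBA from descriptor sense data."""
    data = bytes.fromhex("".join(hex_text.split()))
    if data[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": data[1] & 0x0F, "asc": data[2],
              "ascq": data[3], "lba": None}
    end = min(8 + data[7], len(data))   # byte 7 = additional sense length
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + desc_len]
        # Information descriptor: VALID bit, reserved byte, 8-byte value.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["lba"] = int.from_bytes(body[2:10], "big")
        pos += 2 + desc_len
    return result


sense = decode_descriptor_sense(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 00 15 2f 02")
print(SENSE_KEYS.get(sense["sense_key"]),
      ASC_ASCQ.get((sense["asc"], sense["ascq"])), sense["lba"])
```

Sense key 0x3 with ASC 0x11 / ASCQ 0x04 is a medium error (unrecovered read, auto reallocation failed), and the information field gives LBA 1388290, matching the `end_request` line.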

----------

## NeddySeagoon

Geizeskrank,

Then, as energyman76b says, it's your drive.

It's not quite as black and white as energyman76b suggests. Drives actually have sectors fail throughout their useful life.

Normally, they predict when a sector is about to fail and move the data to a spare sector before the failure actually happens, so most users don't notice this process.

Just occasionally, the drive gets it wrong and the sector fails before the data is moved. This may have happened to you.

The data at the failed sector is lost, but you can still force the drive to remap the sector by performing a write to the damaged sector. The write will fail and, provided the drive still has spares, the sector will be remapped. When the drive can no longer hide failed sectors, it's end of life: scrap it.

The question is, were you unlucky and the sector failure prediction let you down, or is this incident suggesting your drive is about to have a lot more failed sectors?

If you were unlucky, the write to force a remap will give you some more useful life out of the drive. If not, you will wish you had replaced it now.
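
Before overwriting a failed sector, it can help to know which filesystem block it sits in, so the affected file can be identified first (for ext4, `debugfs` can map a block number to an inode with `icheck`). The Python sketch below does only the arithmetic. The 4096-byte filesystem block size is an assumption (check it with `tune2fs -l`), and `sector_to_fs_block` is a hypothetical helper.

```python
def sector_to_fs_block(disk_sector, partition_start,
                       sector_bytes=512, block_bytes=4096):
    """Map an absolute disk sector to a block number inside the partition."""
    if disk_sector < partition_start:
        raise ValueError("sector lies before the start of the partition")
    byte_offset = (disk_sector - partition_start) * sector_bytes
    return byte_offset // block_bytes


# Sector numbers from the kernel log, partition start from the fdisk output.
print(sector_to_fs_block(1388290, 4096))   # /dev/sda1
print(sector_to_fs_block(284072, 4096))    # /dev/sdb1
```

With both partitions starting at sector 4096, the failing sectors fall in block 173024 of /dev/sda1 and block 34997 of /dev/sdb1, under the block-size assumption above.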

----------

## Geizeskrank

Hello,

A failure on both new devices??

I will try my old Seagate later.

----------

## energyman76b

Rule of thumb (aka Google's law): when a drive starts showing bad blocks, it is going to fail soon.

And yes, two new drives showing bad blocks can happen. You weren't careful enough carrying them home.

----------

## Geizeskrank

Hello,

Today I got a new old HD.

So, my %wa went down to 30%   :Confused: 

No more hangs...

----------

