# nvidia MCP78S + seagate 1.5TB drive = sata errors

## slugbait

I'm building a new desktop and I've run into a problem.  I've installed Gentoo on a 320GB Western Digital drive with no problems:

beast ~ # uname -a

Linux beast 2.6.27-gentoo-r8 #1 SMP PREEMPT Thu Feb 12 12:23:10 EST 2009 x86_64 AMD Phenom(tm) II X4 940 Processor AuthenticAMD GNU/Linux

Now I'm trying to get the 1.5TB Seagate drive configured and mounted, but mkfs.ext3 produces the following in /var/log/messages:

 *Quote:*   

> 
> 
> Feb 12 15:13:07 beast ata3: hard resetting link
> 
> Feb 12 15:13:08 beast ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ...

 

I've updated the BIOS and enabled AHCI.... same errors.  I have AHCI enabled in the kernel:

 *Quote:*   

> 
> 
> beast ~ # zgrep AHCI /proc/config.gz 
> 
> CONFIG_SATA_AHCI=y
> ...

 

I've spent the last few hours trying to find a solution, but I've obviously failed.  Can anyone help?

 *Quote:*   

> 
> 
> beast ~ # hdparm -I /dev/sdb
> 
> /dev/sdb:
> ...

 

 *Quote:*   

> 
> 
> beast ~ # lspci
> 
> 00:00.0 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
> ...

 

Kernel config is at http://www.severus.org/kernel-config

----------

## pappy_mcfae

Your .config is a mess, slugbait. Please post the results of lspci -n and cat /proc/cpuinfo as well as your /etc/fstab file, and I'll get you working right.

Blessed be!

Pappy

----------

## slugbait

The config is just the default from genkernel with most of the hardware I don't have disabled.  I haven't gone the opposite way and built a completely stripped kernel yet, but I will certainly do so if necessary...

 *Quote:*   

> 
> 
> beast ~ # lspci -n
> 
> 00:00.0 0500: 10de:0754 (rev a2)
> ...

 

 *Quote:*   

> 
> 
> beast ~ # cat /proc/cpuinfo
> 
> processor       : 0
> ...

 

I'm not sure why this matters, but:

 *Quote:*   

> 
> 
> /dev/sda1               /boot           ext2            noauto,noatime  1 2
> 
> /dev/sda3               /               ext3            noatime         0 1
> ...

 

/dev/sdb1 will eventually be mounted at /home when everything is working.

----------

## pappy_mcfae

Your /etc/fstab matters because the kernel must know which file systems you are using. Without that info, welcome to Kernel-panic-opolis. 

As per your original setup, having the SATA drivers and the ATA/ATAPI/MFM/RLL drivers active is a recipe for conflicts and all manner of disaster. This may be a reason for your drive issues. 
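A quick way to see whether both driver stacks are actually enabled is to scan the kernel config for them. The following is only a minimal Python sketch, not anything from the posted .config: it assumes the legacy ATA/ATAPI/MFM/RLL stack is selected by `CONFIG_IDE` and the libata (SATA) stack by `CONFIG_ATA`, as in 2.6-era kernels, and the function names are made up for illustration.

```python
import gzip
import re

# Matches lines such as "CONFIG_ATA=y" or "CONFIG_SATA_AHCI=m".
OPTION_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)[ \t]*$", re.MULTILINE)


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols built in (y) or built as modules (m)."""
    return {match.group(1) for match in OPTION_RE.finditer(config_text)}


def both_ata_stacks_enabled(config_text):
    """True if the legacy IDE stack and libata are both enabled.

    Assumption: CONFIG_IDE is the legacy stack, CONFIG_ATA is libata.
    """
    options = enabled_options(config_text)
    return "CONFIG_IDE" in options and "CONFIG_ATA" in options


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

On the affected machine, `both_ata_stacks_enabled(read_running_config())` would report whether the overlap exists. It says nothing about whether the overlap is what causes the resets.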

It is also possible that you have a bad drive. Yes, drives can be bad fresh out of the box. It doesn't take much shock to crush the read heads...spin up once, and you've just cut all kinds of grooves into the surface of the media. It's not pretty. If this kernel doesn't bring said drive around, take it back.

Click here for your new .config. Compile as is.

For the best results, please do the following:

1) Move your .config file out of your kernel source directory ( 2.6.27-gentoo-r8 ).

2) Issue the command make mrproper. This is a destructive step. It returns the source to pristine condition. Unmoved .config files will be deleted!

3) Copy my .config into your source directory.

4) Issue the command make && make modules_install.

5) Install the kernel as you normally would, and reboot.

6) Once it boots, please post /var/log/dmesg so I can see how things loaded.

As I said, this may or may not snap the drive into shape. What it will do, once you get that drive issue fixed, is give you a fast-running Linux kernel. That's always a good thing.

Blessed be!

Pappy

----------

## slugbait

No dice.  

http://www.severus.org/beast-dmesg

I even tried applying a jumper to lock the drive to 1.5 Gb/s with no change.
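Whether the jumper took effect can be read from the `SStatus` value libata prints on the "SATA link up" line. Here is a minimal Python sketch of that decoding, assuming the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the value in the log is hexadecimal; the helper names are hypothetical.

```python
# SPD field values defined for first- and second-generation SATA links.
SPEEDS = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_sstatus(hex_text):
    """Split an SStatus value such as '123' into its three 4-bit fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # device detection (3 = device present, PHY up)
        "spd": (value >> 4) & 0xF,  # negotiated link speed
        "ipm": (value >> 8) & 0xF,  # interface power management state
    }


def link_speed(hex_text):
    """Return the negotiated link speed as text."""
    spd = decode_sstatus(hex_text)["spd"]
    return SPEEDS.get(spd, "unknown (SPD=%d)" % spd)
```

With the value from the first post, `link_speed("123")` gives `"3.0 Gbps"`. A link that really was limited by the jumper would show an SPD field of 1 instead.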

----------

## pappy_mcfae

Good, then we have proved the 1.5TB drive to be bad, or its cables. Check to make sure the power and interface cables are all tight. If they are, take the drive back and get a new one, then retry. 

Blessed be!

Pappy

----------

## naelq

slugbait, could you please try the HDD with another distro, say an Ubuntu live CD, and report back? (just to make sure that it isn't a missing option/configuration)

nael

----------

## pappy_mcfae

It is NOT missing anything with a Pappy seed.

Blessed be!

Pappy

----------

## naelq

no offense mate, but i would NOT rush with

 *Quote:*   

> Good, then we have proved the 1.5TB drive to be bad

 

anyway, as you say!!

nael

----------

## pappy_mcfae

After dealing with many people's dmesgs, I'm pretty sure I know what a bad drive looks like. 

And no offense taken, but when I set out to help someone, I am thorough, if nothing else.

Blessed be!

Pappy

----------

## darklegion

Some of Seagate's recent drives have been having serious firmware issues. I know it occurred with larger drives, so it's possible that yours is affected.

----------

## Monkeh

 *pappy_mcfae wrote:*   

> And no offense taken, but when I set out to help someone, I am thorough, if nothing else.

 

Well you've missed the important stuff.

Install smartmontools and post the output of smartctl -ia /dev/sdb.
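Once that output is available, the attribute table is the part worth reading closely. A rough Python sketch for pulling it apart follows; it assumes the usual smartctl table layout (a header row starting with `ID#`, one attribute per row, raw value in the tenth column), and the function name and the watch list are illustrative choices, not part of smartmontools.

```python
# Attributes commonly checked first: the first two relate to the media,
# the last one counts CRC errors seen on the interface (cable/link).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Parse the vendor attribute table from smartctl output into a dict."""
    attributes = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True          # header row: the table starts on the next line
            continue
        if not in_table:
            continue
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            break                    # blank or non-attribute line ends the table
        attributes[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": " ".join(fields[9:]),
        }
    return attributes
```

Feeding it the saved output of `smartctl -ia /dev/sdb` and looking up the names in `WATCH` gives a quick first read; the raw values still need interpreting per vendor.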

----------

## pappy_mcfae

Those tools would only tell me what /var/log/dmesg already did. The only other test I would do is to listen to the drive as it's running. If it ticks, voila, you have the proof. If it's quiet, as in no motor running, you have even better proof.

Blessed be!

Pappy

----------

## Monkeh

 *pappy_mcfae wrote:*   

> Those tools would only tell me what /var/log/dmesg already did.

 

No, they'll tell you quite a lot more..

 *Quote:*   

> The only other test I would do is to listen to the drive as it's running. If it ticks, voila, you have the proof.

 

Of what? A head unload? A brief seek? There is only one useful diagnostic tool: SMART.

----------

## rapsure

Check your hard drive model and firmware and search Seagate's tech support. There was a period where the 1.5TB drives were known to do just what you are experiencing.

----------

## slugbait

I booted from an ubuntu livecd yesterday and tried to run mkfs.ext3 and got the same slew of errors.  It must be a bad drive, right?  I bought another drive today and installed/formatted it with no problems:

beast ~ # df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda3             286G  5.4G  266G   2% /

udev                   10M   88K   10M   1% /dev

shm                   3.9G     0  3.9G   0% /dev/shm

/dev/sdb1             1.4T   99G  1.2T   8% /home

I started the transfer of my home directory on my current desktop (367GB worth of various file sizes) a few hours ago and went upstairs to watch some DVDs.  I just came back down to find everything running smoothly on the surface, but /var/log/messages says otherwise:

 *Quote:*   

> 
> 
> Feb 14 23:05:58 beast [ 6969.924777] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
> 
> Feb 14 23:05:58 beast [ 6969.924786] ata2.00: irq_stat 0x08000000, interface fatal error
> ...

 

 *Quote:*   

> 
> 
> Feb 14 23:29:34 beast [ 8386.201314] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
> 
> Feb 14 23:29:34 beast [ 8386.201323] ata2.00: irq_stat 0x08000000, interface fatal error
> ...

 

So... we're back where we started, just not as often.  I started the dump at about 22:30 and it's still going as of 0140 with only these two burps.  Here is the output from smartctl after copying 108GB and the two errors reported above:

 *Quote:*   

> 
> 
> beast ~ # smartctl -ia /dev/sdb
> 
> smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> ...

 

I did my homework before buying this motherboard to make sure the chipset was supported, but I didn't dig as deeply into the recent history of Seagate's big drives.  I'd read about the dead drive issues, but all of the articles I read indicated that the problem was resolved in the firmware updates.  I still have the original drive, so I plan to install it in my current desktop tomorrow to try to eliminate one more variable.  I have a WindowsXP partition on this system, so I'll even be able to try that...
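The `exception Emask ... SErr ...` lines quoted in this post pack their information into bitmasks. Below is a small Python sketch for unpacking them, assuming libata's error-mask bit assignments and the SATA SError diagnostic bits (16 to 26); the names in the tables are paraphrased descriptions rather than the kernel's own strings, and only the diagnostic half of SError is covered.

```python
import re

# libata error-mask bits (Emask), assumed from the kernel's AC_ERR_* flags.
EMASK_BITS = {
    0x001: "device error",
    0x002: "host state machine violation",
    0x004: "timeout",
    0x008: "media error",
    0x010: "ATA bus error",
    0x020: "host bus error",
    0x040: "system error",
    0x080: "invalid argument",
    0x100: "other",
}

# Diagnostic bits of the SATA SError register.
SERR_BITS = {
    1 << 16: "PHY ready changed",
    1 << 17: "PHY internal error",
    1 << 18: "comm wake",
    1 << 19: "10b-to-8b decode error",
    1 << 20: "disparity error",
    1 << 21: "CRC error",
    1 << 22: "handshake error",
    1 << 23: "link sequence error",
    1 << 24: "transport state transition error",
    1 << 25: "unknown FIS type",
    1 << 26: "device exchanged",
}

EXCEPTION_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+)"
)


def _names(value, table):
    """List the descriptions of every bit in `table` that is set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_exception(line):
    """Decode one libata exception line, or return None if it does not match."""
    match = EXCEPTION_RE.search(line)
    if match is None:
        return None
    return {
        "port": match.group("port"),
        "emask": _names(int(match.group("emask"), 16), EMASK_BITS),
        # SAct is a bitmap of outstanding NCQ tags; 0 means none were in flight.
        "sact": int(match.group("sact"), 16),
        "serr": _names(int(match.group("serr"), 16), SERR_BITS),
    }
```

Run on the 23:05:58 line above, this reports an ATA bus error with a handshake error and no NCQ commands in flight. That matches the "interface fatal error" text, so the errors are on the link between controller and drive rather than a media error.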

----------

## slugbait

Heh... I just noticed something:

 *Quote:*   

> 
> 
> Device Model:     ST31500341AS
> 
> Serial Number:    9VS04Q6Q
> ...

 

from the dmesg:

 *Quote:*   

> 
> 
> [    1.733009] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 
> [    1.734426] ata2.00: ATA-8: ST31500341AS, SD19, max UDMA/133
> ...

 

http://ata.wiki.kernel.org/index.php/Known_issues#Seagate_harddrives_which_time_out_FLUSH_CACHE_when_NCQ_is_being_used

"problem happens most frequently on this model" doesn't sound so good...

----------

## naelq

at least the drive is NOT faulty ;) good luck!

nael

----------

## Monkeh

Well the drive isn't physically faulty. It could be bad firmware, bad PHY, or just a bad cable or connection.

Try using a different SATA cable on it, and make sure it's secure. If it still fails, get it replaced.

----------

## slugbait

This is becoming absurd.  I just tried to use Seagate's iso (Brinks-4D8H-SD1B.ISO) to update the firmware on the second drive from SD19 to SD1B.  It boots and loads the detection program but it doesn't see the drive.  Apparently seagate has decided that "only a small batch of drives" were b0rked, so the firmware update program will only detect and upgrade drives with certain serial numbers.  Meanwhile, libata disables NCQ as soon as it sees the SD19 firmware.
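To double-check what libata decided for a given drive, the firmware revision can be read from the identify line in dmesg and the effective queue depth from sysfs. The Python below is a hedged sketch: it assumes the identify line has the `ATA-8: MODEL, FIRMWARE, max ...` shape seen earlier in this thread, that the queue depth is exposed at `/sys/block/<dev>/device/queue_depth`, and that a depth of 1 means NCQ is not in use. All function names are invented for the example.

```python
import re
from pathlib import Path

# Matches e.g. "ata2.00: ATA-8: ST31500341AS, SD19, max UDMA/133".
IDENT_RE = re.compile(r"ATA-\d+: (?P<model>[^,]+), (?P<firmware>[^,]+), max")


def model_and_firmware(dmesg_line):
    """Return (model, firmware) from a libata identify line, or None."""
    match = IDENT_RE.search(dmesg_line)
    if match is None:
        return None
    return match.group("model").strip(), match.group("firmware").strip()


def queue_depth(device="sdb", sysfs_root="/sys/block"):
    """Read the current queue depth of a disk from sysfs."""
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_active(depth):
    """Assumption: a queue depth above 1 means NCQ is being used."""
    return depth > 1
```

For the drive in this thread, `model_and_firmware()` on the dmesg line should return `("ST31500341AS", "SD19")`, and `ncq_active(queue_depth("sdb"))` shows whether NCQ really is switched off.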

Seagate is now officially on my vendor shit-list.  I've wasted far too much time trying to fix this problem.

----------

## slugbait

"Check the cable" is one of the first things I did, by the way...  I've used 4 different cables on 3 different SATA ports on the motherboard.

----------

## Drone1

slugbait

We have 4 ST31500341NS drives at work (NS, I think; they were listed as NOT having the issue, but we were seeing horrible performance and strange RAID behavior), and only 3 would upgrade to the SD1B firmware on the intended system. I tried updating the 4th drive on a completely different system, and it updated WITHOUT issue. The intended system is now running as we had planned....

If you have another system with SATA, try to update the HD firmware using that system. There is more going on with the Seagate drives than what is on the forums and on the tech sites. It updates on one system but not another? And yes, I'm dealing with different architectures, processors, and chipset brands between those 2 systems to get the HDs' firmware updated.

----------

## Monkeh

Try using legacy IDE mode for the firmware update instead of AHCI mode, if it supports it.

----------

