# SATA drive - is it toast?

## rgh

This may not be a Gentoo-specific question, but Gentoo did save me from losing a lot of data.

I have a SATA drive that is currently formatted as NTFS (my better half forces me to be able to dual boot into Windows XP) and is accessible from Gentoo. However, under Windows, having that drive enabled now (it worked perfectly for 8 months) causes Windows to freeze for several minutes at a time, over and over again. Under Gentoo, I can read it without any problems whatsoever. Under both systems, I cannot write anything to the drive, nor can I delete the partition or re-format it. If I do get fdisk (under Gentoo or Windows) to delete the partition, or if I manage to get it re-formatted, the old NTFS partition, complete with all of the old data, re-appears upon reboot. Nothing I do will allow me to re-format this drive. The only error message I ever see is something about a $LogFile (I can't remember the wording and would have to reboot into Windows to re-create it).

Based upon this limited information, is my drive dead? It's only an 8-month-old drive.

----------

## vaxbrat

Can you get Windows up far enough to pull up the properties for the drive and force it to do a scandisk on next reboot?

I'm a bit amazed that Linux fdisk didn't re-partition.  Are you sure you deleted the partition, created a new one and then confirmed that the changes got written out?  

When a drive goes, it either goes all at once, or it starts thrashing and finding bad blocks before going totally. I can't recall having one go read-only on me.

----------

## gsoe

I don't think you should give up on the drive. Windows can do a lot of funny things, but usually one can get around them. A couple of days ago I was confronted with a system that gave me the blue screen of death in the middle of the Windows boot sequence, reporting some NTFS error. Even when I booted the Windows install CD this was the result, so basically the system was completely unusable and un-reinstallable from a Windows point of view.

I booted the Gentoo minimal CD, started sshd, mounted all partitions (without any problems whatsoever) and copied off all data to another machine. Then I fdisked the partitions away, made a new ext2 and formatted it. Still no booting from the windows CD. Then again with the Gentoo CD I mounted the disk and did

```
dd if=/dev/urandom > somefilename
```

and let that run until the disc was full. Then I deleted the partition and voila: the Windows install CD worked again, and I now have a fully working system installed.

EDIT: If you can't actually fdisk the drive (when doing it from a bootable CD system), try copying over partitions from another drive using dd, or even overwriting the raw disc with zeroes (careful, investigate first so you don't trash something). The last resort is the manufacturer's low-level format utility, which can usually be downloaded.

----------

## rgh

vaxbrat: I'm sure that I deleted the partition. fdisk did its thing, I exited, the drive was no longer visible, re-entered fdisk and created an ext3 partition, exited, tried to format the drive (mke2fs stops halfway through, with no errors, and freezes), exited and then rebooted.....only to find that the old NTFS partition is still there. I just tried again after posting my original message, with the same result.

gsoe: That's why I am glad I had Gentoo installed. It has no issues reading from this drive and I was able to get a lot of my data off of it. However, it is NTFS and I cannot write to it under Gentoo....actually, as I said, I can't do anything but read from it. I have yet to try fdisk from a bootable CD system....let me get back to you.

----------

## rgh

OK....just tried from my Gentoo livecd. fdisk shows that it deleted the partition, I write the changes and exit, and all appears OK. If I go back into fdisk, the NTFS partition is still there. Weird. It's like I will be looking at the same 110 GB of data forever.

----------

## gsoe

That is really weird. Seems to be a problem for the hardcore specialists. Anyway, a few things you could try: If you do (say the partition in question is /dev/sdxx, and it shouldn't be mounted)

```
dd if=/dev/sdxx of=/dev/sdxx
```

it will "refresh" all data on the partition. It just reads every block on the partition and writes it back again. It is said to help if the problem is some kind of degenerating magnetic structure on the disc, and as it writes "bitwise" on the raw media, it doesn't matter that Linux can't write NTFS. Another thing: did you try making multiple new partitions in place of the old one, so as to avoid getting the exact same beginning and end? Finally, what happens if you don't delete the partition, but just toggle the partition type to something else?
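A minimal sketch of what such an in-place refresh does, run against a throwaway image file rather than a real disk. The helper name `refresh_in_place`, the 1 MiB image and the 64 KiB block size are assumptions made for illustration. `conv=notrunc` is needed here only because a regular file, unlike a block device, would otherwise be truncated when dd opens it for writing.

```shell
#!/bin/sh
# refresh_in_place FILE (hypothetical helper): read every block of FILE and
# write it back to the same offset, like "dd if=/dev/sdxx of=/dev/sdxx"
# does on a partition.
refresh_in_place() {
    dd if="$1" of="$1" bs=64k conv=notrunc 2>/dev/null
}

# Demo on a throwaway 1 MiB image so nothing here touches a real disk.
img=$(mktemp)
dd if=/dev/urandom of="$img" bs=64k count=16 2>/dev/null
before=$(cksum < "$img")
refresh_in_place "$img"
after=$(cksum < "$img")
if [ "$before" = "$after" ]; then
    echo "contents unchanged after refresh"
fi
rm -f "$img"
```

On a drive with unreadable sectors the read side of the copy can still fail with an I/O error, so this only helps where the data can be read at all.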

Good luck! Looking forward to learning what happens.

----------

## rgh

First, thanks for your time.

I tried changing the partition type....it gives me an error.

I tried deleting the partition and then creating two new, smaller partitions. It did so, with an error, but fdisk still showed the new partitions. I then formatted, which completed with an error, and during the format it stalled, like it always does, at inode 179. Afterwards I could still see the 'old' directory information. Then it would not let me mount the drive (my mistake, more than likely, as I never changed the fstab entry prior to trying) and it would not let me fdisk it again.

I did a soft reboot, during which the drive was not recognized. Since this mess started, this has happened on nearly every soft reboot.

I did a hard reboot and the drive was recognized but, lo and behold, it still had the old partition with the old data intact.

I have attached a copy of the commands issued and their results below.

I will try the dd trick later as I only have a few minutes on my lunch. Thanks again.

localhost ~ # fdisk /dev/sda

The number of cylinders for this disk is set to 30401.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1       30401   244196001    7  HPFS/NTFS

Command (m for help): t

Selected partition 1

Hex code (type L to list codes): 83

Changed system type of partition 1 to 83 (Linux)

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.

The kernel still uses the old table.

The new table will be used at the next reboot.

Syncing disks.

localhost ~ # fdisk /dev/sda

The number of cylinders for this disk is set to 30401.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1       30401   244196001    7  HPFS/NTFS

Command (m for help): d

Selected partition 1

Command (m for help): n

Command action

   e   extended

   p   primary partition (1-4)

p

Partition number (1-4): 1

First cylinder (1-30401, default 1): 

Using default value 1

Last cylinder or +size or +sizeM or +sizeK (1-30401, default 30401): +100G

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1       12159    97667136   83  Linux

Command (m for help): n

Command action

   e   extended

   p   primary partition (1-4)

p

Partition number (1-4): 2

First cylinder (12160-30401, default 12160): 

Using default value 12160

Last cylinder or +size or +sizeM or +sizeK (12160-30401, default 30401): 

Using default value 30401

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.

The kernel still uses the old table.

The new table will be used at the next reboot.

Syncing disks.

localhost ~ # fdisk /dev/sda

The number of cylinders for this disk is set to 30401.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1       12159    97667136   83  Linux

/dev/sda2           12160       30401   146528865   83  Linux

Command (m for help): q

localhost ~ # mke2fs -j -O dir_index /dev/sda1

mke2fs 1.40-WIP (7-Apr-2007)

/dev/sda1 is mounted; will not make a filesystem here!

localhost ~ # umount /dev/sda1

localhost ~ # ls /mnt/media

1b198182f6196ea0778f2977   TSV

Backup.bkf                 Temp

Backup2.bkf                X-Plane Installer.prf

Gallery Data               cbeb31df1ec6f1e47c7bf640

MSOCache                   dfb2bc2229b8e2455350de205f07110c

Media                      msdownld.tmp

RECYCLER                   pagefile.sys

System Volume Information

localhost ~ # umount /dev/sda1

localhost ~ # mke2fs -j -O dir_index /dev/sda1

mke2fs 1.40-WIP (7-Apr-2007)

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

30539776 inodes, 61049000 blocks

3052450 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=0

1864 block groups

32768 blocks per group, 32768 fragments per group

16384 inodes per group

Superblock backups stored on blocks: 

        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 

        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done                            

ext2fs_mkdir: Attempt to read block from filesystem resulted in short read while creating root dir

localhost ~ # mount /dev/sda1

mount: wrong fs type, bad option, bad superblock on /dev/sda1,

       missing codepage or other error

       In some cases useful info is found in syslog - try

       dmesg | tail  or so

localhost ~ # ls /mnt/media

localhost ~ # umount /dev/sda1

umount: /dev/sda1: not mounted

localhost ~ # fdisk /dev/sda 

Unable to read /dev/sda

localhost ~ # 

SOFT REBOOT

ata2: SATA max UDMA/100 cmd 0xf88160c0 ctl 0xf88160ca bmdma 0xf8816008 irq 17

scsi0 : sata_sil

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

ata1.00: qc timeout (cmd 0xec)

ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

ata1: COMRESET failed (device not ready)

ata1: hardreset failed, retrying in 5 secs

ata1: COMRESET failed (device not ready)

ata1: hardreset failed, retrying in 5 secs

ata1: COMRESET failed (device not ready)

ata1: reset failed, giving up

scsi1 : sata_sil

HARD REBOOT

localhost ~ # fdisk /dev/sda

The number of cylinders for this disk is set to 30401.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1       30401   244196001    7  HPFS/NTFS

Command (m for help): q

localhost ~ #

----------

## rgh

OK, did the dd thing:

localhost ~ # dd if=/dev/sda1 of=/dev/sda1

dd: reading `/dev/sda1': Input/output error

183008+0 records in

183008+0 records out

93700096 bytes (94 MB) copied, 168.495 s, 556 kB/s

localhost ~ # fdisk /dev/sda

Unable to read /dev/sda

localhost ~ # 

Had to hard reboot in order to get the drive back. There has got to be some kind of weirdo error happening on this drive but there must be a way around it.
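The dd figures above give a rough idea of where on the disk the read failed. A small sketch of the arithmetic, assuming dd's default 512-byte block size (no `bs=` was given) and assuming `/dev/sda1` runs to the end of the last cylinder, so that its start can be derived from the fdisk listing. The variable names are only for illustration.

```shell
#!/bin/sh
records=183008          # "183008+0 records in" from the dd output
sector_size=512         # assumption: dd default block size, one sector

# Bytes copied before the error; dd itself reported 93700096.
bytes=$((records * sector_size))

# Sectors on the disk by fdisk's geometry, and sectors in sda1
# (fdisk "Blocks" are 1 KiB units, i.e. two sectors each).
disk_sectors=$((30401 * 16065))
part_sectors=$((244196001 * 2))
part_start=$((disk_sectors - part_sectors))

# Approximate first unreadable sector, counted from the start of the disk.
bad_lba=$((part_start + records))

echo "bytes copied:    $bytes"
echo "partition start: $part_start"
echo "first bad LBA:   $bad_lba"
```

This works out to a partition start at sector 63 and a failure near sector 183071 of the disk, that is, within the first 100 MB. Kernel read-ahead means the exact failing sector may be a little further on.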

----------

## gsoe

Really weird. I'm at a loss now. Losing the drive at soft reboots and dd throwing an error? Something seems to be really wrong. Are you sure that your kernel config is OK? In particular, do you have

```
CONFIG_MSDOS_PARTITION=y
```

set? There are some problems in what you did, though. This part:

> localhost ~ # fdisk /dev/sda
> 
> The number of cylinders for this disk is set to 30401.
> 
> There is nothing wrong with that, but this is larger than 1024,
> ...

shows that you're deleting and creating partitions with the partition in question mounted. It also shows that you have an fstab line specifying to mount the partition as ntfs. Some of this might have an impact on the results you get. On the other hand, you've tried with a Gentoo minimal CD as well, but in any case, whatever you try now, do it from a CD to rule out the things mentioned above.
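One way to rule out the mounted-partition problem before touching the partition table is to look in `/proc/mounts` first. A rough sketch: the helper name `is_mounted` is made up for this example, and its optional second argument (an alternative mounts file) exists only so it can be tried without a real device.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]  (hypothetical helper)
# Succeeds if DEVICE appears as a mounted source in MOUNTS_FILE
# (default: /proc/mounts, which the Linux kernel keeps up to date).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

if is_mounted /dev/sda1; then
    echo "/dev/sda1 is mounted: umount it before running fdisk or mke2fs"
else
    echo "/dev/sda1 is not mounted"
fi
```

If the partition shows up as mounted, fdisk's "error 16: Device or resource busy" on re-reading the table is the expected result, and the kernel keeps using the old table until the next reboot.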

Anyway, the best suggestion now would be to download the manufacturer's software for checking and low-level formatting the drive. I've only tried Hitachi's software myself, but I reckon that the other manufacturers have similar utilities. It can do a check of the SMART parameters of the drive and it can zero out all data on it. I think one can do the same things from Linux with different utilities, but it's a lot easier to use a single utility.

----------

## gsoe

Addition to the above post: I just reread the whole thread. As it seems you actually succeeded in creating a filesystem somehow, I would try that again and then fill it up. First thing: comment out any references to the drive in your fstab, so your system won't try to mount any partitions from it when you eventually boot it again. Then boot the minimal CD and fdisk the existing partition away. Create a small, say 80M, Linux partition (to stay under the 94 MB mark where dd threw its error) and write it. Now do

```
mke2fs /dev/sda1
```

just to keep it simple. What you wrote earlier indicates that this should succeed. Then

```
mount -t ext2 /dev/sda1 /mnt/gentoo

cd /mnt/gentoo

dd if=/dev/urandom > somefilename
```

Now does this complete (the dd will take a while)? And if it does, what do you get when you reboot the CD?

If everything succeeds and you get a persistent partition, then what happens if you repeat and add an 80M /dev/sda2, filling it up as well?

Sorry if you think you've tried this before, it just isn't so clear to me what possibilities you've been through....

----------

## rgh

Phew! Let me tell you...that was ugly. Sorry for the long delay, I was away for a bit.

Tried what you recommended....booted to the Gentoo disk and fdisk'ed /dev/sda. Created a small partition of 80M and all seemed to go well. However, when I tried to format it using mke2fs, I received dozens of the following errors (or reasonable facsimiles):

end_request: I/O error, dev sda, sector 2255

Buffer I/O error on device sda1, logical block 1096

ata1: command 0xca timeout, stat 0xd8 host_stat 0x61

ata1: translated ATA stat/err

which repeated over and over with increasing sector and logical block numbers and every once in a while, a

lost page write due to I/O error on sda1

would be thrown in for good measure. When I did a soft reboot to the CD, it stalled during boot at 'searching for sata_sil'. A hard reboot allowed it to boot to the CD.
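The sector and logical block numbers in those messages are at least consistent with each other. A sketch of the relationship, assuming the new partition starts at sector 63 (the usual start for a first partition with this 63-sectors-per-track geometry) and that the buffer layer counts 1 KiB blocks, i.e. two 512-byte sectors each. `block_to_sector` is a name invented here.

```shell
#!/bin/sh
# block_to_sector BLOCK [PART_START] [SECTORS_PER_BLOCK]  (hypothetical helper)
# Convert a "logical block" on sda1 to an absolute sector on sda.
# Defaults are assumptions: partition start 63, two sectors per block.
block_to_sector() {
    echo $(( ${2:-63} + $1 * ${3:-2} ))
}

# "Buffer I/O error on device sda1, logical block 1096"
block_to_sector 1096    # prints 2255, matching "dev sda, sector 2255"
```

So the writes were already failing within the first couple of megabytes of the partition.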

The end result is that I still have my old partition with all of its data still intact.

I think I am going to try and find somebody with SATA capability and see if this drive works in their computer. If it does, I guess maybe it's my motherboard that is toast. If it doesn't, I will consider the drive toast, fetch a new one, extract the data from the old one onto the new, and take my frustrations out on the old one with a hammer. Unless you can think of anything else....

----------

## gsoe

Hmm, yes, it smells a bit like a hardware fault. Is your other drive SATA, and does it work all right on the same controller? If that's the case, I would suspect the drive and not the motherboard, but I'm running out of things to try. Maybe it's worth trying the manufacturer's software after all. Keep in mind that the longer you fight the drive, the more you are going to enjoy the final session with the hammer...

----------

## rgh

No, my other two drives are IDE drives and this was my first venture into SATA. I plan on upgrading everything in the fall and am still planning on going SATA, so I hope my luck with the second-generation devices goes better.

And yes, you're right....the longer this goes on, the more I think I'm going to enjoy this drive's last day...... I have a 12 pound hammer at the ready.

In the meantime, for what it's worth, while waiting to try it in someone else's machine, I am going to repeat some of the stuff I have already tried and some of the stuff you suggested, as well as trying to locate Seagate's low-level formatting software.

Thanks again for your help.

----------

## quantumsummers

Hi there,

Just a quick little ditty.

```

emerge smartmontools

```

run:

```

$ smartctl -d ata -a /dev/sd?  #where ? is your drive letter.

```

__IF__ your drive & BIOS have SMART enabled, which they should, this will tell you if the drive is toast.

Here is a nice guide from the folks at Gentoo-Wiki.

http://gentoo-wiki.com/HOWTO_Monitor_your_hard_disk%28s%29_with_smartmontools

If it is dead, bring the sledge! Just remember to recycle the drive after you've smashed it. As a side note, the platters inside are very shiny and, if you have a few of them, make good wind chimes.

Cheers,

QuantumSummers

----------

## desultory

 *rgh wrote:*   

> In the meantime, for what it's worth, while waiting to try it in someone else's machine, I am going to repeat some of the stuff I have already tried and some of the stuff you suggested, as well as trying to locate Seagate's low-level formatting software.

 It is available from their support site.

----------

## jburns

Does the drive have a host protected area?  Is the BIOS protecting the MBR?  See http://en.wikipedia.org/wiki/Host_Protected_Area, http://lkml.org/lkml/2006/6/10/53, and http://lkml.org/lkml/2006/6/9/56.

----------

## rgh

Thanks. Found it.

So when using SeaTools under Linux or Windows, the diagnostic tests would not complete due to errors on the drive. I then used their DOS boot disk, which allowed the tests to finish, but with about 70 sector errors. I'm going to run it again, let it try to repair them and see what happens. But, sadly, I think I have answered my own question....my drive is toast. :(

----------

## rgh

Sorry. Missed a few posts.

jburns....as far as I know, there are no protected areas.

quantum.....tried smartctl. Although it said this drive had good health, it also revealed the following errors in the error log:

smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===

SMART Error Log Version: 1

ATA Error Count: 30 (device log contains only the most recent five errors)

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 30 occurred at disk power-on lifetime: 10934 hours (455 days + 14 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 2f 26 91 4e  Error: UNC at LBA = 0x0e91262f = 244393519

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 2f 26 91 4e 00      00:25:23.030  READ DMA

  c8 00 08 27 26 91 4e 00      00:25:23.026  READ DMA

  c8 00 08 1f 26 91 4e 00      00:25:23.026  READ DMA

  c8 00 08 ef 0b 91 4e 00      00:25:23.019  READ DMA

  c8 00 08 97 25 91 4e 00      00:25:23.017  READ DMA

Error 29 occurred at disk power-on lifetime: 8281 hours (345 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 07 96 ec f3 46  Error: UNC 7 sectors at LBA = 0x06f3ec96 = 116649110

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 8f ec f3 46 00      08:22:04.840  READ DMA

  ca 00 08 37 00 5e 40 00      08:22:04.840  WRITE DMA

  c8 00 08 8f ec f3 46 00      08:22:04.816  READ DMA

  ca 00 08 3f 00 5e 40 00      08:22:04.816  WRITE DMA

  c8 00 01 00 00 00 40 00      08:22:01.150  READ DMA

Error 28 occurred at disk power-on lifetime: 8281 hours (345 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 07 96 ec f3 46  Error: UNC 7 sectors at LBA = 0x06f3ec96 = 116649110

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 8f ec f3 46 00      08:22:04.840  READ DMA

  ca 00 08 3f 00 5e 40 00      08:22:04.840  WRITE DMA

  c8 00 01 00 00 00 40 00      08:22:04.816  READ DMA

  ca 00 18 97 73 5f 40 00      08:22:04.816  WRITE DMA

  c8 00 08 8f ec f3 46 00      08:22:01.150  READ DMA

Error 27 occurred at disk power-on lifetime: 8281 hours (345 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 07 96 ec f3 46  Error: UNC 7 sectors at LBA = 0x06f3ec96 = 116649110

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 8f ec f3 46 00      08:21:53.668  READ DMA

  c8 00 01 00 00 00 40 00      08:21:53.655  READ DMA

  c8 00 08 87 ec f3 46 00      08:21:53.655  READ DMA

  ca 00 08 2f 00 5e 40 00      08:21:53.655  WRITE DMA

  c8 00 01 00 00 00 40 00      08:22:01.150  READ DMA

Error 26 occurred at disk power-on lifetime: 8281 hours (345 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 0f 96 ec f3 46  Error: UNC 15 sectors at LBA = 0x06f3ec96 = 116649110

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 87 ec f3 46 00      08:21:53.668  READ DMA

  ca 00 08 47 00 5e 40 00      08:21:53.655  WRITE DMA

  ca 00 08 57 33 8c 45 00      08:21:53.655  WRITE DMA

  ca 00 20 77 73 5f 40 00      08:21:53.655  WRITE DMA

  c8 00 08 8f ec f3 46 00      08:21:53.655  READ DMA
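The "LBA =" values in the log can be rebuilt from the register dump on the same line, which is a handy cross-check when reading these entries. A sketch assuming 28-bit LBA addressing, where the sector address is spread over SN, CL, CH and the low four bits of DH. `lba28` is a hypothetical helper name.

```shell
#!/bin/sh
# lba28 SN CL CH DH  (hypothetical helper)
# Rebuild the 28-bit LBA from the hex register values shown in the
# smartctl error log. Only the low 4 bits of DH carry address bits;
# the upper bits are mode/device flags.
lba28() {
    echo $(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

lba28 2f 26 91 4e    # Error 30: prints 244393519
lba28 96 ec f3 46    # Errors 26-29: prints 116649110
```

Both results match the decimal LBAs smartctl printed, so the uncorrectable (UNC) read errors sit at two widely separated places on the disk.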

----------

