# HW problem, I am afraid [Solved]

## apiaio

On the last boot, the Gentoo installed on my SSD did not start the X server. Console message:

> This is (none).unknown_domain(Linux x86_64 4.1.15-gentoo-r1)

dmesg:

> ...[sda] tag#0 FAILED Result: hostbyte=DID_OK...
> 
> [sda] tag#0 Sense Key : Medium Error [current] [descriptor]
> 
> [sda] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed...etc

When I try to write something to the SSD:

> read-only file system

 

Does this mean my SSD's life is over?

How can I check this SSD?

*Last edited by apiaio on Sat Jul 09, 2016 3:04 pm; edited 1 time in total*

----------

## frostschutz

smartctl -a /dev/sda?

----------

## apiaio

 *frostschutz wrote:*   

> smartctl -a /dev/sda?

 

```
sabayonx86-64 miro # smartctl -a /dev/sda
bash: smartctl: command not found
```

Right now I am booted into Sabayon, which is installed on sdb.

----------

## frostschutz

it's part of smartmontools   :Rolling Eyes: 

----------

## The Doctor

So install sys-apps/smartmontools  :Wink: 

Reading its documentation might also help, since this is one of those tools you really want to have on all your systems.

----------

## apiaio

Well.

I have switched to the other Gentoo install on sdc and installed sys-apps/smartmontools.

BTW:

> localhost miro # emerge smartmontools -vp
> 
>  * Last emerge --sync was 2y 168d 2h 26m 38s ago.

 

```
localhost miro # smartctl -a /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.10.25-gentoo] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Indilinx Barefoot based SSDs
Device Model:     Corsair CSSD-V32GB2
Serial Number:    1106650500FF10200281
Firmware Version: 2.2
User Capacity:    32,017,047,552 bytes [32.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
Local Time is:    Fri Jul  8 18:34:02 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   ---   ---   ---    Old_age   Offline      -       6
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       5808
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       8158
184 Initial_Bad_Block_Count 0x0000   ---   ---   ---    Old_age   Offline      -       28
195 Program_Failure_Blk_Ct  0x0000   ---   ---   ---    Old_age   Offline      -       1
196 Erase_Failure_Blk_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       0
197 Read_Failure_Blk_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       3
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       2026892145
199 Write_Sectors_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       1410713922
200 Read_Commands_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       27692167
201 Write_Commands_Tot_Ct   0x0000   ---   ---   ---    Old_age   Offline      -       16148660
202 Error_Bits_Flash_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       6937082
203 Corr_Read_Errors_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       6287996
204 Bad_Block_Full_Flag     0x0000   ---   ---   ---    Old_age   Offline      -       0
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       5000
206 Min_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       691
207 Max_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       3676
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       1666
209 Remaining_Lifetime_Perc 0x0000   ---   ---   ---    Old_age   Offline      -       67
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0
213 Indilinx_Internal       0x0000   ---   ---   ---    Old_age   Offline      -       0

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
```
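The attribute table above can also be read programmatically. Below is a minimal sketch with hypothetical helper names. It assumes the 10-column layout that smartctl 6.1 prints for this Indilinx drive, with the raw value in the last column. It extracts the raw values and compares the drive's average erase count with its rated program/erase cycles:

```python
def parse_smart_attributes(text: str) -> dict:
    """Map ATTRIBUTE_NAME -> integer RAW_VALUE for rows of `smartctl -a` output.

    Assumes the 10-column layout shown above (ID# ... RAW_VALUE); rows whose
    raw value is not a plain integer are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row starts with "ID#".
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def wear_summary(attrs: dict) -> dict:
    """Share of rated P/E cycles used, next to the drive's own lifetime estimate."""
    used = 100.0 * attrs["Average_Erase_Count"] / attrs["Max_PE_Count_Spec"]
    return {
        "pe_cycles_used_pct": used,
        "remaining_estimate_pct": 100.0 - used,
        "remaining_reported_pct": attrs["Remaining_Lifetime_Perc"],
        # Non-zero failed-block counters are the ones worth watching.
        "failure_counters": {
            name: attrs[name]
            for name in ("Program_Failure_Blk_Ct", "Erase_Failure_Blk_Ct", "Read_Failure_Blk_Ct")
            if attrs.get(name, 0) > 0
        },
    }
```

With the values above, 1666 / 5000 gives about 33.3 % of the rated cycles used, leaving about 66.7 %. That agrees with the drive's own `Remaining_Lifetime_Perc` of 67, which suggests the flash is not near its rated wear limit. The non-zero `Program_Failure_Blk_Ct` (1) and `Read_Failure_Blk_Ct` (3) are the more relevant counters. `SATA_Error_Ct_CRC` at 0 is also consistent with a healthy SATA link, since cable faults typically show up as interface CRC errors.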

----------

## NeddySeagoon

apiaio,

```
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported. 
```

There is nothing useful there. 

Run the long self-test and look at the results. That's the same as reading the entire contents of the drive to /dev/null, except it's all internal to the drive.
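The host-side version of "reading the entire contents to /dev/null" can be sketched as follows. `dd if=/dev/sda of=/dev/null` stops at the first read error by default. This hypothetical helper instead keeps going and records which sector offsets fail. It assumes Linux, root access and a device path such as /dev/sda. It only ever opens the device read-only.

```python
import os


def unreadable_offsets(path: str, chunk: int = 1 << 20, sector: int = 512) -> list:
    """Read `path` from start to end (read-only) and return the byte offsets of
    sectors that fail with an I/O error. Nothing is ever written."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also gives the size of a block device
        offset = 0
        while offset < size:
            length = min(chunk, size - offset)
            try:
                data = os.pread(fd, length, offset)
                if not data:  # unexpected end of file
                    break
                offset += len(data)
            except OSError:
                # The chunk failed: retry it sector by sector to pinpoint the bad ones.
                end = offset + length
                while offset < end:
                    try:
                        os.pread(fd, min(sector, end - offset), offset)
                    except OSError:
                        bad.append(offset)
                    offset = min(offset + sector, end)
    finally:
        os.close(fd)
    return bad
```

Unlike the drive's internal self-test, these reads travel over the SATA cable and through the kernel. An error here therefore does not by itself distinguish a bad cable from bad flash. Recently read data may also be served from the page cache rather than from the drive.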


> [sda] tag#0 Sense Key : Medium Error [current] [descriptor]
> 
> [sda] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed...etc

suggests that a read failed and that the failure is internal to the drive.

However, cheap, poor-quality SATA data cables have been known to cause similar symptoms, and a SATA cable costs much less than an SSD.

----------

## Logicien

Can you read and write to the defective Gentoo SSD from Sabayon? That is, can you mount the filesystems on its partitions and access the data?

----------

## apiaio

 *Logicien wrote:*   

> Can you read and write to the defective Gentoo SSD from Sabayon? That is, can you mount the filesystems on its partitions and access the data?

 Yes

----------

## NeddySeagoon

apiaio,

That you can read/write elsewhere on the filesystem suggests that the SATA data cable is OK.

What did the smartctl long test report?

dmesg should have given you a block number for the error.

Can you read that block with dd?
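To make the dd step concrete: the kernel's I/O error messages usually include a `sector` number. This is in 512-byte units, counted from the start of the whole disk. The dmesg lines quoted in the first post were cut off before that point, so the LBA used below is a placeholder you would substitute. The helpers are hypothetical. They read one sector the way `dd if=/dev/sda of=/dev/null bs=512 skip=LBA count=1` would, and print that dd command for running by hand.

```python
import os

SECTOR_SIZE = 512  # logical sector size reported by smartctl and dmesg for this drive


def read_sector(device: str, lba: int, sector_size: int = SECTOR_SIZE) -> bytes:
    """Read one sector at logical block address `lba`, like
    `dd if=DEVICE of=/dev/null bs=512 skip=LBA count=1`.
    Raises OSError (EIO) if the drive cannot read it."""
    fd = os.open(device, os.O_RDONLY)  # read-only: nothing is written to the drive
    try:
        return os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)


def dd_command(device: str, lba: int, sector_size: int = SECTOR_SIZE) -> str:
    """The equivalent dd invocation."""
    return f"dd if={device} of=/dev/null bs={sector_size} skip={lba} count=1"
```

An `OSError` with errno EIO from `read_sector("/dev/sda", lba)` means the drive still cannot read that sector. GNU dd also accepts `iflag=direct` to bypass the page cache, so a cached copy cannot hide the error.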

----------

## apiaio

 *NeddySeagoon wrote:*   

> apiaio,
> 
> That you can read/write elsewhere on the filesystem suggests that the SATA data cable is OK.
> 
> What did the smartctl long test report?
> ...

I am not sure I am using the smartctl command correctly:

```
localhost gen # smartctl —test=long /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.10.25-gentoo] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl takes ONE device name as the final command-line argument.
You have provided 2 device names:
—test=long
/dev/sda

Use smartctl -h to get a usage summary
```

```
localhost gen # dmesg|grep sda

[    0.777125] sd 0:0:1:0: [sda] 62533296 512-byte logical blocks: (32.0 GB/29.8 GiB)

[    0.777302] sd 0:0:1:0: [sda] Write Protect is off

[    0.777307] sd 0:0:1:0: [sda] Mode Sense: 00 3a 00 00

[    0.777333] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.777651]  sda: sda1

[    0.777888] sd 0:0:1:0: [sda] Attached SCSI disk

[  124.823372] EXT3-fs (sda): error: can't find ext3 filesystem on dev sda.

[  124.823505] EXT4-fs (sda): VFS: Can't find ext4 filesystem

[  124.823638] EXT4-fs (sda): VFS: Can't find ext4 filesystem

[  124.823762] FAT-fs (sda): bogus number of FAT structure

[  124.823765] FAT-fs (sda): Can't find a valid FAT filesystem

[  124.823879] FAT-fs (sda): bogus number of FAT structure

[  124.823880] FAT-fs (sda): Can't find a valid FAT filesystem

[  124.824104] EXT3-fs (sda): error: can't find ext3 filesystem on dev sda.

[  124.824218] EXT4-fs (sda): VFS: Can't find ext4 filesystem

[  124.824356] EXT4-fs (sda): VFS: Can't find ext4 filesystem

[  124.824483] FAT-fs (sda): bogus number of FAT structure

[  124.824485] FAT-fs (sda): Can't find a valid FAT filesystem

[  124.824606] FAT-fs (sda): bogus number of FAT structure

[  124.824608] FAT-fs (sda): Can't find a valid FAT filesystem

[  124.835542] NTFS-fs warning (device sda): is_boot_sector_ntfs(): Invalid boot sector checksum.

[  124.835545] NTFS-fs error (device sda): read_ntfs_boot_sector(): Primary boot sector is invalid.

[  124.835547] NTFS-fs error (device sda): read_ntfs_boot_sector(): Mount option errors=recover not used. Aborting without trying to recover.

[  124.835550] NTFS-fs error (device sda): ntfs_fill_super(): Not an NTFS volume.

[  124.836138] XFS (sda): bad magic number

[  124.836156] XFS (sda): Internal error xfs_sb_read_verify at line 730 of file fs/xfs/xfs_mount.c.  Caller 0xffffffff812e2aa5

[  124.836207] XFS (sda): Corruption detected. Unmount and run xfs_repair

[  124.836244] XFS (sda): SB validate failed with error 22.

[  256.317421] EXT4-fs (sda1): warning: mounting fs with errors, running e2fsck is recommended

[  256.317760] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)

[  435.604341] EXT4-fs (sda1): re-mounted. Opts: commit=0

[  556.677658] EXT4-fs (sda1): error count: 1

[  556.677664] EXT4-fs (sda1): initial error at 1467917189: ext4_find_entry:1457: inode 1049072

[  556.677668] EXT4-fs (sda1): last error at 1467917189: ext4_find_entry:1457: inode 1049072

```

How do I read a block with dd?  :Embarassed: 

----------

## NeddySeagoon

apiaio,

The syntax is

```
-t TEST, --test=TEST
```

Note that one hyphen introduces a short option and two hyphens introduce a long option.
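In the failed attempt, the option was typed with a U+2014 EM DASH character in front of `test=long` instead of two ASCII hyphens (`--test=long`). This possibly came from autocorrect or copy-paste. Because of it, smartctl treated the argument as a second device name. The working forms are `smartctl -t long /dev/sda` or `smartctl --test=long /dev/sda`. The following hypothetical helper (not part of smartmontools) flags look-alike dashes in an argument list:

```python
import unicodedata


def lookalike_dashes(argv: list) -> list:
    """Return (position, character name) for each argument that starts with a
    dash-like character other than the ASCII hyphen-minus '-' (U+002D)."""
    found = []
    for i, arg in enumerate(argv):
        first = arg[:1]
        # Unicode category "Pd" = dash punctuation (hyphen-minus, en dash, em dash, ...).
        if first and first != "-" and unicodedata.category(first) == "Pd":
            found.append((i, unicodedata.name(first)))
    return found
```

For the command above it would report position 1 as `EM DASH`, while `["smartctl", "-t", "long", "/dev/sda"]` produces no findings.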

Your message shows that smartctl did not understand the command.

```
[  256.317421] EXT4-fs (sda1): warning: mounting fs with errors, running e2fsck is recommended
[  256.317760] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) 
```

Don't run e2fsck. That can make a bad situation worse. Do not mount the filesystem read-write either.

It's a really bad idea to write to a damaged filesystem.
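Before imaging, it is worth confirming that the damaged filesystem is not mounted read-write. The hypothetical helper below reads text in the Linux `/proc/mounts` format, one mount per line: device, mount point, type, options, dump, pass. It assumes the filesystem is listed under its device name, such as /dev/sda1.

```python
from typing import Optional


def mount_mode(mounts_text: str, device: str) -> Optional[str]:
    """Return "ro" or "rw" for `device` from /proc/mounts-style text, or None
    if it is not mounted. Each line is: device mountpoint fstype options dump pass."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            options = fields[3].split(",")
            return "ro" if "ro" in options else "rw"
    return None
```

A typical use is `mount_mode(open("/proc/mounts").read(), "/dev/sda1")`. If it returns `"rw"`, `mount -o remount,ro /mnt/point` switches it to read-only. The mount point here is illustrative.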

Make an image of the drive using ddrescue before you attempt any data recovery.
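ddrescue records its progress in a map file. Version 1.16 calls it a logfile (the `logfile` argument in the command later in this thread); newer versions call it a mapfile. After comment lines starting with `#`, the file has one status line and then one `pos size status` line per block, with hex positions and sizes. The sketch below is a hypothetical helper that sums those sizes. It assumes `*`, `/` and `-` are the "failed" statuses. The exact status characters differ slightly between ddrescue versions, so check the manual for yours.

```python
# Status characters for areas ddrescue has not been able to read
# (non-trimmed, non-split/non-scraped, bad sector); '+' is finished, '?' non-tried.
FAILED_STATUSES = {"*", "/", "-"}


def ddrescue_totals(text: str) -> dict:
    """Sum block sizes per status character from a ddrescue logfile/mapfile."""
    totals = {}
    status_line_seen = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment lines only describe the columns
        if not status_line_seen:
            # First data line: current_pos current_status [current_pass]
            status_line_seen = True
            continue
        _pos, size, status = line.split()[:3]
        totals[status] = totals.get(status, 0) + int(size, 0)  # sizes are hex, e.g. 0x00001000
    return totals


def error_size(totals: dict) -> int:
    """Bytes not yet read successfully; should correspond to ddrescue's "errsize"."""
    return sum(n for status, n in totals.items() if status in FAILED_STATUSES)
```

For the run shown later in the thread, the failed areas in `logfile` would be expected to add up to the `errsize: 4096 B` that ddrescue printed.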

```
[  556.677664] EXT4-fs (sda1): initial error at 1467917189: ext4_find_entry:1457: inode 1049072 
```

Inode 1049072 identifies the damaged object in the filesystem. It can be a directory or a file.

```
ls -Ri /mnt/point | grep 1049072
```

will recursively list all of the directories starting at /mnt/point, then print the name of the object that uses inode 1049072.
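`grep 1049072` also matches longer numbers that contain those digits (for example 10490721), so the pipeline can print extra names. `grep -w` narrows it. An exact comparison avoids the problem, as in the hypothetical helper below. On an unmounted filesystem, `debugfs -R "ncheck 1049072" /dev/sda1` from e2fsprogs maps an inode to a path without mounting it. debugfs opens the device read-only unless `-w` is given.

```python
import os
from typing import Iterator


def paths_with_inode(root: str, inode: int) -> Iterator[str]:
    """Yield paths under `root` whose inode number is exactly `inode`.

    Inode numbers are only unique within one filesystem, so matches on other
    filesystems mounted below `root` are ignored (similar in effect to
    `find ROOT -xdev -inum N`).
    """
    root_dev = os.lstat(root).st_dev
    # os.walk silently skips directories it cannot read, such as a corrupted one.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue
            if st.st_dev == root_dev and st.st_ino == inode:
                yield path
```

Usage: `list(paths_with_inode("/mnt/point", 1049072))`. A file with several hard links yields one path per link.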

It's possible that inode 1049072 is used for filesystem metadata, but that's a long way down the drive for anything but a backup superblock.
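How far down the drive an inode lives can be estimated with integer arithmetic, because ext4 keeps inodes in per-block-group tables. The totals below come from the e2fsck summary later in the thread. The block size (4 KiB) and blocks per group (32768, the ext4 default for that block size) are assumptions. `dumpe2fs -h /dev/sda1` prints the real "Inodes per group" and "Blocks per group".

```python
import math

# Totals from the e2fsck summary of the rescued copy of this filesystem.
TOTAL_INODES = 1957888
TOTAL_BLOCKS = 7816406
# Assumptions: 4 KiB blocks (7816406 x 4096 bytes is about 32.0 GB, the drive size)
# and the ext4 default of 32768 blocks per group for that block size.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768


def inode_group(inode: int, inodes_per_group: int) -> tuple:
    """(block group, index within group); inode numbers start at 1."""
    return divmod(inode - 1, inodes_per_group)


groups = math.ceil(TOTAL_BLOCKS / BLOCKS_PER_GROUP)
inodes_per_group = TOTAL_INODES // groups
group, index = inode_group(1049072, inodes_per_group)
group_offset = group * BLOCKS_PER_GROUP * BLOCK_SIZE  # byte offset where that group begins

print(groups, inodes_per_group, group, index, group_offset)
# 239 8192 128 495 17179869184
```

Under these assumptions, inode 1049072 sits in group 128 of 239, a little past the middle of the partition. With ext4's flex_bg feature the inode tables of several groups are packed together, and a directory's data blocks are allocated separately from its inode. This is therefore only a rough location.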

----------

## apiaio

So far I have run:

```
localhost miro # ddrescue -f -n /dev/sda1 /dev/sdc6 logfile
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued:    32015 MB,  errsize:    4096 B,  current rate:   39055 kB/s
   ipos:    17213 MB,   errors:       1,    average rate:   67974 kB/s
   opos:    17213 MB,     time since last successful read:       0 s
Finished     
```

and

```
localhost miro # e2fsck -v -f /dev/sdc6
e2fsck 1.42.7 (21-Jan-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 1049072, block #0, offset 0: directory corrupted
Salvage<y>? yes
Missing '.' in directory inode 1049072.
Fix<y>? yes
Setting filetype for entry '.' in ??? (1049072) to 2.
Missing '..' in directory inode 1049072.
Fix<y>? yes
Setting filetype for entry '..' in ??? (1049072) to 2.
Pass 3: Checking directory connectivity
'..' in /home/miro/.mozilla/firefox/ci7aym0p.default/minidumps (1049072) is <The NULL inode> (0), should be /home/miro/.mozilla/firefox/ci7aym0p.default (1049066).
Fix<y>? yes
Pass 4: Checking reference counts
Inode 2 ref count is 26, should be 27.  Fix<y>? yes
Inode 1049066 ref count is 14, should be 13.  Fix<y>? yes
Pass 5: Checking group summary information

/dev/sdc6: ***** FILE SYSTEM WAS MODIFIED *****

      467941 inodes used (23.90%, out of 1957888)
         680 non-contiguous files (0.1%)
         177 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 454045/46
     2499673 blocks used (31.98%, out of 7816406)
           0 bad blocks
           1 large file

      412159 regular files
       41735 directories
         174 character device files
          97 block device files
           2 fifos
         244 links
       13765 symbolic links (13569 fast symbolic links)
           0 sockets
------------
      468176 files
```

What should the next step be? Should I format sda1 and run

```
ddrescue -f -n /dev/sdc6 /dev/sda1 
```

?

----------

## NeddySeagoon

apiaio,

Your filesystem is self-consistent. You have lost exactly one block.

However, it was 

```
Directory inode 1049072, block #0, offset 0: directory corrupted 
```

That means that all the files indexed from that directory block are no longer accessible.

We know it was the first (or only) block of that directory because of these messages:

```
Missing '.' in directory inode 1049072.
Fix<y>? yes
Setting filetype for entry '.' in ??? (1049072) to 2.
Missing '..' in directory inode 1049072. 
```

Those are the '..' (parent) and '.' (this directory) entries that mkdir creates.
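Why losing block #0 takes '.' and '..' with it can be shown by modelling the block. mkdir writes those two entries as the first records of a directory's block 0. Each record is tagged with file type 2 (directory), which is the value e2fsck set when it rebuilt them. The sketch below is a simplified model of the on-disk `ext4_dir_entry_2` layout. It ignores hashed-index and checksum details. It builds block 0 for inode 1049072, whose parent according to e2fsck is 1049066. It then shows that a zeroed block fails at offset 0, which mirrors e2fsck's "block #0, offset 0: directory corrupted".

```python
import struct

# On-disk header of an ext4 directory entry (struct ext4_dir_entry_2):
# inode (u32), rec_len (u16), name_len (u8), file_type (u8), then the name bytes.
DIRENT = struct.Struct("<IHBB")
FT_DIR = 2  # file_type for directories
BLOCK_SIZE = 4096


def make_dirent(inode: int, name: bytes, file_type: int, rec_len: int = 0) -> bytes:
    """Encode one entry; rec_len defaults to header + name rounded up to 4 bytes."""
    rec_len = rec_len or ((DIRENT.size + len(name) + 3) & ~3)
    body = DIRENT.pack(inode, rec_len, len(name), file_type) + name
    return body + b"\0" * (rec_len - len(body))


def parse_dir_block(block: bytes) -> list:
    """Walk the chain of entries in one linear directory block."""
    entries, offset = [], 0
    while offset + DIRENT.size <= len(block):
        inode, rec_len, name_len, file_type = DIRENT.unpack_from(block, offset)
        if rec_len < DIRENT.size or offset + rec_len > len(block):
            raise ValueError(f"directory corrupted at offset {offset}")
        if inode:  # inode 0 marks an unused slot
            start = offset + DIRENT.size
            entries.append((inode, block[start:start + name_len], file_type))
        offset += rec_len  # rec_len links each entry to the next
    return entries


# '.' takes 12 bytes; '..' stretches to the end of the block until more entries are added.
block0 = (make_dirent(1049072, b".", FT_DIR)
          + make_dirent(1049066, b"..", FT_DIR, BLOCK_SIZE - 12))
print(parse_dir_block(block0))
# [(1049072, b'.', 2), (1049066, b'..', 2)]
```

Because every entry's position depends on the `rec_len` of the one before it, garbage at offset 0 breaks the whole chain. Any file names that followed '..' in that block become unreachable too.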

It looks like the damage is confined to /home/miro/.mozilla/firefox/ci7aym0p.default (specifically its minidumps subdirectory), which appears to be user miro's Firefox profile.

There is no need to format and restore your backup.  You will end up doing the fsck and getting to the same place you are now.

Do keep your backup.

If that were a conventional HDD, I would suspect mechanical problems and only use it for things I could afford to lose.

In effect, the drive can no longer read its own writing. However, SSDs are different: they don't have mechanical problems.

It's likely that only a single memory cell has failed, so the drive cannot read the data in order to remap it.

One cell failing (there are four memory cells per byte) says nothing about any other cells.

I would keep using the drive and run the long test every few days to check for more errors. 
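Alongside the periodic long test, comparing the failed-block counters from successive `smartctl -a` runs shows whether more blocks are failing. The sketch below is hypothetical. It assumes the counters have already been extracted into dicts, and its baseline values are the ones reported in this thread. The attribute names are the ones smartctl prints for this Indilinx drive and will differ on other models. smartmontools' `smartd` daemon can also schedule self-tests, via the `-s` directive in `smartd.conf`.

```python
# Indilinx-specific raw counters that count failed flash blocks;
# an increase between two runs means new failures.
WATCHED = ("Program_Failure_Blk_Ct", "Erase_Failure_Blk_Ct", "Read_Failure_Blk_Ct")


def grown_counters(before: dict, after: dict) -> dict:
    """Return {name: (old, new)} for watched counters that went up."""
    return {
        name: (before[name], after[name])
        for name in WATCHED
        if name in before and name in after and after[name] > before[name]
    }


# Baseline: values from the smartctl -a output in this thread (Fri Jul 8 2016).
baseline = {"Program_Failure_Blk_Ct": 1, "Erase_Failure_Blk_Ct": 0, "Read_Failure_Blk_Ct": 3}
```

An empty result after each check means no new failed blocks were recorded since the baseline.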

I might even make a new Firefox profile, but that's a pain, as all your history, bookmarks and so on will vanish.

Unlike with an HDD, you cannot force a sector remap by writing to the faulty sector, as SSDs already remap on write to avoid the erase penalty.

----------

## apiaio

After the fsck everything works again, even Firefox.

Thanks

----------

