# Possibly Corrupted SSD

## frddbbbl

Hi, pretty much every time I boot my ThinkPad, OpenRC finds errors in /dev/sda4 and automatically runs fsck. Sometimes the errors need me to run fsck manually. I have backed up everything to a .img. I was wondering if this is possibly a software issue, so maybe I don't need to buy a new SSD? Sorry for the lack of information, I'm not completely sure what is relevant to give you. Checking through dmesg I found this error, which may or may not be relevant but seems worth mentioning:

```
[    0.090022] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address fed92000 returns all ones!
               BIOS vendor: coreboot; Ver: CBET4000 3774c98; Product Version: ThinkPad X200 Tablet
```

Any help would be much appreciated.

Cheers,

Freddie

----------

## sdauth

Maybe try to run a short smartctl test on your drive first.

smartctl -t short /dev/sda

(Note: you might also want to run a long test.)

then when it is done, post the output of :

smartctl -a /dev/sda

As for the DMAR issue, it seems you're using an old coreboot version (probably the libreboot stable release, I guess, because that bug has been fixed in upstream coreboot). I'm not sure if that is related to the drive. What you could do is test the drive on another computer to see whether the fsck issue triggers or not.

Might help to show your /etc/fstab as well.
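
The two smartctl steps above could be wrapped up roughly like this. It is only a sketch: the function name `run_short_selftest` and the `WAIT` variable are made up for illustration, and the two-minute default is an assumption (the drive reports its own recommended polling time in its SMART capabilities). It needs to be run as root.

```shell
# Hypothetical helper: start a short SMART self-test, wait, then dump the report.
# WAIT (seconds) is an assumed default; adjust it to the drive's polling time.
run_short_selftest() {
    dev=$1
    smartctl -t short "$dev" || return 1   # start the short self-test
    sleep "${WAIT:-120}"                   # give the drive time to finish
    smartctl -a "$dev"                     # SMART report, including the self-test log
}

# Usage (as root):
#   run_short_selftest /dev/sda
```

Swapping `short` for `long` would start the extended test instead, which needs a much longer wait.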

----------

## NeddySeagoon

frddbbbl,

Run 

```
smartctl -x /dev/sda
```

and post the output.

The error 

```
[    0.090022] DMAR: [Firmware Bug]: Your BIOS is broken; 
```

is not related to the SSD.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> Run 
> 
> ```
> ...

 

The output is:

```

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.27-gentoo] (local build)

Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Phison Driven OEM SSDs

Device Model:     SATA SSD

Serial Number:    175707011E8100727585

Firmware Version: SBFM61.3

User Capacity:    120,034,123,776 bytes [120 GB]

Sector Size:      512 bytes logical/physical

Rotation Rate:    Solid State Device

Form Factor:      2.5 inches

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ACS-4 (minor revision not indicated)

SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)

Local Time is:    Wed Apr 28 12:13:53 2021 BST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

AAM feature is:   Unavailable

APM feature is:   Unavailable

Rd look-ahead is: Enabled

Write cache is:   Enabled

DSN feature is:   Unavailable

ATA Security is:  Disabled, NOT FROZEN [SEC1]

Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                (65535) seconds.

Offline data collection

capabilities:                    (0x79) SMART execute Offline immediate.

                                        No Auto Offline data collection support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   2) minutes.

Extended self-test routine

recommended polling time:        (  30) minutes.

Conveyance self-test routine

recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE

  1 Raw_Read_Error_Rate     -O-R--   100   100   000    -    0

  9 Power_On_Hours          -O--C-   100   100   000    -    1802

 12 Power_Cycle_Count       -O--C-   100   100   000    -    843

168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    0

170 Bad_Blk_Ct_Erl/Lat      PO--C-   100   100   000    -    0/10

173 MaxAvgErase_Ct          ------   100   100   000    -    6 (Average 22)

192 Unsafe_Shutdown_Count   -O--C-   100   100   000    -    71

194 Temperature_Celsius     PO---K   067   067   000    -    33 (Min/Max 33/33)

218 CRC_Error_Count         ------   100   100   000    -    0

                            ||||||_ K auto-keep

                            |||||__ C event count

                            ||||___ R error rate

                            |||____ S speed/performance

                            ||_____ O updated online

                            |______ P prefailure warning

General Purpose Log Directory Version 1

SMART           Log Directory Version 1 [multi-sector log support]

Address    Access  R/W   Size  Description

0x00       GPL,SL  R/O      1  Log Directory

0x01           SL  R/O      1  Summary SMART error log

0x02           SL  R/O     51  Comprehensive SMART error log

0x03       GPL     R/O     64  Ext. Comprehensive SMART error log

0x04       GPL,SL  R/O      8  Device Statistics log

0x06           SL  R/O      1  SMART self-test log

0x07       GPL     R/O      1  Extended self-test log

0x09           SL  R/W      1  Selective self-test log

0x10       GPL     R/O      1  NCQ Command Error log

0x11       GPL     R/O      1  SATA Phy Event Counters log

0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log

0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (64 sectors)

No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0

Note: revision number not 1 implies that no selective self-test has ever been run

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)

Page  Offset Size        Value Flags Description

0x01  =====  =               =  ===  == General Statistics (rev 1) ==

0x01  0x008  4             843  ---  Lifetime Power-On Resets

0x01  0x010  4            1802  ---  Power-on Hours

0x01  0x018  6      1139754701  ---  Logical Sectors Written

0x01  0x028  6       682873645  ---  Logical Sectors Read

0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==

0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors

0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==

0x05  0x008  1              33  ---  Current Temperature

0x05  0x020  1              33  ---  Highest Temperature

0x05  0x028  1              33  ---  Lowest Temperature

0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==

0x06  0x008  4             642  ---  Number of Hardware Resets

0x06  0x018  4               0  ---  Number of Interface CRC Errors

0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==

0x07  0x008  1               0  ---  Percentage Used Endurance Indicator

                                |||_ C monitored condition met

                                ||__ D supports DSN

                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)

ID      Size     Value  Description

0x0001  2            0  Command failed due to ICRC error

0x0003  2            0  R_ERR response for device-to-host data FIS

0x0004  2            0  R_ERR response for host-to-device data FIS

0x0006  2            0  R_ERR response for device-to-host non-data FIS

0x0007  2            0  R_ERR response for host-to-device non-data FIS

0x0008  2            0  Device-to-host non-data FIS retries

0x0009  4            9  Transition from drive PhyRdy to drive PhyNRdy

0x000a  4            1  Device-to-host register FISes sent due to a COMRESET

0x000f  2            0  R_ERR response for host-to-device data FIS, CRC

0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC

0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC

0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

```

My fstab is:

```
/dev/sda2             /boot           ext4            defaults,noatime,discard 0 2
/dev/sda3             none            swap            sw                       0 0
/dev/sda4             /               ext4            noatime,discard          0 1
```

And yes, it would make sense that the BIOS error is to do with libreboot. I bought this laptop second hand, already librebooted, so it could be an old version.

----------

## NeddySeagoon

frddbbbl,

The SMART data says that internally, the SSD is fine. It's not seen much use with only 1800 hours on it.

This bit 

```
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  4            9  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            1  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
```

says that there is an interface problem.

Is the SSD seated properly or loose in the laptop?
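
To pick the non-zero counters out of a long `smartctl -x` dump, a small filter along these lines may help. It is a sketch only: the function name `nonzero_phy_counters` is invented here, and it assumes the four-column layout (ID, Size, Value, Description) shown in the table above.

```shell
# Hypothetical helper: print SATA Phy event counters whose value is not zero.
# Reads `smartctl -x` output on stdin; assumes the ID/Size/Value/Description layout.
nonzero_phy_counters() {
    awk '
        /^SATA Phy Event Counters/ { in_section = 1; next }   # section header
        in_section && /^0x[0-9a-f]+[ \t]/ && $3 != 0 {          # counter rows only
            id = $1; value = $3
            $1 = $2 = $3 = ""                                   # keep just the description
            sub(/^ +/, "")
            print id, value, $0
        }
    '
}

# Usage (as root):
#   smartctl -x /dev/sda | nonzero_phy_counters
```

For the table quoted above, only the 0x0009 and 0x000a rows would be printed.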

----------

## frddbbbl

I've personally never opened up the back of it but will check now, could have dropped it.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> The SMART data says that internally, the SSD is fine. It's not seen much use with only 1800 hours on it.
> 
> This bit 
> ...

 

I've opened up the back. From the top it looked relatively secure; I checked from the side and there was a tiny bit of wobble, so I added a bit of electrical tape under the rubber feet that were already there and it's definitely not moving now. I still had the same issue after turning it on, and the error occurs even if I reboot without really moving the laptop at all.

----------

## sdauth

frddbbbl,

Are you using Intel microcode on your ThinkPad or not?

What's the output of cat /proc/cpuinfo | grep microcode?

By default, libreboot doesn't include it, but you can load it from Linux instead. You will need the Intel microcode package (sys-firmware/intel-microcode on Gentoo; yeah, non-free) and you will also need to enable:

CONFIG_MICROCODE=y

CONFIG_MICROCODE_INTEL=y

in your kernel if that's not already enabled.

It is really recommended to make sure you're using the "latest" (2010!) microcode on that X200 for stability. It might be totally unrelated to your issue, but it's an easy thing to try. My X200 (regular, not tablet) was experiencing several kernel panics in various scenarios without it. Otherwise, I would recommend cleaning all the SATA pins, but of course that implies disassembling the whole thing.
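
A quick way to see what is currently loaded and whether those kernel options are set is sketched below. The function names are made up for illustration, and the `.config` path in the usage lines is the usual Gentoo location but is an assumption about this machine.

```shell
# Hypothetical helpers for checking microcode status.

# Print the microcode revision of the first CPU listed in a cpuinfo file.
microcode_revision() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Succeed if the given option is built in (=y) in a kernel .config file.
config_enabled() {
    grep -q "^$1=y\$" "$2"
}

# Usage (the .config path is an assumption):
#   microcode_revision
#   config_enabled CONFIG_MICROCODE /usr/src/linux/.config && echo "loader enabled"
#   config_enabled CONFIG_MICROCODE_INTEL /usr/src/linux/.config && echo "Intel support enabled"
```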

----------

## Buffoon

The word "interface" implies there are two devices interacting. One is built into your drive. I'd take this drive out and try it elsewhere. NeddySeagoon may be right that the storage part of the drive is fine, but what about the communication part? Maybe a warranty replacement is in order.

----------

## NeddySeagoon

frddbbbl,

The logical next step is to test the drive elsewhere and test the laptop with another drive, if you can.

I suspect that all will be well in those tests and, if not, you will know where to look next.

With the connectors 'wiped' like that, it may be OK when you put it back together too.

If none of that helps, we need to look at hardware and the errors from the PC side.

----------

## frddbbbl

Thanks everyone. I don't have any other hardware to test the drive on; however, I will do a more thorough disassembly and check the pins etc. on the drive. I may get a new, bigger SSD anyway...

----------

## NeddySeagoon

frddbbbl,

If you have a USB to SATA adaptor, that would do for testing.

Don't get a replacement SSD yet. Not until you know where the fault is.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> If you have a USB to SATA adaptor, that would do for testing.
> 
> Don't get a replacement SSD yet. Not until you know where the fault is.

 

I could easily get a USB to SATA adapter; however, I don't have any other device to use for testing.

----------

## NeddySeagoon

frddbbbl,

To test, you need to change something in the interface.

If the drive works when attached via USB, the drive is probably OK. The inference is that the problem is with the laptop part of the interface, as that is not in use.

It would be good to confirm that with another drive but no matter.

As testing may be delayed, let's do some more analysis on the laptop side. SATA is/was made in three link speeds.

Your SATA SSD will be SATA3 because they all are.

Your laptop may be SATA1 or SATA2. Each faster standard is supposed to be backwards compatible with the older, slower standards. A few combinations don't work so well.

To check this out, please post the output of 

```
lspci -nnk
```

That will tell us all about your PCI bus hardware.

Put all of dmesg onto a pastebin site. Ideally we would like it to contain some errors so we can see what the error handler does.

Even if there are no errors, it will tell how the kernel set up its side of the SATA interface. We need to know that.

It's all good work that can be done without any testing.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> To test, you need to change something in the interface.
> 
> If the drive works when attached via USB, the drive is probably OK. The inference is that the problem is with the laptop part of the interface, as that is not in use.
> ...

 

```
 lspci -nnk

00:00.0 Host bridge [0600]: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub [8086:2a40] (rev 07)

        Subsystem: Lenovo ThinkPad T400 [17aa:20e0]

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)

        Subsystem: Lenovo Mobile 4 Series Chipset Integrated Graphics Controller [17aa:20e4]

        Kernel driver in use: i915

00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07)

        Subsystem: Lenovo Mobile 4 Series Chipset Integrated Graphics Controller [17aa:20e4]

00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network Connection [8086:10f5] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20ee]

        Kernel driver in use: e1000e

00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f1]

        Kernel driver in use: ehci-pci

00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) HD Audio Controller [8086:293e] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f2]

        Kernel driver in use: snd_hda_intel

        Kernel modules: snd_hda_intel

00:1c.0 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 [8086:2940] (rev 03)

        Kernel driver in use: pcieport

00:1c.1 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 [8086:2942] (rev 03)

        Kernel driver in use: pcieport

00:1c.2 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 [8086:2944] (rev 03)

        Kernel driver in use: pcieport

00:1c.3 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 [8086:2946] (rev 03)

        Kernel driver in use: pcieport

00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f0]

        Kernel driver in use: uhci_hcd

00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f1]

        Kernel driver in use: ehci-pci

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 93)

00:1f.0 ISA bridge [0601]: Intel Corporation ICH9M-E LPC Interface Controller [8086:2917] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f5]

00:1f.2 SATA controller [0106]: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] [8086:2929] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f8]

        Kernel driver in use: ahci

00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930] (rev 03)

        Subsystem: Lenovo ThinkPad T400 [17aa:20f9]

```

Thanks, here http://ix.io/3fDp is dmesg's output. I will purchase a SATA cable and try to borrow someone's laptop to do some testing once it arrives.

----------

## NeddySeagoon

frddbbbl,

```
00:1f.2 SATA controller [0106]: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] [8086:2929] (rev 03)
        Subsystem: Lenovo ThinkPad T400 [17aa:20f8]
        Kernel driver in use: ahci
```

That's all correct.

All those 

```
[ 5412.606759] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 5412.607238] sd 0:0:0:0: [sda] Stopping disk
[ 5412.608883] sd 0:0:0:0: [sda] Starting disk
```

 at 10ms intervals look wrong. Well, I've never seen that before.

It looks like power saving gone overboard.

```
[ 5615.488208] EXT4-fs error (device sda4): ext4_xattr_ibody_find:2178: inode #3944350: comm background: corrupted in-inode xattr
```

This filesystem is certainly corrupt. At least, that i-node is.

On the good news front, there are no SATA interface errors there.

The next step is to try to stop the 

```
[ 5411.889839] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 5411.889902] sd 0:0:0:0: [sda] Stopping disk
[ 5412.023135] sd 0:0:0:0: [sda] Starting disk
```

It shouldn't cause any harm. It's just unusual. 

Do you use suspend to RAM or hibernate?

They can do some funny things. 

It's probably not possible to fix your corrupt i-node. Do *not* use fsck until you have a backup that you know you can read. It 'guesses' in the face of missing, corrupt or conflicting information. When it guesses incorrectly, it makes a bad situation worse, so fsck is a last-ditch thing. 

Please pastebin your entire kernel .config file so we can try to work out what is causing the stop/start of the drive.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> Well, I've never seen that before.

 

Oh no! That can't be good, haha! Yes, I have been using 

```
hibernate-ram
```

 pretty much since I got the laptop. I have a backup .img from dd on an HDD from just before I made this post. Will that also be corrupted due to the way dd copies stuff? If so, how should I create a backup?

Here is my kernel .config: http://ix.io/3gfi

----------

## NeddySeagoon

frddbbbl,

Turn off all the DEBUG options in the kernel unless the kernel or you really want them on.

Debugging options are mostly for kernel developers. They are permitted to interfere with normal kernel operation too.

Debug options usually cause lots of logspam which slows things down.  

I suspect those messages will go away then. In particular CONFIG_PM_DEBUG=y and CONFIG_PM_SLEEP_DEBUG=y.

Nothing will change - just the messages will not be printed.
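
To see which debug options are actually switched on before reconfiguring, something like the following could be used. It is a rough sketch: the function name is made up, the pattern simply matches any option with DEBUG in its name, and the `.config` path in the usage line is an assumption.

```shell
# Hypothetical helper: list options containing DEBUG that are enabled
# (built in or modular) in a kernel .config file.
enabled_debug_options() {
    grep -E '^CONFIG_[A-Z0-9_]*DEBUG[A-Z0-9_]*=[ym]$' "$1"
}

# Usage (the path is an assumption):
#   enabled_debug_options /usr/src/linux/.config
```

Not every option it lists is necessarily safe or sensible to turn off, so the output is a list to review rather than a list to disable wholesale.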

How useful your backup might be depends on how you ran dd.

If you did it from within your live Gentoo, so it was making an image of itself, it won't be very useful at all. dd will have backed up the state of the drive as it passed.

That is not self-consistent due to the data in the kernel caches and all the open files. Files will be incomplete, directories will be incomplete ...

If you did a dd while booted some other way, then it will be good. At least, it will reflect the state of the system when the backup was made. 

The index nodes (i-nodes for short) record information about the things they index, like ownership, permissions and where the data lives. The file name itself is kept in a directory entry that points to the i-node.

```
ls -i
```

will show i-node numbers beside file names.

We know that your inode #3944350 is corrupt but we don't know the name of the file that belongs to it. If we did and that is the only problem, it could be fixed by replacing the file with a copy in a different i-node.

I don't know a command for that but I'm fairly sure it will involve traversing the entire directory tree to do it.  Stack Exchange suggests 

```
find / -inum <inode>
```
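
A slightly more careful form of that lookup might look like this. It is a sketch, and the function name `paths_for_inode` is invented for illustration. The `-xdev` option is added because i-node numbers are only unique within one filesystem, so the search should not wander into other mounts.

```shell
# Hypothetical helper: print the path(s) that refer to a given inode number.
# -xdev keeps find on the starting filesystem; permission errors are hidden.
paths_for_inode() {
    find "$1" -xdev -inum "$2" 2>/dev/null
}

# Usage (as root, with the inode number from the EXT4-fs error):
#   paths_for_inode / 3944350
```

On an ext filesystem, `debugfs -R 'ncheck 3944350' /dev/sda4` should give the same answer without walking the mounted tree, though that has not been tried in this thread.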

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> I don't know a command for that but I'm fairly sure it will involve traversing the entire directory tree to do it.  Stack Exchange suggests 
> 
> ```
> find / -inum <inode>
> ```

Hi, inode 3944350 is from my browser's cache:

```
/home/freddie/.surf/cache/WebKitCache/Version 16/Records/E64D507D994FBD16C0F87638E709665A352B7F3F/Resource/48D0C87DE6205E444A12741C1CBC0835EADAA4DA
```

Would it be safe just to delete the cache, or would this not fix the issue? I'm currently reconfiguring the kernel without the debug options.

Cheers,

Freddie

----------

## NeddySeagoon

frddbbbl,

That may very well fix it. Its safe to try anyway.

----------

## frddbbbl

I have tried to delete it. Everything else in the cache was deleted, but it won't let me delete that file, giving me the message:

```
rm: cannot remove '.surf/cache/WebKitCache/Version 16/Records/E64D507D994FBD16C0F87638E709665A352B7F3F/Resource/48D0C87DE6205E444A12741C1CBC0835EADAA4DA': Structure needs cleaning
```

----------

## NeddySeagoon

frddbbbl,

That may not be the limit of the filesystem damage.

Boot with the liveCD, or whatever your favourite boot media is.

Make a copy with dd, while the filesystem is not mounted.

Ensure dd completes. It will work with a damaged filesystem but not a faulty drive.

After dd completes, run fsck as far as the first error and report the error.

Do not let it make any changes to the filesystem at this stage.
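
From the live media, those steps could look roughly like the sketch below. The function names and the image path are placeholders, not anything from this system, and `e2fsck -n` is used because it opens the filesystem read-only and answers "no" to every question, so nothing gets changed.

```shell
# Hypothetical helpers, meant to be run as root from the live media.

# Succeed if the device appears in the mounts table (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Image an unmounted partition, then check it without modifying it.
image_then_check() {
    dev=$1
    img=$2
    if is_mounted "$dev"; then
        echo "refusing to image $dev: it is mounted" >&2
        return 1
    fi
    dd if="$dev" of="$img" bs=4M conv=fsync status=progress || return 1
    e2fsck -n "$dev"    # -n: open read-only and answer "no" to every question
}

# Usage (the image path is a placeholder):
#   image_then_check /dev/sda4 /mnt/backup/sda4.img
```

The mount check is only a guard against imaging a live filesystem by accident; it compares device names literally, so it would miss a device mounted under another name.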

----------

## Buffoon

Some hints here. If there is any data you haven't backed up, you can mount the filesystem with the ro,noload options (then even the journal won't be written) and copy your data off. When making an image, ddrescue may be a better option as it does not terminate on bad sectors, although I'm not sure how it handles SSDs ...

----------

## frddbbbl

 *Buffoon wrote:*   

> Some hints here. If there is any data you haven't backed up, you can mount the filesystem with the ro,noload options (then even the journal won't be written) and copy your data off. When making an image, ddrescue may be a better option as it does not terminate on bad sectors, although I'm not sure how it handles SSDs ...

 

Hi, I have nothing I can use as a liveCD at the moment. I've ordered a USB stick on eBay for that, and have removed most of the debug kernel options now. Are you saying that I can mount the filesystem with the ro,noload options from a liveCD? I can boot into recovery mode from the BIOS; would this still be a bad way to create a backup? Thanks for all the help so far.

----------

## NeddySeagoon

frddbbbl,

Backing up a read only filesystem is OK but copy the files off it, do not make an image of the device as dd does.

The ro noload options will work from any other Linux except at boot time because some things on the working system must be read/write.

That is, they cannot be applied to the system being booted.

----------

## frddbbbl

Hi, I am a little unclear as to how I would turn my fs read-only while I was still on it. What is the process of mounting it, or do you just mean booting from the liveCD?

----------

## NeddySeagoon

frddbbbl,

You can't do that. That's why you need a liveCD or some other boot media.

Start the alternative system, that will not use your actual install. Then you can mount your install inside it.

Just as you did when you installed Gentoo.

Don't make any partitions or filesystems. Just do the mount step. You will not even chroot into your gentoo.
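
Put together with the ro,noload suggestion, the mount-and-copy step might be sketched as follows. Everything named here (the `run` and `backup_files` helpers, `DRY_RUN`, and the mount point and destination paths) is illustrative rather than taken from this system. With `DRY_RUN=1` the commands are only printed, which makes it easy to review them first.

```shell
# Hypothetical helpers, meant to be run as root from the live media.

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Mount the installed root read-only without replaying the journal,
# copy the files (not the filesystem image) elsewhere, then unmount.
backup_files() {
    dev=$1
    mnt=$2
    dest=$3
    run mkdir -p "$mnt" "$dest" &&
    run mount -o ro,noload "$dev" "$mnt" &&
    run cp -a "$mnt/." "$dest/" &&
    run umount "$mnt"
}

# Usage (paths are placeholders):
#   DRY_RUN=1; backup_files /dev/sda4 /mnt/gentoo /mnt/backup/root
```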

----------

## frddbbbl

Thanks, will reply again after the USB stick I have ordered arrives and I am able to back up and fsck.

----------

## NeddySeagoon

frddbbbl,

That's OK, we will still be here. Gentoo is a hobby after all. :)

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> That may not be the limit of the filesystem damage.
> 
> Boot with the liveCD, or whatever your favourite boot media is.
> ...

 

NeddySeagoon,

I have created a backup from my liveUSB and ran fsck on the drive itself, and then separately on its 4 partitions.

fsck /dev/sda returned:

```
Superblock invalid, trying backup blocks...
```

fsck /dev/sda1 returned:

```
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
Superblock has an invalid journal (inode 8)
```

/dev/sda2 and /dev/sda3 returned no errors, and /dev/sda4 returned:

```
/dev/sda4 contains a filesystem with errors, check forced.
...
...
/dev/sda4: 569646/7258112 files (0.2% non-contiguous), 6687123/29009279 blocks
```

Hope this is useful.

Cheers,

Freddie

----------

## NeddySeagoon

frddbbbl,

fsck checks filesystems. The whole drive does not contain a single filesystem; instead it's partitioned and each partition may contain a filesystem. 

```
fsck /dev/sda ...
Superblock invalid, trying backup blocks...
```

is OK. There is no filesystem there.

```
/dev/sda1 
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
Superblock has an invalid journal (inode 8)
```

looks bad. What type of filesystem is it and what is it used for?

If that's bios_boot, fsck is confused. There is no filesystem there, so it's good.

If that's /boot be prepared to remake the filesystem, since it may not be fixable. 

What needs to be done, depends on what type of filesystem it is and what it is used for. 

```
/dev/sda4
/dev/sda4 contains a filesystem with errors, check forced.
...
...
/dev/sda4: 569646/7258112 files (0.2% non-contiguous), 6687123/29009279 blocks
```

That ran to completion but did it fix anything?

If you run 

```
fsck -f /dev/sda4
```

from the boot media is it error free now?

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> fsck checks filesystems. The whole drive does not contain a single filesystem; instead it's partitioned and each partition may contain a filesystem. 
> 
> ```
> ...

 

Hi, there now appear to be no errors on /dev/sda4. I ran fsck -f /dev/sda4 and it was clean. sda1 should be my bios_boot partition, as /dev/sda2 is my /boot.

----------

## NeddySeagoon

frddbbbl,

It looks like you are good.

The bios_boot partition does not contain a filesystem. grub uses it raw.

sda4 is your root partition.

Boot normally and see what happens.

Can you delete your browser cache now?

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> It looks like you are good.
> 
> The bios_boot partition does not contain a filesystem. grub uses it raw.
> ...

 

I can, yes! Not entirely sure what has changed but the errors seem to have gone now!

----------

## NeddySeagoon

frddbbbl,

fsck 'fixed' your filesystem.

If fsck really got in a mess there will be lots of fragments in /lost+found

As long as it's empty and everything else looks OK, you are good.

There has been no sign of any interface errors, other than in your original smartctl output, so I think all is well.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> fsck 'fixed' your filesystem.
> 
> If fsck really got in a mess there will be lots of fragments in /lost+found
> ...

 

Hi, so the error still occurs, sometimes, on boot. It's hard for me to work out what the error is, as fsck just says it has found a filesystem with errors (sda4) and then fixes it, so I can't investigate from the liveUSB. Any ideas on other ways I could test? I can't seem to find a method that will consistently reproduce the error.

----------

## NeddySeagoon

frddbbbl,

"sometimes" is bad news. As software is deterministic, there is no "sometimes".

Something corrupts the filesystem. There may be something in dmesg when it happens but even if there is, you may not ever see it.

Try not using hibernate and/or suspend to RAM. That's based on a hunch that it's a shutdown thing causing the problem.

If that improves the situation, try a newer kernel.

You could also just update the kernel and see what happens. Only change one thing at a time or you won't know what actually fixed it, if it's fixed.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> "sometimes" is bad news. As software is deterministic, there is no "sometimes".
> 
> Something corrupts the filesystem. There may be something in dmesg when it happens but even if there is, you may not ever see it.
> ...

 

I am already on kernel 5.10.27. I updated a few weeks ago and the issue was present before then. This morning I removed the hibernate and suspend to RAM options and the issue still occurs. It appears to occur either after the laptop has been running for an extended period, or when it has been turned off for an extended period; it's hard to say which.

----------

## NeddySeagoon

frddbbbl,

I installed kernel 5.12.0 today. I don't know if it works yet :)

After the fs corruption has happened, the filesystem will stay corrupt until it's fixed.

How old is the battery and do you need the battery?

I have seen old faulty batteries interfere with normal operation.

Removing the battery and using the laptop without one fixes the problem.

----------

## frddbbbl

I think the battery is quite old. Interesting you should mention it, because I ordered a new one yesterday! acpi -ib tells me that the battery is at 68% of its original full capacity, 3581/5252 mAh.

----------

## NeddySeagoon

frddbbbl,

The battery will have a manufacturing date on one of the labels. It's normally in wwyy format, where yy is the last two digits of the year and ww is the week in the year.

Don't buy cheap horrible noname lithium batteries. They often use poor quality cells and are poorly manufactured too.

If it looks too good to be true, it probably is.

----------

## frddbbbl

Hi, thanks for the help. I've purchased a powerose replacement battery for my ThinkPad model; they seem to be reputable on their eBay store. I actually can't find any date anywhere on the battery currently in my laptop. And this is probably a bit of a basic question, but I assumed my kernel version was the most up to date? Is it because I am using the stable as opposed to the bleeding-edge gentoo-sources?

----------

## NeddySeagoon

frddbbbl,

Yes. 

Gentoo stable kernels track kernel.org Long Term Support (LTS) kernels with some lag for testing by Gentoo.

I update my kernel once every x for x in 5.x.y. My PC is 12 years old now, so I don't need new kernel features :)

Skipping 3 or 4 x values is a big jump for make oldconfig, so I try not to do that.

----------

## frddbbbl

 *NeddySeagoon wrote:*   

> frddbbbl,
> 
> Yes. 
> 
> Gentoo stable kernels track kernel.org Long Term Support (LTS) kernels with some lag for testing by Gentoo.
> ...

 

Hi, I've been using the new battery and have had hibernate turned off pretty much since my last post. I'm still getting the "contains a filesystem with errors" message on startup, but whenever I run fsck from a liveCD it returns no errors. Would the next step be to update my kernel?

----------

## NeddySeagoon

frddbbbl,

Either fsck is not fixing the filesystem or it keeps getting corrupted.

Look in dmesg to see what the problem is. If it's always the same problem in the same place, I suspect that fsck is not fixing the problem.

All that's left is to destroy the filesystem, then remake it and restore the content from backup.

Not a dd backup. That probably contains the error. Only backup and restore the files, not the filesystem metadata.
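
To check whether it is "always the same problem in the same place", the EXT4-fs error lines in the kernel log can be boiled down to a count per device and i-node. This is only a sketch: the function name is invented, and it assumes the message format seen earlier in the thread (`EXT4-fs error (device sda4): ... inode #3944350: ...`). Errors that do not mention an i-node are skipped.

```shell
# Hypothetical helper: summarise EXT4-fs errors from kernel log text on stdin
# as "count device inode", most frequent first.
ext4_error_summary() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)).*inode #\([0-9]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2, $3 }'
}

# Usage:
#   dmesg | ext4_error_summary
```

If the same i-node keeps turning up after each fsck, that would point towards fsck not really repairing it; a different i-node each time would point towards fresh corruption.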

----------

