# /dev/md0: file system corrupted after every reboot

## Tinitus

Hello,

after every reboot my software RAID is mounted read-only only, because file system errors are found.

I use ext3 and mount it via /etc/fstab.

Entry in fstab:

```

/dev/md0      /home           ext3            noatime         0 1
```

What can I do to make this work reliably?

G. R.

----------

## tamiko

Are you sure that one of the hard disks isn't defective?

Run a SMART self-test (smartmontools) and check your kernel log for suspicious entries.

----------

## Tinitus

 *tamiko wrote:*   

> Are you sure that one of the hard disks isn't defective?
> 
> Run a SMART self-test (smartmontools) and check your kernel log for suspicious entries.

 

Hello, I don't think there are any errors on it, or are there?

```
smartctl -a /dev/sde

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

Smartctl open device: /dev/sde failed: No such file or directory

Linuxserver ~ # smartctl -a /dev/sdc

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     SAMSUNG HD103UJ

Serial Number:    S13PJDWQ306857

Firmware Version: 1AA01109

User Capacity:    1.000.204.886.016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   8

ATA Standard is:  ATA-8-ACS revision 3b

Local Time is:    Sat Mar 14 11:41:06 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00)   Offline data collection activity

               was never started.

               Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:        (11658) seconds.

Offline data collection

capabilities:           (0x7b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     ( 195) minutes.

Conveyance self-test routine

recommended polling time:     (  21) minutes.

SCT capabilities:           (0x003f)   SCT Status supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7870

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       154

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0

  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1824

 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       153

 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0

183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   078   068   000    Old_age   Always       -       22 (Lifetime Min/Max 14/22)

194 Temperature_Celsius     0x0022   077   068   000    Old_age   Always       -       23 (Lifetime Min/Max 14/24)

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       118844112

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 0

Warning: ATA Specification requires self-test log structure revision number = 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1

SMART Selective self-test log data structure revision number 0

Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

G. R.

----------

## Tinitus

So, I have now done the following:

```
dd if=/dev/md0 of=/dev/null
1953519872+0 Datensätze ein
1953519872+0 Datensätze aus
1000202174464 Bytes (1,0 TB) kopiert, 10850,7 s, 92,2 MB/s
```

That means there are no read errors, right?

I found the following in the log:

```

Jun 14 10:32:04 Linuxserver ext3_abort called.

Jun 14 10:32:04 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Jun 14 10:32:13 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0

Jun 14 10:32:34 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543164 offset 0

Jun 14 10:32:48 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0

Jun 14 10:33:20 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0

Jun 14 10:38:14 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0

Jun 14 10:39:24 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0

Jun 14 10:39:24 Linuxserver [<ffffffff802d279f>] ext3_count_free_inodes+0x2a/0x43

Jun 14 10:39:24 Linuxserver [<ffffffff802dad40>] ext3_commit_super+0x49/0x65

Jun 14 10:39:24 Linuxserver [<ffffffff802db85c>] ext3_handle_error+0x83/0xaa

Jun 14 10:39:24 Linuxserver [<ffffffff802db967>] ext3_error+0x83/0x90

Jun 14 10:39:24 Linuxserver [<ffffffff802d9186>] ext3_find_entry+0x413/0x5c4

Jun 14 10:39:24 Linuxserver [<ffffffff802daab5>] ext3_lookup+0x31/0x120

Jun 14 10:42:28 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260862

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261616

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261630

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260837

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260857

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261628

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261627

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261626

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261610

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261624

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260764

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260768

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261571

Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543206

Jun 27 14:18:02 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543348

Jul  7 10:31:23 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543249

Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 11010060

Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10567834

Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10567986

Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 8503298

Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 8503297

Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 11010065

Feb 21 21:21:57 Linuxserver ext3_abort called.

Feb 21 21:21:57 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 21 22:04:14 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 21 22:04:14 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 24 09:23:19 Linuxserver ext3_abort called.

Feb 24 09:23:19 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 24 09:23:19 Linuxserver EXT3-fs error (device md0) in ext3_ordered_write_end: IO failure

Feb 24 09:28:17 Linuxserver ext3_abort called.

Feb 24 09:28:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Feb 24 09:29:43 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 24 09:29:43 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 25 07:23:15 Linuxserver ext3_abort called.

Feb 25 07:23:15 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 25 07:25:53 Linuxserver ext3_abort called.

Feb 25 07:25:53 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Feb 25 07:25:56 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 25 07:25:56 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 26 08:59:09 Linuxserver ext3_abort called.

Feb 26 08:59:09 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 26 09:29:13 Linuxserver ext3_abort called.

Feb 26 09:29:13 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Feb 26 09:29:38 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 26 09:29:38 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 28 09:28:01 Linuxserver ext3_abort called.

Feb 28 09:28:01 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 28 09:34:51 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Feb 28 09:34:51 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Feb 28 09:36:54 Linuxserver ext3_abort called.

Feb 28 09:36:54 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  4 22:23:58 Linuxserver ext3_abort called.

Mar  4 22:23:58 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  5 12:57:50 Linuxserver ext3_abort called.

Mar  5 12:57:50 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  5 12:59:44 Linuxserver ext3_abort called.

Mar  5 12:59:44 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar  5 12:59:48 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Mar  5 12:59:48 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Mar  7 11:08:32 Linuxserver ext3_abort called.

Mar  7 11:08:32 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  7 11:10:10 Linuxserver ext3_abort called.

Mar  7 11:10:10 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar  9 09:54:51 Linuxserver ext3_abort called.

Mar  9 09:54:51 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  9 09:56:17 Linuxserver ext3_abort called.

Mar  9 09:56:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar 12 07:55:35 Linuxserver ext3_abort called.

Mar 12 07:55:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar 12 07:58:32 Linuxserver ext3_abort called.

Mar 12 07:58:32 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar 14 08:56:35 Linuxserver ext3_abort called.

Mar 14 08:56:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar 14 08:58:14 Linuxserver ext3_abort called.

Mar 14 08:58:14 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_truncate: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_orphan_del: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_delete_inode: Journal has aborted

Mar 14 11:50:32 Linuxserver ext3_abort called.

Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0): ext3_journal_st

```

Does that mean that only my ext3 journal is damaged?

----------

## Tinitus

So, I have now run the following:

```
fsck.ext3 -p -v -c /dev/md0

/dev/md0: stelle das Journal wieder her

  424357 inodes used (0.70%)

   30345 non-contiguous inodes (7.2%)

         # von Inodes mit ind/dind/tind Blöcken: 83306/18000/21

179961114 blocks used (73.70%)

       0 bad blocks

      56 large files

  387304 regular files

   36998 directories

       0 character device files

       0 block device files

       2 fifos

       0 links

      43 symbolic links (43 fast symbolic links)

       1 socket

--------

  424348 files

```

Now I am checking /dev/md0 for bad blocks...

Does that even make sense, or should I rather test the disks themselves?

Does anyone know about this?

G. R.

----------

## tamiko

Hmm.

First of all: reading successfully from a RAID does not mean that both disks are fine. You would therefore have to run badblocks directly on the disks.
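A minimal sketch of such a per-disk scan, assuming the array members are /dev/sdc and /dev/sdd (the two devices whose SMART reports appear in this thread). The function name `scan_members` and the `BADBLOCKS` override are made up for illustration. Without `-n` or `-w`, badblocks only reads.

```sh
# Read-only surface scan of each RAID member (not of /dev/md0 itself).
# BADBLOCKS can be overridden for a dry run, e.g. BADBLOCKS=echo.
scan_members() {
    for disk in "$@"; do
        # -s shows progress, -v is verbose; no -n or -w, so nothing is written.
        "${BADBLOCKS:-badblocks}" -sv "$disk" || return 1
    done
}
```

Run as root, for example `scan_members /dev/sdc /dev/sdd`. On 1 TB disks each pass will probably take a few hours.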

By self-test I actually meant

```
smartctl -t ...
```

and then check back after x hours. *whistles*

(If this really reports an error, the disk is dead. If it runs through, one can hope that the disk is fine.)

About the error messages: that does not look good.

The file system errors are not being corrected because of I/O errors.

```
Mar  7 11:08:32 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  7 11:10:10 Linuxserver ext3_abort called.

Mar  7 11:10:10 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar  9 09:54:51 Linuxserver ext3_abort called.

Mar  9 09:54:51 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar  9 09:56:17 Linuxserver ext3_abort called.

Mar  9 09:56:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar 12 07:55:35 Linuxserver ext3_abort called.

Mar 12 07:55:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar 12 07:58:32 Linuxserver ext3_abort called.

Mar 12 07:58:32 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal

Mar 14 08:56:35 Linuxserver ext3_abort called.

Mar 14 08:56:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Mar 14 08:58:14 Linuxserver ext3_abort called. 
```

This points to a hardware defect.

/edit:

If you search for "ext3_abort called", you frequently find hardware defects along the lines of "controller defective" or "cable defective". You may want to rule out these possible sources of error.

In any case, please run a SMART self-test first. I suspect that it will not complete successfully on one of the disks.
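To see how regularly the journal aborts occur, the log can be condensed to one line per day. This is only a sketch: the function name `aborts_per_day` is made up, and the log path in the usage note is an assumption that depends on the syslog setup.

```sh
# Count "ext3_abort called" lines per day in a syslog-style file,
# keeping the days in the order in which they appear in the log.
aborts_per_day() {
    awk '/ext3_abort called/ {
             day = $1 " " $2
             if (!(day in count)) order[++n] = day
             count[day]++
         }
         END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$1"
}
```

For example `aborts_per_day /var/log/messages`. The kernel lines logged shortly before the first abort of a day are the ones worth reading for controller or cable messages.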

----------

## Tinitus

 *tamiko wrote:*   

> Hmm.
> 
> First of all: reading successfully from a RAID does not mean that both disks are fine. You would therefore have to run badblocks directly on the disks.
> 
> By self-test I actually meant
> ...

 

Hi, I have had smartd running for about 10 hours now. No indications apart from temperature messages.

Is that enough as well?

Thanks in advance!

G. R.

----------

## tamiko

No, that is not enough.

Do me a favor and start, overnight, a

```
smartctl --test=long /dev/sdX
```

on every disk (one call per disk, with /dev/sdX replaced by the actual device name). The next day you can then use

```
smartctl -a /dev/sdX
```

to see how the self-test went.
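Since smartctl handles one device per call, the overnight run could be sketched as a small loop. The function names and the `SMARTCTL` override are made up for illustration, and /dev/sdc and /dev/sdd in the usage note are assumed to be the array members.

```sh
# Start an extended SMART self-test on each disk given as an argument.
# SMARTCTL can be overridden for a dry run, e.g. SMARTCTL=echo.
start_long_tests() {
    for disk in "$@"; do
        "${SMARTCTL:-smartctl}" --test=long "$disk" || return 1
    done
}

# Print only the self-test log of each disk once the tests have finished.
show_selftest_logs() {
    for disk in "$@"; do
        "${SMARTCTL:-smartctl}" -l selftest "$disk"
    done
}
```

In the evening: `start_long_tests /dev/sdc /dev/sdd`. The next morning: `show_selftest_logs /dev/sdc /dev/sdd`. The test runs inside the drive, so the command returns immediately.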

----------

## Scorpion_DE

Hi,

either I overlooked it or it has not been mentioned yet: which RAID level is this (0, 1)? How many disks belong to md0, only /dev/sde? I would also be interested in the output of "cat /proc/mdstat".

If it is RAID1, then even the failure of one disk would not endanger the integrity of the file system on top of it (ext3 in your case). With a stripe (RAID0) and the failure of a disk, or of parts of it, I would rather expect that you could no longer do anything with the file system at all.

Other possible causes would be:

- Temperature problems with the disks (Raptors or VelociRaptors in poorly ventilated cases, for example)

- Low-quality cables

- Aggressively overclocked system

- Aggressive or exotic CFLAGS

- Other problems with mainboard, memory, CPU

- Wrong driver for the disk controller selected in the kernel

Regards, Scorpion
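As a quick way to answer the /proc/mdstat question, the member map of each array can be checked for missing disks. A sketch only: the function name `degraded_arrays` is made up, and it assumes the usual mdstat layout in which the map (`[UU]`, `[U_]`) is the last field of the "blocks" line.

```sh
# Print the name of every md array whose member map shows a missing disk.
# Reads /proc/mdstat by default; another file can be passed as $1.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        # [UU] means all members are up, an underscore marks a missing one.
        $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

No output means that no array is reported as degraded. `mdadm --detail /dev/md0` shows the RAID level and the member devices in more detail.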

----------

## hitachi

As root you can also do the following:

```
echo check > /sys/block/md0/md/sync_action
```

You can watch the progress on a second console with watch cat /proc/mdstat. Afterwards comes the important part:

```
cat /sys/block/md0/md/mismatch_cnt
```

There should be a man page for this somewhere. On my RAID 5 this takes about 8 minutes for 50 GB. It should be done fairly regularly. Unfortunately it takes correspondingly longer on larger partitions, and longer still if the disks are being accessed heavily at the same time.
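The two steps can be wrapped so that the mismatch counter is only read once the check has finished. This is a sketch with made-up function names, and the `SYSFS_ROOT` override exists only so that it can be tried against a scratch directory.

```sh
# Directory holding the md control files for one array, e.g. md0.
md_dir() {
    echo "${SYSFS_ROOT:-/sys}/block/$1/md"
}

# Ask the md driver to start a consistency check of the array.
start_md_check() {
    echo check > "$(md_dir "$1")/sync_action"
}

# Report the mismatch counter, but only when the array is idle again.
md_check_result() {
    state=$(cat "$(md_dir "$1")/sync_action") || return 1
    if [ "$state" = "idle" ]; then
        echo "mismatch_cnt=$(cat "$(md_dir "$1")/mismatch_cnt")"
    else
        echo "still running: $state"
    fi
}
```

As root: `start_md_check md0`, then call `md_check_result md0` from time to time until it prints the counter.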

----------

## Tinitus

 *tamiko wrote:*   

> No, that is not enough.
> 
> Do me a favor and start, overnight, a
> 
> ```
> ...

 

So, here are the outputs:

```
# smartctl -a /dev/sdc

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     SAMSUNG HD103UJ

Serial Number:    S13PJDWQ306857

Firmware Version: 1AA01109

User Capacity:    1.000.204.886.016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   8

ATA Standard is:  ATA-8-ACS revision 3b

Local Time is:    Thu Mar 19 09:22:56 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00)   Offline data collection activity

               was never started.

               Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:        (11658) seconds.

Offline data collection

capabilities:           (0x7b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     ( 195) minutes.

Conveyance self-test routine

recommended polling time:     (  21) minutes.

SCT capabilities:           (0x003f)   SCT Status supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7710

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       158

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0

  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       13088

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1933

 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       157

 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0

183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   076   068   000    Old_age   Always       -       24 (Lifetime Min/Max 21/27)

194 Temperature_Celsius     0x0022   076   068   000    Old_age   Always       -       24 (Lifetime Min/Max 21/28)

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       147807954

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 0

Warning: ATA Specification requires self-test log structure revision number = 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%      1930         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1

SMART Selective self-test log data structure revision number 0

Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

and the second disk:

```

 smartctl -a /dev/sdd

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     SAMSUNG HD103UJ

Serial Number:    S13PJDWQ306856

Firmware Version: 1AA01109

User Capacity:    1.000.204.886.016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   8

ATA Standard is:  ATA-8-ACS revision 3b

Local Time is:    Thu Mar 19 09:24:23 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00)   Offline data collection activity

               was never started.

               Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:        (11429) seconds.

Offline data collection

capabilities:           (0x7b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     ( 191) minutes.

Conveyance self-test routine

recommended polling time:     (  20) minutes.

SCT capabilities:           (0x003f)   SCT Status supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7610

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       153

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2

  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0

  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10921

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1932

 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       152

 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0

183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   075   066   000    Old_age   Always       -       25 (Lifetime Min/Max 21/28)

194 Temperature_Celsius     0x0022   075   067   000    Old_age   Always       -       25 (Lifetime Min/Max 21/29)

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       61469876

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 0

Warning: ATA Specification requires self-test log structure revision number = 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%      1926         -

# 2  Extended offline    Aborted by host               90%      1923         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1

SMART Selective self-test log data structure revision number 0

Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
```

But here is what I noticed after the problem occurred again:

```

 fsck.ext3 -p -v -c /dev/md0

/dev/md0: stelle das Journal wieder her

/dev/md0: Der Zeitpunkt des letzten Einhängens von SuperBlock liegt in der Zukunft REPARIERT.

/dev/md0: Updating bad block inode.

/dev/md0: Inode 8, i_Blocks ist 262416, sollte sein 256272.  REPARIERT.

  419656 inodes used (0.69%)

   30239 non-contiguous inodes (7.2%)

         # von Inodes mit ind/dind/tind Blöcken: 82340/18001/21

181120987 blocks used (74.17%)

       0 bad blocks

      56 large files

  382057 regular files

   37536 directories

       0 character device files

       0 block device files

       2 fifos

       0 links

      51 symbolic links (51 fast symbolic links)

       1 socket

--------

  419647 files
```

However, my system runs ntpd, and the hardware clock is almost identical to internet time:

```
 hwclock --show

Do 19 Mär 2009 09:26:50 CET  -0.537553 Sekunden
```

That can't be right, can it?

G. R.
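
One way to cross-check the fsck complaint is to read the stored mount time straight from the superblock and compare it with the current time. The following is only a rough sketch, assuming GNU date (for `-d`) and the English field label that `dumpe2fs -h` prints under `LC_ALL=C`; the helper names `last_mount_epoch` and `mount_time_in_future` are made up for illustration.

```shell
# Sketch: report whether the superblock's last mount time lies in the future.
# last_mount_epoch and mount_time_in_future are hypothetical helper names.

# Read `dumpe2fs -h` output on stdin, extract the "Last mount time" field
# and print it as seconds since the epoch. Assumes GNU date for `-d`.
last_mount_epoch() {
    stamp=$(sed -n 's/^Last mount time:[[:space:]]*//p' | head -n 1)
    [ -n "$stamp" ] || return 1
    date -d "$stamp" +%s
}

# Succeed (exit status 0) if the mount time is later than "now".
# $1 = mount time in epoch seconds
# $2 = current time in epoch seconds (defaults to `date +%s`)
mount_time_in_future() {
    now=${2:-$(date +%s)}
    [ "$1" -gt "$now" ]
}

# Usage (as root; LC_ALL=C keeps the field labels in English):
#   t=$(LC_ALL=C dumpe2fs -h /dev/md0 2>/dev/null | last_mount_epoch) &&
#       mount_time_in_future "$t" &&
#       echo "last mount time is in the future"
```

If this reports a future mount time while `date` and `hwclock --show` agree, the wrong timestamp was presumably written at a moment when the system clock was off, for example early during boot.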

----------

## mv

That sounds as if, for some reason, clock (or hwclock, if you are using baselayout2) is not being run before fsck on your system.

----------

## Tinitus

 *mv wrote:*   

> That sounds as if, for some reason, clock (or hwclock, if you are using baselayout2) is not being run before fsck on your system.

 

I'm not using a 2.x version... at least not knowingly!?

What else can I do?

Or rather, how can I verify that?

G. R.

----------

## mv

Quite simple: does the message "Setting System Clock using Hardware Clock" (or whatever it was called in baselayout-1; it has been a long time) appear before it reports that the partition is defective?

----------

## Tinitus

 *mv wrote:*   

> Quite simple: does the message "Setting System Clock using Hardware Clock" (or whatever it was called in baselayout-1; it has been a long time) appear before it reports that the partition is defective?

 

No, everything is still OK at boot; then suddenly, after a few minutes, the message appears that the filesystem is being remounted read-only.

But there must be a way to check this somewhere in the log files, right?

Regards,

R. May
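
A minimal sketch of such a log check: it searches a log file for kernel messages that typically accompany an ext3 read-only remount. The helper name `fs_error_lines` is hypothetical, the patterns are an assumed, non-exhaustive selection, and `/var/log/messages` is only a guess at where the local syslog configuration puts kernel messages.

```shell
# Sketch: list log lines that usually accompany a read-only remount.
# fs_error_lines is a hypothetical helper; the patterns are assumptions
# and certainly not exhaustive.
fs_error_lines() {
    # $1 = log file, e.g. /var/log/messages or a saved `dmesg` dump
    grep -n -i -E \
        'EXT3-fs error|Aborting journal|Remounting filesystem read-only|I/O error' \
        "$1"
}

# Usage:
#   dmesg > /tmp/dmesg.txt && fs_error_lines /tmp/dmesg.txt
#   fs_error_lines /var/log/messages
```

The lines printed just before the first "EXT3-fs error" entry are usually the interesting ones, because they may show whether a disk or the md layer reported a problem first.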

----------

## Tinitus

Maybe ext3 in the kernel is buggy? Is that possible? Maybe it can't cope with 1 TB partitions? Although it is specified for considerably more...

Maybe I should try a different filesystem? I have used ReiserFS so far... but that is apparently no longer being developed, is it?

G. R.

----------

## mv

 *Tinitus wrote:*   

> Maybe ext3 in the kernel is buggy? Is that possible?

 

I consider that unlikely.

 *Quote:*   

> Maybe it can't cope with 1 TB partitions? Although it is specified for considerably more...

 

With a 32-bit kernel, you should perhaps enable the configuration options LBD (large block device) and LSF (large single file); you will find them under "Enable the block layer". Supposedly the critical limit for these is 2 terabytes rather than one, but I could imagine that somewhere a calculation accidentally uses "signed" instead of "unsigned".
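
Whether these options are set can be read from the kernel configuration. A small sketch, assuming the config of the running kernel is at `/usr/src/linux/.config` (adjust the path if not); `kernel_opt_enabled` is a hypothetical helper name.

```shell
# Sketch: check whether a kernel option is built in (=y).
# kernel_opt_enabled is a hypothetical helper.
kernel_opt_enabled() {
    # $1 = kernel config file
    # $2 = option name without the CONFIG_ prefix, e.g. LBD
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Usage:
#   uname -m    # shows whether the running kernel is 32-bit or 64-bit
#   for opt in LBD LSF; do
#       if kernel_opt_enabled /usr/src/linux/.config "$opt"; then
#           echo "CONFIG_$opt is enabled"
#       else
#           echo "CONFIG_$opt is not enabled"
#       fi
#   done
```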

----------

## hitachi

Have you looked in dead.letter, or at cat /proc/mdstat, to see what it says about the RAID once the problems have appeared? What about echo check >> /sys/block/md0/md/sync_action && cat /sys/block/md0/md/mismatch_cnt (of course you have to wait with the second command until the check started by the first one has finished).
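
The waiting can be scripted. A rough sketch, assuming that sync_action reads back as "idle" once the check is done; `md_wait_idle` is a hypothetical helper name.

```shell
# Sketch: block until an md array has finished its check.
# md_wait_idle is a hypothetical helper.
md_wait_idle() {
    # $1 = path to the array's sync_action file
    # $2 = poll interval in seconds (default 60)
    [ -r "$1" ] || return 1
    while [ "$(cat "$1")" != "idle" ]; do
        sleep "${2:-60}"
    done
}

# Usage (as root):
#   echo check > /sys/block/md0/md/sync_action
#   md_wait_idle /sys/block/md0/md/sync_action
#   cat /sys/block/md0/md/mismatch_cnt
```

Progress of the running check can be followed in /proc/mdstat in the meantime.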

----------

## Tinitus

 *hitachi wrote:*   

> Have you looked in dead.letter, or at cat /proc/mdstat, to see what it says about the RAID once the problems have appeared? What about echo check >> /sys/block/md0/md/sync_action && cat /sys/block/md0/md/mismatch_cnt (of course you have to wait with the second command until the check started by the first one has finished).

 

Hello,

no problems there. Somehow it keeps having trouble with the journal. I'm now copying everything off the RAID disks... then I'll create a new filesystem on them.

G. R.

----------

