# harddrive dma reset issues

## GenKreton

```
hda: dma_timer_expiry: dma status == 0x21

hda: DMA timeout error

hda: dma timeout error: status=0xd0 { Busy }

ide: failed opcode was: unknown

hda: DMA disabled

ide0: reset: success
```

I get this very often on my laptop. Sometimes it happens quickly; sometimes it takes hours (maybe even days). Some additional information:

```

aesir ~ # hdparm -I /dev/hda

/dev/hda:

ATA device, with non-removable media

        Model Number:       IC25N030ATMR04-0

        Serial Number:      MRG2E0KBFAMKRJ

        Firmware Revision:  MOAOAD0A

Standards:

        Used: ATA/ATAPI-6 T13 1410D revision 3a

        Supported: 6 5 4 3

Configuration:

        Logical         max     current

        cylinders       16383   65535

        heads           16      1

        sectors/track   63      63

        --

        CHS current addressable sectors:    4128705

        LBA    user addressable sectors:   58605120

        LBA48  user addressable sectors:   58605120

        device size with M = 1024*1024:       28615 MBytes

        device size with M = 1000*1000:       30005 MBytes (30 GB)

Capabilities:

        LBA, IORDY(can be disabled)

        bytes avail on r/w long: 4      Queue depth: 1

        Standby timer values: spec'd by Vendor, no device specific minimum

        R/W multiple sector transfer: Max = 16  Current = 16

        Advanced power management level: 128 (0x80)

        Recommended acoustic management value: 128, current value: 254

        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5

             Cycle time: min=120ns recommended=120ns

        PIO: pio0 pio1 pio2 pio3 pio4

             Cycle time: no flow control=240ns  IORDY flow control=120ns

Commands/features:

        Enabled Supported:

           *    NOP cmd

           *    READ BUFFER cmd

           *    WRITE BUFFER cmd

           *    Host Protected Area feature set

           *    Look-ahead

           *    Write cache

           *    Power Management feature set

                Security Mode feature set

           *    SMART feature set

           *    FLUSH CACHE EXT command

           *    Mandatory FLUSH CACHE command

           *    Device Configuration Overlay feature set

           *    48-bit Address feature set

                Automatic Acoustic Management feature set

                SET MAX security extension

                Address Offset Reserved Area Boot

           *    SET FEATURES subcommand required to spinup after power up

                Power-Up In Standby feature set

           *    Advanced Power Management feature set

           *    General Purpose Logging feature set

           *    SMART self-test

           *    SMART error logging

Security:

        Master password revision code = 65534

                supported

        not     enabled

        not     locked

                frozen

        not     expired: security count

        not     supported: enhanced erase

        26min for SECURITY ERASE UNIT.

HW reset results:

        CBLID- above Vih

        Device num = 0 determined by the jumper

Checksum: correct

```

Also my default hdparm settings are as follows (feel free to correct them even if they are not the cause of the problem):

```
/dev/hda:

 multcount    = 16 (on)

 IO_support   =  0 (default 16-bit)

 unmaskirq    =  1 (on)

 using_dma    =  1 (on)

 keepsettings =  0 (off)

 readonly     =  0 (off)

 readahead    = 64 (on)

 geometry     = 16383/255/63, sectors = 30005821440, start = 0

```

----------

## GenKreton

If nobody can help, does anyone know where it would be appropriate to ask about this? I was leaning towards the kernel guys, but I'm not sure.

----------

## widan

 *GenKreton wrote:*   

> ```
> hda: dma_timer_expiry: dma status == 0x21
> ...
> ```

 

What happens is that your disk fails to respond in time to a command sent to it. After a timeout, Linux decides the drive is probably confused and resets the IDE bus. It also disables DMA because drives can get confused by a UDMA mode that is set too high, and disabling DMA will make them work again (even though that is unlikely to be the cause in your case).
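To see how often this timeout, reset, and DMA-disable sequence actually occurs, the kernel messages can be tallied. The following is only a minimal sketch: the function name `summarize_ide_events` and the event labels are made up for illustration, and the patterns are taken from the messages quoted at the top of this thread, so other kernel versions may word them differently.

```python
import re
from collections import Counter

# Patterns based on the kernel messages quoted in the first post.
# search() is used so that a leading timestamp or syslog prefix does not matter.
EVENT_PATTERNS = {
    "timer_expiry": re.compile(r"\b(hd[a-z]): dma_timer_expiry"),
    "dma_timeout": re.compile(r"\b(hd[a-z]): DMA timeout error"),
    "dma_disabled": re.compile(r"\b(hd[a-z]): DMA disabled"),
    "bus_reset": re.compile(r"\b(ide\d+): reset: success"),
}


def summarize_ide_events(log_text):
    """Count (device, event) pairs found in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), name)] += 1
                break  # one event per log line is enough
    return counts
```

Feeding it the saved output of `dmesg` (or the relevant part of the system log) gives one count per device and event, which makes it easier to tell whether the resets cluster around heavy disk load.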

Are you trying to put the drive in standby mode to save power? A drive in standby mode won't respond to ATA commands, and will need an IDE bus reset before it comes back to life. You can try to see what causes the errors with smartctl (emerge smartmontools if you don't have it):

```
smartctl -l error /dev/hda
```

If the list contains "STANDBY" or "STANDBY IMMEDIATE", then something asked the drive to go to standby mode.
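The check described above can be scripted against saved `smartctl -l error` output. This is a rough sketch that only searches the text for the command names; the helper names `find_standby_commands` and `no_errors_logged` are hypothetical, and it assumes the log prints the command names literally as "STANDBY" or "STANDBY IMMEDIATE".

```python
import re

# Matches both "STANDBY" and "STANDBY IMMEDIATE" as whole words.
STANDBY_RE = re.compile(r"\bSTANDBY( IMMEDIATE)?\b")


def find_standby_commands(error_log_text):
    """Return the lines of a smartctl error log dump that mention STANDBY."""
    return [
        line.strip()
        for line in error_log_text.splitlines()
        if STANDBY_RE.search(line)
    ]


def no_errors_logged(error_log_text):
    """True when smartctl reports an empty error log."""
    return "No Errors Logged" in error_log_text
```

An empty list together with `no_errors_logged(...)` returning `True` means the drive recorded nothing at all, so the log neither confirms nor rules out standby.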

----------

## GenKreton

I emerged it now, I guess I'll wait and check till next time I see it off.

Also, I saw this in my hdparm output.

Configuration:

        Logical         max     current

        cylinders       16383   65535 

Is it bad that current is so much bigger than max?

----------

## widan

 *GenKreton wrote:*   

> I emerged it now, I guess I'll wait and check till next time I see it off.

 

You can run it now. The drive stores the last 5 or so errors in its internal memory.

 *GenKreton wrote:*   

> Is it bad that current is so much bigger than max?

 

No. Linux only uses LBA addressing. CHS drive geometry means nothing for modern drives, only the LBA values matter. The reason the values are set like they are is compatibility with some BIOSes.
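The numbers in the `hdparm -I` output from the first post bear this out. The short calculation below is just a sketch; the function names are made up, and the 512-byte sector size is an assumption (it does reproduce the sizes hdparm printed).

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors, which reproduces hdparm's figures


def chs_sectors(cylinders, heads, sectors_per_track):
    """Number of sectors addressable through a CHS geometry."""
    return cylinders * heads * sectors_per_track


def lba_capacity(lba_sectors):
    """Return (bytes, MiB, MB) for an LBA sector count, rounding down."""
    total_bytes = lba_sectors * SECTOR_BYTES
    return total_bytes, total_bytes // (1024 * 1024), total_bytes // (1000 * 1000)


# Values copied from the hdparm -I output for /dev/hda earlier in the thread.
print(chs_sectors(65535, 1, 63))   # 4128705, the "CHS current addressable sectors" line
print(chs_sectors(16383, 16, 63))  # 16514064, the "max" geometry
print(lba_capacity(58605120))      # (30005821440, 28615, 30005)
```

Both CHS figures fall far short of the 58605120 LBA sectors, so neither geometry can describe the whole drive; only the LBA count matches the 30 GB capacity.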

----------

## GenKreton

```
aesir ~ # smartctl -l error /dev/hda

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===

SMART Error Log Version: 1

No Errors Logged

```

And I ran

```
smartctl -a /dev/hda

 [snippets removed]

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

```

That's the only thing I could find wrong. I ran the long test as well and that reported no errors.

----------

## widan

Thinking about it again, it's "normal" that no error was logged. The drive (obviously) won't store an error record while it is in standby (assuming that's the cause)... In any case, the drive itself seems fine. You can try disabling any power management tools you might have loaded and see if the "problem" disappears.
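One power management setting can be read straight from the `hdparm -I` output in the first post: the "Advanced power management level" line. The sketch below pulls that number out and applies the rule from the hdparm man page for `-B` (levels 1 to 127 permit spin-down, 128 to 254 do not). The function names are hypothetical, and this covers only the drive's own APM setting, not any userspace tool that might send standby commands.

```python
import re

# Matches the line as it appears in the hdparm -I output quoted in this thread.
APM_RE = re.compile(r"Advanced power management level:\s*(\d+)")


def apm_level(hdparm_identify_text):
    """Return the numeric APM level from hdparm -I output, or None if absent."""
    match = APM_RE.search(hdparm_identify_text)
    return int(match.group(1)) if match else None


def apm_permits_spindown(level):
    """Per the hdparm man page for -B: 1-127 permit spin-down, 128-254 do not."""
    return 1 <= level <= 127
```

For the drive in the first post the level is 128, so `apm_permits_spindown(128)` is `False`. If the drive really is going to standby, the request would have to come from something other than its APM setting.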

----------

## dashnu

I have the same issue, on kernel 2.6.13 (gentoo-sources).

Similar Drive:

```

ATA device, with non-removable media

powers-up in standby; SET FEATURES subcmd spins-up.

        Model Number:       IC35L060AVV207-0                        

        Serial Number:      VNVB30G8UL5JXH

        Firmware Revision:  V22OA66A

Standards:

        Used: ATA/ATAPI-6 T13 1410D revision 3a 

        Supported: 6 5 4 3 

Configuration:

        Logical         max     current

        cylinders       16383   65535

        heads           16      1

        sectors/track   63      63

        --

        CHS current addressable sectors:    4128705

        LBA    user addressable sectors:   78156288

        LBA48  user addressable sectors:   78156288

        device size with M = 1024*1024:       38162 MBytes

        device size with M = 1000*1000:       40016 MBytes (40 GB)

```

SMART reports are OK.

This happens when my server is under a lot of load.

I do not have any power management set up.

I have tried a few kernels also with no luck.

----------

## SweD

I had this exact same problem. What led me to the fix was noticing that my hard drives were running at approx. 40-45 Celsius. Subjecting them to something intensive brought them up to the 50-55 mark. I kept having these DMA resets all the time, especially when I put more than one disk on the same channel of my onboard Promise PDC20265 controller. I don't mean literally all the time, but if the machine had been on for some time, these errors kept popping up more and more often.

I've since bought a new, substantially bigger chassis and put in extra fans to make sure thermal issues wouldn't be part of the problem, and presto.

My hard drives now run at 25-30 Celsius, and I have not had a single DMA reset since. I mean that literally, not a single one, and I've been using the new setup daily for approx. 2 months.

The conclusion can't be a general one, I'm sure, but for me it was a thermal issue through and through.

I had this issue ever since I put a second drive in, using something like gentoo-sources-2.4.20. I kept reading the OSDL bug reports, specifically bug #2494, where many people had the same issues, thinking it would be flaky drivers for the Promise controller. It was, to some extent, but after they were "fixed" in some kernel version or another, the problems remained until I fixed the temperature.

Just my story, mileage may well differ, of course.

Regards,

/Dennis

----------

## GenKreton

Thank you, SweD. Though I am using the 2.6 series, it seems very possible that this is the same problem. It only occurred under heavy hard drive load, and it is a laptop, so I can't expand my chassis the way you did, but I will make more of an effort to keep it cool and see how it turns out.
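For keeping an eye on the temperature, many drives expose it as SMART attribute 194 in the attribute table printed by `smartctl -A`. The sketch below reads that value from saved output. It assumes the usual ten-column attribute row with the raw value in the last column; not every drive reports attribute 194, and the 50 degree limit is only borrowed from SweD's report above, not from any drive specification. Both function names are made up.

```python
def drive_temperature(smart_attributes_text):
    """Return the raw value of SMART attribute 194 in Celsius, or None."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the tenth
        # and may be followed by extra text such as a min/max note.
        if len(fields) >= 10 and fields[0] == "194" and fields[9].isdigit():
            return int(fields[9])
    return None


def too_hot(temperature_c, limit_c=50):
    """limit_c=50 is taken from SweD's anecdote, not from a drive spec."""
    return temperature_c is not None and temperature_c >= limit_c
```

Logging the value before and after a heavy disk job would show whether the DMA resets line up with temperature spikes on this laptop.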

----------

## SweD

Just for completeness, and to be clear, I'm also using 2.6, and have had this problem with 2.6 for a long time. The 2.4 reference was to point out that my problems started a looong time ago :D

----------

