# [Solved] Problems with Seagate D4 / ISCSI

## therealjrd

Hi all.

I acquired a Seagate D4 NAS box and have configured it as an iSCSI target.

It connects fine; I created an ext4 FS on it, and superficial tests work great.  iozone works great.

But when I try to rsync my main server FS to it, it runs for a while, then complains of being unable to write to a read-only file system.  fsck tells me about all sorts of carnage.

Looking for suggestions on how to debug.  Thanks in advance . . .

Last edited by therealjrd on Wed Dec 28, 2016 6:00 pm; edited 1 time in total

----------

## therealjrd

Bump

I've rebuilt open-iscsi with USE=debug.  I've turned on CRC32 checking for both header and data digests.  Those measures seem to help, in that the time-to-failure is longer, but it still fails after a while.
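For reference, the digest settings can be flipped per node record with iscsiadm (the target IQN below is a placeholder for whatever your device reports):

```shell
# Enable CRC32C header and data digests on an existing node record.
# The IQN is illustrative -- substitute the one your target advertises.
TARGET="iqn.2016-12.com.example:d4"

iscsiadm -m node -T "$TARGET" -o update \
    -n node.conn[0].iscsi.HeaderDigest -v CRC32C
iscsiadm -m node -T "$TARGET" -o update \
    -n node.conn[0].iscsi.DataDigest -v CRC32C

# Log out and back in so the new settings take effect.
iscsiadm -m node -T "$TARGET" -u
iscsiadm -m node -T "$TARGET" -l
```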

I *think* this is telling me that one or the other NIC is flaky.  Anybody out there have similar experience?  Ideas on how to debug further before I start throwing hardware at it?

----------

## therealjrd

I've done a few more experiments on this.

When I mount -o sync, everything works perfectly:  I can run rsync for days with zero errors.  That's the good news; the bad news is that the performance is terrible.  It takes days to sync a TB.
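In case it helps anyone reproduce this, the sync mount is just the following (device name and mount point here are placeholders for whatever your setup uses):

```shell
# Mount the iSCSI-backed ext4 FS with fully synchronous writes.
# /dev/sdd1 and /mnt/d4 are illustrative names.
mount -t ext4 -o sync /dev/sdd1 /mnt/d4
```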

I've also turned on CRC32C checking for data and header digests, and told the device to do the same.  No discernible difference.  So I no longer think I'm looking at NIC problems.

Mounting with different combinations of options seems to make some difference, but nothing definitive.  data=ordered,commit=1,debug,barrier=1 seems to work best, in that it survives longest before starting to detect errors, but I've found no combination that makes it reliable.

Googling a bit for similar setups doesn't turn up much.  Does anyone have pointers to other deployments using iscsi to talk to one of these Seagate devices?

Another option, of course, is to stop trying to use the Seagate device as a block device, and turn on its internal NFS server.

Any hints appreciated.

----------

## therealjrd

Well, ok, FTR, I've sort of figured this out.

It seems that some devices, including the Seagate unit I have, do not do a good job of flow control.  It's possible for the initiator to overload them, after which they seem to garble or drop requests.  With a file system, that manifests as FS corruption.
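(Tangentially: open-iscsi also has its own per-session queue limits in /etc/iscsi/iscsid.conf, which are another place the initiator's pressure on the target can be capped.  I haven't tested whether lowering these alone helps; the values below are illustrative, not recommendations:)

```
# /etc/iscsi/iscsid.conf -- per-session queue limits (illustrative values)
node.session.cmds_max = 64
node.session.queue_depth = 16
```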

Modern kernels have a workaround, but you have to know where to look for it.  There's a long thread about this topic here:  https://bugzilla.kernel.org/show_bug.cgi?id=93581

What I did to "fix" this:

1.  Use parted to partition the device.  This allows for optimal sector alignment, and more efficient IO.  I followed the instructions here:  https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Disks#Default:_Using_parted_to_partition_the_disk

2.  Per the above bug report, set the value of /sys/block/sdd/queue/max_sectors_kb much lower.  On my 4.4.26 kernel the default is 32K.  I'm still trying different values, but it looks like 256 is enough to keep the device maxed out.

3.  Use mount options  -o commit=1,barrier=1,block_validity   It's not clear that these make much difference; I started using combinations of them before I discovered max_sectors_kb.  Leaving them on does seem to smooth out the IO performance, as measured by the self-monitoring software in the Seagate.
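The steps above, in shell form (the device name /dev/sdd and the mount point are whatever applies on your system; 256 is the value I've settled on so far):

```shell
# 1. Partition with optimal alignment (this wipes the device!).
parted -a optimal /dev/sdd -- mklabel gpt mkpart primary ext4 0% 100%
mkfs.ext4 /dev/sdd1

# 2. Cap the request size so the initiator can't overrun the target.
#    (Not persistent -- redo after reboot, e.g. from an /etc/local.d script.)
echo 256 > /sys/block/sdd/queue/max_sectors_kb

# 3. Mount with frequent commits, barriers, and block validity checking.
mount -t ext4 -o commit=1,barrier=1,block_validity /dev/sdd1 /mnt/d4
```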

By taking these measures, I've been able to write many TB onto the device with no corruption and no errors in the syslog other than an occasional disconnect/reconnect.  I plan to do some more experiments, run some iozone tests etc, before I start trusting this device with real data.

In case anybody's trying to interface to an iSCSI device (or other block device) and seeing weird problems, I recommend reading over that bug report and reducing the value of max_sectors_kb.

----------

## gwong

Had a very similar issue.  After reading this, it's now solved.  Thanks.

----------

