# 40TB Array parted into 1 partition - after format only 2TB

## ramsesii

Hello!

I've got a 6x8TB array on an HP P420 controller. The array is recognized and online as /dev/sda.

I created a GPT partition table and one large partition.

# (parted) p

Model: HP LOGICAL VOLUME (scsi)

Disk /dev/sda: 40.0TB

Sector size (logical/physical): 512B/4096B

Partition Table: gpt

# parted -a optimal /dev/sda mkpart RAID 0% 100%

Number  Start   End     Size    File system  Name  Flags

 1      1311kB  40.0TB  40.0TB               RAID

The alignment check is perfect:

# (parted) align-check optimal 1

1 aligned

If I now format the partition with ext4 or jfs (or whatever), I get a 2TB /dev/sda1 (test-mounted to /mnt):

# mkfs.ext4 /dev/sda1

/dev/sda1       2,0T     81M  1,9T    1% /mnt

AND after unmounting and re-entering parted, the partition table is UNKNOWN:

# parted /dev/sda

(parted) p

Error: /dev/sda: unrecognised disk label

Model: HP LOGICAL VOLUME (scsi)

Disk /dev/sda: 40.0TB

Sector size (logical/physical): 512B/4096B

Partition Table: unknown

Trying to make a "new" one with

# parted -a optimal /dev/sda mkpart RAID 0% 100% results in

Error: /dev/sda: unknown partition table

(parted) rm 1

doesn't work either!

In fdisk I can see the partition and can delete it:

Disk /dev/sda: 36.4 TiB, 40007647977472 bytes, 78139937456 sectors

Units: sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 4096 bytes

I/O size (minimum/optimal): 262144 bytes / 1310720 bytes

Disklabel type: gpt

Disk identifier: 34B9D75E-CB37-494E-826C-C251D22D22F5

Device     Start         End     Sectors  Size Type

/dev/sda1   2560 78139937279 78139934720 36.4T Linux filesystem

-----------

If I DO NOT partition the volume and format /dev/sda directly with ext4 or jfs, I get a mountable 37TB array!

But I also get a MESSAGE I cannot understand, given that IT IS GPT - yet the error shows up anyway...

# mkfs.ext4 /dev/sda

/dev/sda contains `DOS/MBR boot sector; partition 1

After the format has finished - it takes a lot more time to format - I mount it to /mnt and get:

# /dev/sda         37T     24K   35T    1% /mnt

After unmounting I run # parted /dev/sda, look at the partition table, and see this:

(parted) p

Model: HP LOGICAL VOLUME (scsi)

Disk /dev/sda: 40.0TB

Sector size (logical/physical): 512B/4096B

Partition Table: loop <---

Disk Flags:

Number  Start  End     Size    File system  Flags

 1      0.00B  40.0TB  40.0TB  ext4

I think I read that it is OK to format the whole disk rather than a partition, but it suffers a little in performance.

But I want to have it clean, as it used to be!

Thank you very much for your help!

RAM

Gentoo fresh install

4.4.39-gentoo #4 SMP x86_64 Intel(R)

----------

## Roman_Gruber

How is this device presented to the OS?

 *Quote:*   

> I've got an 6x8TB Array from an P420 Controller (HP). 

 

Is it presented as 6 discs?

Or is there some firmware to present it as only one device?

I think GPT is quite new, to enable very, very big partition tables.

Is this a plug-in card? Or is it some external device over some port?

How does the BIOS see this device? 

--

Assume Bios / kernel / userspace is fine. =>

You may fire up a gparted live-CD / live-USB, for example, and create that GPT partition table + file system. (The easiest way of doing it!)

----------

## ramsesii

It's coming up after boot as block device /dev/sda with 40TB:

# fdisk -l /dev/sda

Disk /dev/sda: 36.4 TiB, 40007647977472 bytes, 78139937456 sectors

# cat /sys/block/sda/device/model

LOGICAL VOLUME

# cat /sys/block/sda/device/path_info

[0:1:0:0]    Direct-Access     Active

it's an HP DL380 Server with a P420 12disk Controller.

----------

## Roman_Gruber

Only thing which comes to my mind

Did you save the partition table?

https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Disks#GPT

 *Quote:*   

> Saving the partition layout
> 
> To save the partition layout and exit fdisk, type w.
> 
> Command (m for help):w
> ...

 

parted needs you to save the partition table. 

--

You may try gparted. That's very easy and user-friendly. Or

use the handbook https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Disks#GPT

Which should also be fine, when it is, as you said, presented as a single disc, and the hardware is working fine according to the specs: no hardware defect, no loose cables, no lack of power from the power supply, or other effects which may affect operations.

--

It is good practice to create a partition table, so it's obvious there is data on it.

When everyone knows there is data on it, you do not really need a partition table and can use the device directly => e.g. /dev/sda

--

 *Quote:*   

>  Error: /dev/sda: unknown partitiontable 

 

Have you checked the kernel for those special sections? File system support / partition table support / anything else which may be related.

--

https://wiki.archlinux.org/index.php/partitioning#GPT_Kernel_Support

 *Quote:*   

> GPT Kernel Support
> 
> The CONFIG_EFI_PARTITION option in the kernel config enables GPT support in the kernel (despite the name, EFI PARTITION). This option must be built in the kernel and not compiled as a loadable module. This option is required even if GPT disks are used only for data storage and not for booting. This option is enabled by default in Arch's linux and linux-lts kernels in the [core] repo. In case of a custom kernel, enable this option by doing CONFIG_EFI_PARTITION=y. 

 

--

https://www.cyberciti.biz/tips/fdisk-unable-to-create-partition-greater-2tb.html

 *Quote:*   

> Frankly speaking, you cannot create a Linux partition larger than 2 TB using the fdisk command. The fdisk won't create partitions larger than 2 TB

 

Which explains your 2TB partition.

*Last edited by Roman_Gruber on Thu Feb 02, 2017 9:20 pm; edited 1 time in total*

----------

## ramsesii

Hello Roman!  :Smile: 

gparted doesn't provide a way to save the GPT table.

The "w" comes from fdisk.

I tried other tools too - always the same... :/ (cfdisk, for example)

gparted will not work - I have no GUI!

----------

## Roman_Gruber

 *Quote:*   

> 
> 
> https://www.cyberciti.biz/tips/fdisk-unable-to-create-partition-greater-2tb.html
> 
> Quote:	
> ...

 

Do not use fdisk. Use parted for example!

Sorry took me a while to recognize where the issue was.

----------

## ramsesii

I did NOT use fdisk - I used parted!

----------

## John R. Graham

The issue is the MSDOS partition table. It has a fundamental limit of 2^32 sectors. Sectors (in this context) are 512 bytes and 512 * 2^32 equals, coincidentally enough, 2TiB. Ask your partitioning tool to create a GPT (GUID Partition Table).
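For reference, the arithmetic behind that limit (a quick sketch, nothing device-specific):

```shell
# An MBR partition entry stores its start and length as 32-bit sector
# counts, so with 512-byte logical sectors the ceiling is:
max_bytes=$((512 * 2**32))
echo "$max_bytes bytes"              # 2199023255552
echo "$((max_bytes / 1024**4)) TiB"  # 2
```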

- John

----------

## szatox

Hint on creating GPT partitions: gdisk

Yes, it is a command-line tool.
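For a non-interactive sketch you can also use sgdisk, gdisk's scriptable sibling (the device name is assumed from this thread; this rewrites the label, so double-check the target first):

```shell
# Assumed target /dev/sda - destroys any existing label on it!
sgdisk --zap-all /dev/sda                       # wipe old MBR/GPT structures
sgdisk -n 1:0:0 -t 1:8300 -c 1:RAID /dev/sda    # one partition spanning the disk
sgdisk -p /dev/sda                              # print the new table to verify
```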

----------

## Fitzcarraldo

Or use the GParted GUI on SystemRescueCd. That's how I partition GPT HDDs (and MBR HDDs, come to that).

----------

## frostschutz

Check if you have support for GPT partition table in your kernel.

```

$ zgrep EFI /proc/config.gz 

CONFIG_EFI_PARTITION=y

```

Without it you'd see exactly that effect: after creating a GPT partition, the OS would only see the fake msdos (protective MBR) partition, which is 2T.

If that's not it then you took a wrong step somewhere...

You should run some tests... with filesystems this size, fsck can be a memory hog.

----------

## knob-creek

If you want the file system to use the whole of the array, why do you install a partitioning scheme at all?  Simply write the FS to the raw device:

```

# dd if=/dev/zero of=/dev/sda bs=1024k count=100 # erases the first 100 MB, esp all of the partitioning artifacts

# mkfs.ext4 /dev/sda

```

Of course, you don't have any partition table afterwards, so parted and friends won't read out anything useful.

And something completely different: it is well possible that a RAID 5 of that size does not work as expected. To rebuild a failed disk (that's what RAID 5 is for), all 5 other disks have to be read entirely.  That's 40 TB * 8 > 3 * 10¹⁴ bits, which is a non-negligible part of the 10¹⁵ bits modern disks promise to read without an uncorrectable error.  Meaning, there is a risk of being unable to recover a failed array.  Maybe you would be better off using the disks unraided, with a modern redundant, multi-disk-aware file system like zfs on them.
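The back-of-the-envelope numbers behind that claim (taking the manufacturer's "1 unrecoverable read error per 10^15 bits" figure at face value):

```shell
# Bits read during a full rebuild vs. the quoted URE interval.
rebuild_bits=$((40 * 10**12 * 8))   # ~40 TB read in full, expressed in bits
ure_interval=$((10**15))            # quoted "1 URE per 10^15 bits" class drive
echo "rebuild reads $rebuild_bits bits"                              # 320000000000000
echo "that is $((rebuild_bits * 100 / ure_interval))% of the interval"  # 32%
```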

----------

## frostschutz

 *knob-creek wrote:*   

> If you want the file system to use the whole of the array, why do you install a partitioning scheme at all?

 

You can do without a partition table, but you should know what you're doing (and not muck about with parted, testdisk, and various OS installers later). It's easier to accidentally damage a filesystem on a raw device than on a partition. My recommendation is to always use a partition table. Of course this is a matter of taste as well...

 *knob-creek wrote:*   

> 
> 
> ```
> 
> # dd if=/dev/zero of=/dev/sda bs=1024k count=100 # erases the first 100 MB, esp all of the partitioning artifacts
> ...

 

GPT lives at both the start and the end of the disk, so it's kind of still there even after your dd...

So you could also dd the last 100MB, or if you want to be more surgically accurate, use wipefs -a. It kills only a few bytes (the magic signatures needed to recognize the filesystem/partition scheme).

Run it on the partitions first to wipe the partition-content signatures, and then on the device itself to get rid of the partition-table schemes.
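A sketch of that order of operations, assuming the single-partition layout from earlier in the thread (destructive, so verify the device names first):

```shell
# Wipe filesystem signatures inside the partition first...
wipefs -a /dev/sda1
# ...then the partition-table signatures on the device itself
# (this catches the protective MBR and both GPT copies).
wipefs -a /dev/sda
```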

 *knob-creek wrote:*   

> That's 40 TB * 8 > 3 * 10¹⁴ bits, which is a non negligible part of the 10¹⁵ bits modern disks promise to read without uncorrectable errors.

 

This bit failure rate has zero relevance in practice. That's just not how it works.

If you get errors during rebuilds, it's more likely because you never ran SMART selftests / RAID checks, and rebuild just happens to be a full read test, so you stumble over old undetected errors. That or you deliberately ignore HDD problems (keep disks in the array despite reallocated/pending sectors). Whatever promises RAID makes regarding redundancy, is under the assumption that disks are fully operational, if you don't replace them with the first bad sector all promises are off...

RAID needs disks, controllers, cables, (... all hardware involved), that works 100%. Something goes bad in any way, you replace it.

RAID needs monitoring, instant mail notification, regular tests to verify things are still working, otherwise disk errors can go undetected for years (until you rebuild, too late).

RAID needs a budget that allows you to replace things. If you've overdrawn your budget to buy those super expensive enterprise RAID disks/controllers and can't afford replacements, your RAID is dead. Expensive disks go bad all the same.

RAID does not replace backups, no matter how much effort you make to prevent failures.  :Wink: 

----------

## knob-creek

 *frostschutz wrote:*   

> This bit failure rate has zero relevance in practice. That's just not how it works.

 

You might want to read this article.

Yes, the calculations contain some simplifications and the drives might do better than said by the manufacturer, but there is a statistical probability of being unable to read a sector without warning (esp. SMART warning).

The main problem I see regarding the use of an unaware file system on top of a RAID array is the separation of abstractions.  The RAID tries to simulate an error-free disk, which may still fail.  In that case all you know is that a certain sector of your virtual disk is corrupted -- if the hardware solution mentioned above lets you take any benefit of it at all.

If redundancy and multi disk awareness are built into the file system, in case of a failed recovery the file system is able to tell you which file(s) are affected.

----------

## frostschutz

 *knob-creek wrote:*   

> 
> 
> You might want to read this article.
> 
> 

 

It's bullshit. Fake news.  :Laughing:  Anything based on URE 10^14/15 is. That's just not how it works.

 *knob-creek wrote:*   

> 
> 
> If redundancy and multi disk awareness are built into the file system, in case of a failed recovery the file system is able to tell you which file(s) are affected.
> 
> 

 

That sounds very fine in theory, in practice even simple filesystems are riddled with bugs (it takes a looong time for them to get stable) and redundancy, multidisk, checksumming, etc. adds yet another ton of complexity on top of it. So I'm not sure that's the way to go.

At the same time people resort to FUD to advertize for those shiny new filesystems. I daresay more people lost their data to ZFS/btrfs than they would have ever lost to the flippin-bits myth which just doesn't happen because, you know, the various storage media have their own checksums already... there is not much room for /silent/ bit-flips there.

Most corrupted files I've seen were corrupted by software (all it takes is one buggy image viewer to destroy your image collection) and that kind of corruption no filesystem will protect you from, they'll happily take the corrupted files and checksum them.

Anyway, getting far off topic here. OP never reported back whether it was just a case of missing kernel option...

----------

## NeddySeagoon

knob-creek,

There is a wonderful Scottish word to describe that article ... "guff".

The SMART health check is worthless.  By the time it says your drive has died, you already know it.

You need to run a check or repair on your raid sets and watch the Pending Sector and Relocated Sector counts.

Any drive with a non-zero Pending Sector count is scrap.  Hopefully it's still under warranty.

Any drive with an increase in the Relocated Sector count is not happy.

That's true of drives in raid sets or not.

I'm a fan of splitting the complexity into different specialist modules as far as possible.

Hence mdadm for raid, its been around a long time, lvm for logical volumes, its mature too and a well tested filesystem on top.

When you put everything into one basket, it's a lot to get right all in one go. 

Being old and cynical, experience has shown that it doesn't happen.  

Getting back to your link, in theory, there is no difference between theory and practice. In practice, there is.

----------

## knob-creek

 *frostschutz wrote:*   

> At the same time people resort to FUD to advertize for those shiny new filesystems.

 

ZFS was introduced 2006, ext4 was introduced 2008.  Shiny new?

 *frostschutz wrote:*   

> I daresay more people lost their data to ZFS/btrfs than they would have ever lost to the flippin-bits myth which just doesn't happen because, you know, the various storage media have their own checksums already... there is not much room for /silent/ bit-flips there.

 

You might contemplate about the meaning of uncorrectable in URE.  Of course, it is not a single (correctable) bit flip.  Nobody claimed it would go unnoticed.  It would break your raid 5 recovery.

And some food for thought: why do you think people invented (and then started using) raid 6 if nothing can go wrong with raid 5?

 *NeddySeagoon wrote:*   

> The SMART health check is worthless. By the time it says your drive has died, you already know it.

 

I would never disagree on that.  Would you agree that this statement questions frostschutz's "just keep everything perfect" strategy?

----------

## NeddySeagoon

 *knob-creek wrote:*   

> 
> 
>  *NeddySeagoon wrote:*   The SMART health check is worthless. By the time it says your drive has died, you already know it. 
> 
> I would never disagree on that.  Would you agree, that this statement questions frostschutzes "just keep everything perfect" strategy?

 

Not at all.  The SMART data itself is useful, the summary PASS/FAIL report is useless.

Raid6 has been around for a long time for big raid sets, from when drive reliability was lower than it is now. The URE was not the problem then.

Think raid sets with >28 elements, which was one of the features that made the move from raid metadata 0.9 to 1.2 compulsory.

Here is some background reading

----------

## s4e8

I vote for this. If your kernel has no GPT support, you see the 2T fake protective partition. Formatting this partition destroys the GPT header and makes the partition table unknown.
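If that's what happened here, the backup GPT header at the end of the disk usually survives a mkfs on the fake 2T partition, and gdisk's recovery menu can rebuild the primary from it (a sketch only; print and sanity-check before writing anything):

```shell
gdisk /dev/sda
# r  -> recovery and transformation menu
# b  -> use the backup GPT header to rebuild the main one
# p  -> print the recovered table and sanity-check it
# w  -> write, only if the table looks right
```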

 *frostschutz wrote:*   

> Check if you have support for GPT partition table in your kernel.
> 
> ```
> 
> $ zgrep EFI /proc/config.gz 
> ...

 

----------

## Roman_Gruber

 *frostschutz wrote:*   

> 
> 
> At the same time people resort to FUD to advertize for those shiny new filesystems. I daresay more people lost their data to ZFS/btrfs than they would have ever lost to the flippin-bits myth which just doesn't happen because, you know, the various storage media have their own checksums already... there is not much room for /silent/ bit-flips there.
> 
> Most corrupted files I've seen were corrupted by software (all it takes is one buggy image viewer to destroy your image collection) and that kind of corruption no filesystem will protect you from, they'll happily take the corrupted files and checksum them..

 

+1 In my days it was called reiserfs @ Suse 6.2

 *knob-creek wrote:*   

>  *frostschutz wrote:*   At the same time people resort to FUD to advertize for those shiny new filesystems. 
> 
> ZFS was introduced 2006, ext4 was introduced 2008.  Shiny new?
> 
> 

 

The point is, ext4 is kind of the de facto standard on many boxes.

Many patches have gone into the kernel for ext4.

Exotic filesystems just claim to be faster, better, whatever. I recently used xfs for a whole month, to see what "crap" it is.

I'd rather use a filesystem which is used on far more boxes in far more scenarios than those fancy FSes which may corrupt my data, cause data loss, whatever.

I do know how to deal with data corruption on ext4. No idea what to do with the fancy-pants ones.

There is a reason why Mickysoft still uses NTFS. If file systems were so easy to implement and test, why does Mickysoft still stick to an old Windows NT 3.x file system (just a guess on how old it is)?

Even FAT is still around, as we all know; UEFI now needs FAT  :Sad:  So UEFI brought FAT back into my kernels. They could have used something else, but no, they stick to an antique, limited FS from the Mickysoft MSDOS era.

----------

