# Recover disk space

## kilburna

Hi

I've got a server that has run out of disk space. I have deleted about 30G of files, but df still shows 100% and I cannot start some services. Googling suggests using the lsof utility, but I cannot install lsof due to the lack of disk space.

How can I go about finding out what is causing this? I tried rebooting the server after freeing some disk space, but still no go. This is an ext4 filesystem.

Thanks

Kilburn

```
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/md2       953239052 920260416         0 100% /
devtmpfs           10240         0     10240   0% /dev
tmpfs             814584       812    813772   1% /run
shm              4072912         0   4072912   0% /dev/shm
cgroup_root        10240         0     10240   0% /sys/fs/cgroup
/dev/md0          126867     34278     86039  29% /boot
```

[Moderator edit: added [code] tags to preserve output layout. -Hu]

----------

## 1clue

Can you wrap that text in code tags so it's easier to read?  Thanks.

----------

## 1clue

While there are tools better suited to this, I start at / and type du -hsx * and look for anything larger than 1G.

Focus on the filesystem that's full; in your case that's /. The -x flag keeps du limited to one filesystem.

For /home/* remember that there are lots of hidden files and folders, so you may need to pay closer attention there.
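A sketch of that, assuming GNU coreutils (sort -h understands the human-readable sizes that du -h prints):

```shell
# Per-entry disk usage at /, largest last, staying on one filesystem (-x).
cd / && du -hsx -- * 2>/dev/null | sort -h
# Then cd into the biggest entry and repeat until the culprit is obvious.
```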

----------

## Jaglover

Logfiles that are not rotated, mysql automatic backups, distfiles (if you have local portage), to name a few.

----------

## kilburna

My problem is that I have deleted about 50G of files since I ran out of disk space. It seems that no matter what I delete, df still shows 100%.

```
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/md2       953239052 920260448         0 100% /
devtmpfs           10240         0     10240   0% /dev
tmpfs             814584       812    813772   1% /run
shm              4072912         0   4072912   0% /dev/shm
cgroup_root        10240         0     10240   0% /sys/fs/cgroup
/dev/md0          126867     34278     86039  29% /boot
/dev/sdc1       30013840    266144  29747696   1% /mnt/usbdrive
```

Here is /

```
/ # du -hsx *
7.4M    bin
32M     boot
0       dev
18M     etc
8.0K    home
0       lib
3.1M    lib32
17M     lib64
16K     lost+found
4.0K    media
28K     mnt
4.0K    opt
du: cannot access 'proc/31812/task/31812/fd/3': No such file or directory
du: cannot access 'proc/31812/task/31812/fdinfo/3': No such file or directory
du: cannot access 'proc/31812/fd/3': No such file or directory
du: cannot access 'proc/31812/fdinfo/3': No such file or directory
0       proc
59G     root
812K    run
6.6M    sbin
214M    stage3-amd64-20151001.tar.bz2
0       sys
12K     tmp
9.4G    usr
742G    var
```

----------

## Section_8

 *Quote:*   

> My problem is that I have deleted about 50G of files since I ran out of disk space. It seems that no matter what I delete, df still shows 100%.

 

That means something still has the deleted files open - the disk space isn't released until they are closed.
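You don't actually need lsof for that; on Linux the same information lives in /proc, where an unlinked-but-open file shows up as a "(deleted)" symlink target. A sketch (run as root to see every process):

```shell
# Each /proc/PID/fd/N is a symlink to the open file; ls -l shows
# "-> /path/to/file (deleted)" for files that have been unlinked.
ls -l /proc/[0-9]*/fd 2>/dev/null | grep '(deleted)'
```

Once you have the PID, restarting that process releases the space. You can also truncate the still-open file in place with `: > /proc/PID/fd/N`, which frees the blocks immediately without killing anything.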

----------

## 1clue

Wow. You have almost a terabyte in /var.

Jaglover is right on the money: /var/log/*, backups, and portage temp space all live in there.

Look at tmpreaper to see about deleting old stuff.

cd /var and run the same command again, du -hsx *, until you figure out where the space went.

----------

## kilburna

How do I find which deleted files are open if I do not have the lsof utility on the server?

----------

## 1clue

Did you run lsof as root?  Or as a user?  Edit: Never mind, it's in /usr/bin.

----------

## kilburna

I do not have lsof on the server, so I cannot run it. I also cannot install it as emerge fails with no disk space.

----------

## Jaglover

Did you check out portage tempfiles as 1clue suggested?

----------

## 1clue

Always focus on the most-full filesystem (in your case there's only one) and focus on the largest directory in that filesystem.

You're looking for the biggest gains first, because right now you can't even install lsof.

So I would run du -hsx * in /var, then again in whatever is biggest in that directory, and again in the biggest spot inside that, and so on.

I used to use find to look for files greater than a certain size, but that fails when the space is in thousands of smaller files, like in /var/tmp or /var/log.

So while my approach is cumbersome, slow, and very manual, it's better (for me at least) because each of these special-purpose directories might have something horribly wrong that won't be caught by a more automatic tool.

For my part, I mount /tmp and /var/tmp on tmpfs; I have plenty of RAM, and that clears those files on every reboot.
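For reference, the fstab entries for that setup look something like this (the size limits are just examples, tune them to your RAM):

```
# /etc/fstab
tmpfs   /tmp      tmpfs  size=2G,mode=1777   0 0
tmpfs   /var/tmp  tmpfs  size=4G,mode=1777   0 0
```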

----------

## krinn

If you are out of inodes you'll get "disk full" too, even if you have free space. Did you check your inodes?
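df reports that directly; a quick check:

```shell
# -i switches df from blocks to inodes; 100% IUse% gives
# "No space left on device" even when blocks are free.
df -i /
```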

----------

## Hu

His regular df output looks exhausted to me.  He might also have an inode problem, but we know he has a block problem.

Given the size of /var and the assertion that this persisted across a reboot, my guess is there is a runaway process logging noise as fast as it can.  Every time he frees up blocks, the runaway consumes them to write more noise to its log.  Exploring inside /var, as suggested above, should lead the way.
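One way to confirm that theory is to sample the usage twice, a minute apart; a sketch:

```shell
# du -s prints usage in 1K blocks; if /var grows between the two
# samples, something is actively writing to it right now.
a=$(du -sx /var 2>/dev/null | cut -f1)
sleep 60
b=$(du -sx /var 2>/dev/null | cut -f1)
echo "/var grew by $((b - a)) KiB in 60 seconds"
```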

----------

## C5ace

I would boot from a Rescue System CD, mount the root partition at /mnt/gentoo, and try to find the files filling up /. Also check the contents of /lost+found. Delete all unwanted files, unmount /mnt/gentoo, and run fsck -nfv on the / partition. If there are errors, run fsck -fv. See 'man fsck' for details.

mc (Midnight Commander) is your friend.

Problem may be caused by intermittent connector contacts, cables, memory contacts. Reseat all connectors and memory sticks. 

Then boot your system. If the problem continues, back up the / partition, re-format it, and restore the last known good backup. If this does not fix the problem, install Gentoo from scratch. Use OpenRC with USE="-systemd" in /etc/portage/make.conf. When everything works fine, bang the case several times to reconfirm that there are no dodgy contacts. Check the log files. If there are no errors, install the server applications step by step.

----------

## ct85711

 *Quote:*   

> Problem may be caused by intermittent connector contacts, cables, memory contacts. Reseat all connectors and memory sticks.
> 
> Then boot your system. If the problem continues, back up the / partition, re-format it, and restore the last known good backup. If this does not fix the problem, install Gentoo from scratch. Use OpenRC with USE="-systemd" in /etc/portage/make.conf. When everything works fine, bang the case several times to reconfirm that there are no dodgy contacts. Check the log files. If there are no errors, install the server applications step by step.

 

This isn't a symptom of a bad connection, and a reinstall is NOT a way to fix something. A reinstall is rarely ever the right solution.

In this case, we know where he needs to look: /var. The two prime spots in /var are /var/log and /var/tmp/portage. The latter is where portage compiles packages by default. Normally it should be empty (portage cleaning up after itself), but it is known not to always do so, so anything in /var/tmp/portage can be deleted outright (assuming emerge is NOT running).

 *Quote:*   

> Given the size of /var and the assertion that this persisted across a reboot, my guess is there is a runaway process logging noise as fast as it can. Every time he frees up blocks, the runaway consumes them to write more noise to its log. Exploring inside /var, as suggested above, should lead the way.

 

As Hu mentioned, this is more likely a case of some logs spamming a lot and filling the drive. So you will probably be able to spot the offending file easily enough with ls -lh by looking at file sizes. After that, look at the log file and see what is being spammed, so you can disable that service initially to get the system running, and then correct it or redirect that service to log somewhere else (like sending the useless lines to /dev/null).
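One caveat on deleting an active log: rm'ing a file that a daemon still holds open frees nothing (which is the original problem here). Truncating it in place does free the blocks immediately; a sketch, with the filename being just an example:

```shell
# Find the biggest logs first, sorted by size:
ls -lhS /var/log | head
# Truncate the offender without deleting it; the daemon keeps its
# (now zero-length) file handle, and the blocks are freed at once:
: > /var/log/noisy-example.log
```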

----------

## Ant P.

Removing that leftover stage3 tarball would help in the meantime. Assuming /var/log/ isn't filling up faster than that...

----------

## 1clue

One more vote against a system restore or re-installation. This is a problem that can be solved.

1. Find the huge file(s).
2. If it's service-related, turn off that service. (E.g. if you have a ton of ssh entries, somebody's trying to brute-force your system; if you're on the console, turn off sshd.)
3. Delete the huge file(s).
4. Restart the service and try to determine why the file grew.
5. Fix that problem.
6. If your filesystem is still too full, repeat the large-file discovery process and go after the largest file(s) again.

----------

## John R. Graham

 *Ant P. wrote:*   

> Removing that leftover stage3 tarball would help in the meantime. Assuming /var/log/ isn't filling up faster than that...

Also, removing /usr/portage/distfiles/* would be a good temporary fix since, unless you're actively running an emerge, none of those files will be open.
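A sketch of that cleanup; eclean-dist (from app-portage/gentoolkit) is the tidy way, but with no space left to install anything, plain rm does the job:

```shell
# With no emerge running, these are only cached downloads and are
# safe to delete; portage will re-fetch anything it needs later.
rm -f /usr/portage/distfiles/*
# If app-portage/gentoolkit is already installed, the gentler route:
# eclean-dist --deep
```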

- John

----------

