# tar, bzip2 multicore goodness

## kernelOfTruth

Hi everyone,

The basics of using tar are covered here:

http://www.shell-fu.org/lister.php?tag=tar

Knowing these basics, one can combine p7zip and tar into the following command:

```
time (nice -20 tar -cpf - -X /root/stage4.excl / | 7z a -si -tbzip2 /bak/system/stage4-amd64_Final-11-030808.tbz2)
```

(this should create a stage4 tarball in bzip2 format at 7-Zip's default bzip2 settings, using multiple CPU cores; for your convenience, it also shows how long it took)

the command for extraction would be:

```
7z e -so -tbzip2 /bak/system/stage4-amd64_Final-11-030808.tbz2 | tar -xpf - -C /test/
```
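A pipeline's exit status is that of its last command, so with commands like the ones above a failing tar can go unnoticed as long as 7z exits cleanly. Below is a minimal bash sketch, assuming the same 7z and GNU tar invocations, that turns on `pipefail` and then lists the finished archive through the extraction pipe as a quick readability check. `make_tbz2` and `check_tbz2` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: create a .tbz2 with 7-Zip and check that it can be read back.
# make_tbz2 / check_tbz2 are illustrative helper names, not existing tools.
set -o pipefail   # a pipeline fails if ANY command in it fails, not just the last

# make_tbz2 ARCHIVE [TAR_OPTIONS...] PATH...
# tar writes the archive to stdout (-f -), 7z compresses it from stdin (-si)
make_tbz2() {
    local archive=$1
    shift
    tar -cpf - "$@" | 7z a -si -tbzip2 "$archive" > /dev/null
}

# check_tbz2 ARCHIVE
# 7z decompresses to stdout (-so); tar -t must be able to list every member
check_tbz2() {
    7z e -so -tbzip2 "$1" 2> /dev/null | tar -tf - > /dev/null
}
```

Example: `make_tbz2 /bak/system/stage4.tbz2 -X /root/stage4.excl / && check_tbz2 /bak/system/stage4.tbz2`. Recent GNU tar versions treat `-X` as positional (it only affects paths that follow it), which is why it is placed before `/` here.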

If anything above is incorrect, please post.

I'm currently testing these commands and will update this thread with my results.

Update 1:

The syntax of the sample commands should now be correct.

----------

## prizident

There is also pbzip2, which can use multiple cores as well.

----------

## kernelOfTruth

yes, the problem with that seems to be:

 *Quote:*   

> Decompressing non-pbzip2 Created Archives
> 
> pbzip2 can only decompress archives in parallel that have been compressed with pbzip2. For example, extracting linux-2.6.23.8.tar.bz2 as found on kernel.org with pbzip2 takes roughly twice as long on a dual core system when compared against bzip2. 

 

http://gentoo-wiki.com/HOWTO_Speed_up_decompression_with_pbzip2
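As far as I understand it, the reason lies in how pbzip2 writes its files: it compresses the input in independent chunks and stores each one as a complete bzip2 stream, one after the other. Current bzip2 versions decompress such concatenated streams as one file, while pbzip2 can hand each stream to a different core. A file written by plain bzip2 is a single stream, which gives pbzip2 nothing to split. A small sketch that imitates that layout with plain bzip2 (the chunk size and file names are arbitrary choices for the example):

```shell
#!/bin/bash
# Sketch: mimic pbzip2's output layout using plain bzip2.
# Each chunk becomes a complete bzip2 stream; the streams are then simply
# concatenated into a single .bz2 file.
set -e
work=$(mktemp -d)
seq 1 100000 > "$work/input.txt"                  # sample data

split -b 200k "$work/input.txt" "$work/chunk."    # chunk.aa, chunk.ab, ...
for c in "$work"/chunk.*; do
    bzip2 -c "$c" >> "$work/multi.bz2"            # append one full stream per chunk
done

# plain (single-threaded) bzip2 decodes all concatenated streams in sequence
bzip2 -dc "$work/multi.bz2" > "$work/roundtrip.txt"
cmp "$work/input.txt" "$work/roundtrip.txt" && echo "identical"
```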

----------

## kernelOfTruth

Here is the output from creating my first multi-core stage4 tarball   :Razz: 

 *Quote:*   

> time (nice -20 tar -cp / -X /root/stage4.excl | 7z a -si -tbzip2 /bak/system/stage4-amd64_Final-11-030808.tbz2)
> 
> tar: Removing leading `/' from member names
> 
> 7-Zip  4.58 beta  Copyright (c) 1999-2008 Igor Pavlov  2008-05-05
> ...

 

(that's 51 minutes instead of 80 or more)

----------

## fangorn

This is working great. Thank you.

For convenience, I wrapped this in two scripts, tbz2 and utbz2. In case anyone is interested, here they are:

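```
#!/bin/bash
# tbz2: pack files into a bzip2-compressed tarball using 7-Zip (multi-core)
# Usage: tbz2 <archive_file> source1 [source2 [...]]
set -o pipefail   # report failure if tar fails, not only if 7z does

if [ $# -le 1 ] ; then
   echo "Usage: $0 <archive_file> source1 [source2 [...]]" >&2
   exit 1
fi

dest=$1
shift

# tar writes the archive to stdout (-f -), 7z reads it from stdin (-si)
nice -20 tar -cpf - "$@" | nice -20 7z a -si -tbzip2 "$dest"
```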

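```
#!/bin/bash
# utbz2: extract a bzip2-compressed tarball using 7-Zip
# Usage: utbz2 <archive_file> [destination_directory]
set -o pipefail   # report failure if 7z fails, not only if tar does

usage="Usage: $0 <archive_file> [destination_directory]"

if [ $# -lt 1 ] ; then
   echo "$usage" >&2
   exit 1
fi

if [ ! -f "$1" ] ; then
   echo "Archive '$1' not found. $usage" >&2
   exit 1
fi

dest=()   # extra tar options; an array keeps paths with spaces intact

if [ -n "$2" ] ; then
   if [ ! -d "$2" ] ; then
      echo "Directory $2 does not exist. Do you want to create it (y/n)"
      read -r a
      if [ "$a" = "y" ] || [ "$a" = "Y" ] ; then
         mkdir -p "$2" || exit 1
      else
         exit 1
      fi
   fi
   dest=(-C "$2")
fi

# 7z decompresses to stdout (-so), tar reads the archive from stdin (-f -)
7z e -so -tbzip2 "$1" | tar -xpf - "${dest[@]}"
```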

----------

## Zucca

This might make compressing even more effective:

```
time (nice -20 tar -cpf - -X /root/stage4.excl / | 7z a -si -tbzip2 -md=32m -mx=9 -mpass=10 -mmt=5 /bak/system/stage4.tbz2)
```

I haven't tested it much.

It's slower, yes: on my test archive it took 12 min instead of 7 min.

----------

## shentino

What if each bzip2 block were forked into its own thread for decompression, and then all the thawed blocks were simply reassembled in the correct order?
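Doing this inside a single bzip2 stream is harder than it sounds, since as far as I know bzip2 blocks are not byte-aligned and have to be found by scanning for their bit-level signature (lbzip2 does something along these lines internally). The fork-and-reassemble idea itself is easy to show at the level of independently compressed chunks, though. A sketch in plain bash, assuming the data was compressed as separate chunk files (the `part.*` naming is made up for the example): each chunk is decompressed by its own background job, and the pieces are concatenated in their original order afterwards.

```shell
#!/bin/bash
# Sketch: decompress independently compressed chunks in parallel, then
# reassemble them in their original order. This only works because every
# chunk is a complete bzip2 stream of its own; it does NOT split an
# ordinary single-stream .bz2 file into blocks.
set -e
work=$(mktemp -d)
seq 1 200000 > "$work/input.txt"               # sample data

# preparation: cut the input into chunks and compress each one separately
split -b 256k "$work/input.txt" "$work/part."   # part.aa, part.ab, ...
for p in "$work"/part.??; do
    bzip2 "$p"                                  # replaces part.xx with part.xx.bz2
done

# decompression: one background job per chunk, each into its own file
pids=()
for z in "$work"/part.??.bz2; do
    bzip2 -dc "$z" > "${z%.bz2}.out" &
    pids+=("$!")
done
for pid in "${pids[@]}"; do
    wait "$pid"                                 # a failed job aborts the script (set -e)
done

# reassembly: the glob sorts aa, ab, ac, ... which is the order split used
cat "$work"/part.??.out > "$work/restored.txt"
cmp "$work/input.txt" "$work/restored.txt" && echo "restored in order"
```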

----------

## mv

GNU tar has the option --use-compress-program, so you could write a script that calls "exec 7z" with appropriate parameters and pass it via that option. I can imagine (depending on GNU tar's implementation, which I did not check) that this could be slightly faster than piping through the shell.
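Per the GNU tar manual, the program given to --use-compress-program is run without arguments to compress (reading stdin, writing stdout) and with `-d` to decompress. A minimal sketch of such a wrapper, assuming GNU tar. To keep it runnable without 7-Zip it calls plain `bzip2`; the two marked `exec` lines are where a 7z (or other bzip2-compatible) command would go. The wrapper name `mybz2` is hypothetical.

```shell
#!/bin/bash
# Sketch: a compressor wrapper for GNU tar's --use-compress-program.
# tar runs it with no arguments when creating an archive and with -d when
# extracting; data always flows stdin -> stdout.
set -e
work=$(mktemp -d)
wrapper="$work/mybz2"          # hypothetical wrapper name/location

cat > "$wrapper" <<'EOF'
#!/bin/sh
if [ "$1" = "-d" ]; then
    exec bzip2 -dc   # decompress: replace with your preferred decompressor
else
    exec bzip2 -c    # compress: replace with your preferred compressor
fi
EOF
chmod +x "$wrapper"

# round trip through the wrapper
mkdir -p "$work/src" && echo "hello" > "$work/src/file.txt"
tar --use-compress-program="$wrapper" -cpf "$work/test.tbz2" -C "$work" src
mkdir -p "$work/out"
tar --use-compress-program="$wrapper" -xpf "$work/test.tbz2" -C "$work/out"
diff -r "$work/src" "$work/out/src" && echo "round trip OK"
```

As far as I know, tar still talks to the program through a pipe, so any speed difference compared to a shell pipe is likely to be small.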

----------

## mattst88

 *prizident wrote:*   

> there is also a tool pbzip2, which also can handle multiple cores

 

Please use lbzip2 instead.

 *kernelOfTruth wrote:*   

> yes, the problem with that seems to be:
> 
>  *Quote:*   Decompressing non-pbzip2 Created Archives
> 
> pbzip2 can only decompress archives in parallel that have been compressed with pbzip2. For example, extracting linux-2.6.23.8.tar.bz2 as found on kernel.org with pbzip2 takes roughly twice as long on a dual core system when compared against bzip2.  
> ...

 

lbzip2 does not have this limitation. Use it instead.
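lbzip2 is meant as a drop-in bzip2 replacement (it reads stdin, writes stdout and takes `-d` to decompress), so it fits the same kind of tar pipeline used earlier in this thread. A sketch, assuming GNU tar and lbzip2 in `PATH`; `backup` and `restore` are hypothetical helper names, and the `BZ2` variable only exists so the same lines can be tried with `bzip2` or `pbzip2` for comparison.

```shell
#!/bin/bash
# Sketch: tarball backup/restore through lbzip2 or any bzip2-compatible tool.
# backup/restore are illustrative helper names.
set -o pipefail
BZ2=${BZ2:-lbzip2}   # compressor; e.g. BZ2=bzip2 or BZ2=pbzip2 to compare

# backup SRC_DIR ARCHIVE: tar streams SRC_DIR to stdout, $BZ2 compresses stdin -> stdout
backup() {
    tar -cpf - -C "$1" . | "$BZ2" > "$2"
}

# restore ARCHIVE DEST_DIR: $BZ2 decompresses to stdout, tar extracts from stdin
restore() {
    mkdir -p "$2"
    "$BZ2" -dc < "$1" | tar -xpf - -C "$2"
}
```

Example: `backup /mnt/snapshot /bak/system/stage4.tar.bz2`, then `restore /bak/system/stage4.tar.bz2 /test`. With GNU tar, `tar --use-compress-program=lbzip2 -cpf ARCHIVE ...` should do the same without an explicit pipe.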

----------

## Ant P.

Has anyone tried app-arch/lrzip on these stage4 files? It usually gives me about 1 GB/min at maximum settings.

----------

## kernelOfTruth

 *mattst88 wrote:*   

>  *prizident wrote:*   there is also a tool pbzip2, which also can handle multiple cores 
> 
> Please use lbzip2 instead.
> 
>  *kernelOfTruth wrote:*   yes, the problem with that seems to be:
> ...

 

awesome - thanks !  :Smile: 

now I only need the liveCD creators to include it ^^

For my PC that should be no problem, since I use an alternative emergency system, but on my laptop there's not enough space on the hard drive for one...

some more info on the *zip compressors:

 gziptest.sh part 2: multi-threaded compression benchmarks 

----------

## John R. Graham

Just delivered a .tar.bz packaged with pbzip2 to a Far East factory partner, who could not unpack it with WinRAR. It appears I had installed pbzip2 since my last delivery to the factory. Another strike against pbzip2.

- John

----------

