The traditional (yet very popular) gzip is a single-threaded application from the single-processor/single-core hardware era. It's just fine if you are compressing a few files occasionally, but it becomes a great pain when you are compressing 32,000 files on an 8-processor server and you suddenly realize that you are using only 1/8 of your total processing power. That means you wait roughly 8 times longer than you would if you could use all the processing power of your machine. I ran into exactly this case: I had to wait about 40 minutes to compress a few thousand files totaling hundreds of gigabytes with traditional gzip, while one processor did the whole job and the 7 other processors sat idle.

So I thought there should be a way to speed up the process. The simplest method I could use was to open multiple terminal windows and run parallel copies of gzip, each compressing a specific set of files. This method worked for me, but I kept wondering why gzip itself doesn’t support multi-threading.
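The "several copies of gzip at once" workaround can also be scripted instead of being spread over terminal windows. The following is only a minimal sketch in Python, not what I actually ran: the function names gzip_one and gzip_many are made up for illustration, it uses Python's built-in gzip module rather than the gzip binary, and it relies on the assumption that CPython releases its global lock while zlib is compressing, so that threads can run on separate cores. Unlike the gzip command, it leaves the original files in place.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_one(path: Path) -> Path:
    """Compress one file to <name>.gz next to it and return the new path.

    Hypothetical helper: unlike the gzip command, the original is kept.
    """
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def gzip_many(paths, workers=8):
    """Compress several files concurrently, one file per worker at a time.

    Each individual file is still compressed by a single thread; only the
    set of files is spread over the workers, like the terminal-window trick.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(gzip_one, paths))
```

Calling gzip_many(sorted(Path("logs").glob("*.log")), workers=8) would compress every .log file in a hypothetical logs directory with up to 8 workers. This only helps when there are many files; one huge file would still be handled by a single worker.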

The solution: pigz

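I came across pigz after searching the internet for a multi-threaded gzip replacement. pigz is a drop-in replacement for gzip that uses multiple threads, and therefore multiple cores, to compress data in parallel, even when only a single large file is involved. Decompression gets some help from extra threads for reading, writing and checksum calculation, but it is not parallelized to the same degree as compression.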

Figure 1: Running “systat -iostat 1” on a FreeBSD 7.2 machine running pigz

Using pigz, I could exploit more than 70% of my processing power. pigz also stays compatible with the standard gzip command-line parameters and supports all of its switches, while adding the “-p” option to specify the maximum number of compression threads.
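To make the “-p” option concrete, here is a small, hedged sketch of calling pigz from a script. The helper names pigz_command and run_pigz are hypothetical and not part of pigz; the only pigz switches used are “-p” (thread count) and “-d” (decompress, as with gzip). It assumes pigz is installed and on the PATH.

```python
import shutil
import subprocess


def pigz_command(files, threads=8, decompress=False):
    """Build the pigz argument list (hypothetical helper, for illustration)."""
    cmd = ["pigz", "-p", str(threads)]  # -p: maximum number of threads
    if decompress:
        cmd.append("-d")  # -d: decompress, same switch as gzip
    return cmd + [str(f) for f in files]


def run_pigz(files, threads=8, decompress=False):
    """Run pigz on the given files, failing early if pigz is not available."""
    if shutil.which("pigz") is None:
        raise FileNotFoundError("pigz is not installed or not on PATH")
    # check=True raises CalledProcessError if pigz exits with an error.
    subprocess.run(pigz_command(files, threads, decompress), check=True)
```

For example, run_pigz(["a.log", "b.log"], threads=8) would be the scripted equivalent of typing pigz -p 8 a.log b.log. As with gzip, pigz replaces each input file with its .gz version by default.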

15 Responses to Multi-threaded gzip

  1. Thanks. This was very useful.

  2. Hamid says:

    little but important point, thanx

  3. Parham says:

    Good job…

  4. Brian says:

    Can this support MPI, or is it only shared memory?

  5. No MPI, as far as I know.

  6. IT_Architect says:

    I had plenty of time on my hands while I was compressing a bunch of large files. It seemed like my Nehalem was doing about as good as Core 2 Duo. My curiosity got me checking the monitor wondering if gzip was smart enough to thread. Apparently not. The monitor showed I was using one core. I Googled and happened on your post. I appreciate you posting this. I will have to check it out.

  7. MM says:

    Thanks, really good tool – I often compress 3-5GB files on 8-core machine and this tool speeds it up a lot! :)

  8. John M says:

    I’m curious if pigz will utilize multiple cores when decompressing archives that were compressed using gzip (single core)…

  9. That is exactly what “pigz -d” or “unpigz” does.

  10. John M says:

    Ok great, just want to be sure. I found another archive tool that was multithreaded but due to the nature of the way the archives were made it would only extract archives made via single thread in a single thread mode.

    Here it is:
    http://www.linux.com/archive/feature/126412

    “One caveat with pbunzip2 is that it will only use multiple cores if the bzip2 compressed file was created with pbzip2”

    So pigz is a drop-in replacement for tar? I’m an above-average novice with Linux. Is there any info anywhere to help me get it installed on CentOS 5 x64?

  11. John M says:

    excuse me… I meant gzip…not tar…

  12. John M says:

    Speaking of tar, are you aware of any parallel implementation of tar?

  13. For CentOS 5.x x64 I believe you may use the RPM from here: http://rpmfind.net//linux/RPM/epel/5/x86_64/pigz-2.1.6-1.el5.x86_64.html

    As for tar, I am not sure if any parallel implementation exists.

  14. I am afraid it does not.