Multi-threaded gzip

By Babak Farrokhi, April 14, 2009 11:52 am

The traditional (yet still very popular) gzip is a single-threaded application from the single-processor, single-core hardware era. It's just fine if you are compressing a few files occasionally, but it becomes a great pain when you are compressing 32,000 files on an 8-processor server and suddenly realize that you are using only 1/8 of your total processing power. That means you wait roughly 8 times longer than you would if you could use every processor on the machine. I ran into exactly this case: I had to wait about 40 minutes for traditional gzip to compress a few thousand files totaling hundreds of gigabytes, with one processor doing the whole job while the other 7 sat idle.

So I thought there should be a way to speed up the process. The simplest method I could use was to open multiple terminal windows and run parallel copies of gzip, each compressing a specific set of files. This method worked for me, but I was left wondering why gzip itself doesn’t support multi-threading.
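The "several copies of gzip at once" trick can also be scripted instead of juggling terminal windows. Below is a minimal sketch in Python, not what I actually ran: the function names `gzip_file` and `gzip_many` are made up for illustration, and it relies on Python's zlib bindings releasing the interpreter lock while compressing, so a thread pool should be able to keep several cores busy.

```python
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_file(path):
    """Compress one file to path + '.gz' and remove the original, as gzip does."""
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target


def gzip_many(paths, workers=None):
    """Compress several files at once, at most `workers` of them at a time."""
    # Hypothetical default: one worker per processor reported by the OS.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input paths.
        return list(pool.map(gzip_file, paths))
```

Calling `gzip_many(list_of_files, workers=8)` would then play the role of eight terminal windows. Each file is still compressed by a single thread, so this only helps when there are many files to go around.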

The solution: pigz

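I came across pigz while searching the internet for a multi-threaded gzip replacement. pigz is a drop-in replacement for gzip that compresses in parallel: it splits the input into blocks and compresses them on separate threads, so even a single large file can keep several cores busy. Decompression is not parallelized in the same way, because a gzip stream has to be decoded sequentially.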
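To get a feel for how one large input can be compressed on several cores, here is a rough sketch in Python. It is not pigz's actual algorithm: as far as I understand, pigz emits one ordinary gzip stream, while this sketch swaps in a simpler trick, compressing fixed-size chunks independently and concatenating the resulting gzip members (a concatenation of gzip members is itself a valid gzip file). The name `compress_in_chunks` and the 1 MiB chunk size are assumptions chosen for illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary value for this sketch


def compress_in_chunks(data, workers=4, chunk_size=CHUNK_SIZE):
    """Compress `data` (bytes) as a series of independent gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk becomes its own gzip member; map() preserves chunk order.
        members = pool.map(gzip.compress, chunks)
        return b"".join(members)
```

The output can be read back with plain `gzip.decompress` (or the gzip command itself). The price of independent chunks is a slightly worse compression ratio, since each chunk starts with no knowledge of the data before it.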

Figure 1: Running “systat -iostat 1” on a FreeBSD 7.2 machine running pigz

Using pigz, I could exploit more than 70% of my processing power. pigz also stays compatible with the standard gzip command-line parameters and supports all of its switches, while adding a “-p” option to specify the maximum number of compression threads.
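For anyone who wants to call pigz from a script, the following Python sketch shows one way to pass the “-p” option. The helper names `pigz_command` and `run_pigz` are hypothetical, as are the file names in the usage note; the only pigz feature used is “-p” followed by a thread count.

```python
import os
import shutil
import subprocess


def pigz_command(files, threads=None):
    """Build the argument list for compressing `files` with pigz."""
    # Hypothetical default: as many threads as the OS reports processors.
    threads = threads or os.cpu_count() or 1
    return ["pigz", "-p", str(threads), *files]


def run_pigz(files, threads=None):
    """Run pigz on `files`; raises if pigz is missing or exits with an error."""
    if shutil.which("pigz") is None:
        raise FileNotFoundError("pigz is not installed or not on PATH")
    subprocess.run(pigz_command(files, threads), check=True)
```

For example, `run_pigz(["access.log", "error.log"], threads=8)` would run the equivalent of `pigz -p 8 access.log error.log`, leaving `access.log.gz` and `error.log.gz` behind just as gzip would.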

7 Responses to “Multi-threaded gzip”

  1. Thanks. This was very useful.

  2. Hamid says:

    little but important point, thanx

  3. Parham says:

    Good job…

  4. Brian says:

    Can this support MPI, or is it only shared memory?

  5. No MPI, as far as I know.

  6. IT_Architect says:

    I had plenty of time on my hands while I was compressing a bunch of large files. It seemed like my Nehalem was doing about as good as Core 2 Duo. My curiosity got me checking the monitor wondering if gzip was smart enough to thread. Apparently not. The monitor showed I was using one core. I Googled and happened on your post. I appreciate you posting this. I will have to check it out.

  7. MM says:

    Thanks, really good tool – I often compress 3-5GB files on 8-core machine and this tool speeds it up a lot! :)
