Multi-threaded gzip
The traditional (yet still very popular) gzip is a single-threaded application from the single-processor, single-core hardware era. It's just fine if you are compressing a few files occasionally, but it becomes a great pain when you are compressing 32,000 files on an 8-processor server and suddenly realize that you are using only 1/8 of your total processing power, which means waiting roughly 8 times longer than you would if all processors were put to work. I ran into exactly this case: I had to wait about 40 minutes for traditional gzip to compress a few thousand files totaling hundreds of gigabytes, with one processor doing the whole job while the other 7 sat idle.
So I thought there should be a way to speed up the process. The simplest method I could use was to open multiple terminal windows and run parallel copies of gzip, each compressing a specific set of files. This method worked for me, but I kept wondering why gzip itself doesn't support multi-threading.
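The "several gzip copies at once" workaround can also be scripted instead of juggling terminal windows. The following is only a minimal sketch, not what I actually ran: the helper names gzip_file and gzip_many are made up for illustration, and it relies on the assumption that CPython's zlib bindings release the interpreter lock while compressing, so that worker threads can run on separate cores. Unlike the gzip command, it leaves the original files in place.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress one file to <name>.gz and return the new path as a string.

    Hypothetical helper: unlike the gzip command, the source file is kept.
    """
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return str(dst)


def gzip_many(paths, workers=8):
    """Compress several files concurrently, at most `workers` at a time.

    The default of 8 mirrors the 8-processor server described above.
    Results come back in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))
```

Calling gzip_many(list_of_paths) returns the list of ".gz" files it created. Each file is still compressed by a single worker, so this only helps when there are many files, which is the same limitation as the multiple-terminal trick.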
The solution: pigz
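I came across pigz after searching the internet for a multi-threaded gzip replacement. pigz is a drop-in replacement for gzip that compresses using multiple threads. It splits the input into blocks and compresses them in parallel, so even a single large file benefits, not just batches of files. Decompression is a different story: a deflate stream cannot be decompressed in parallel, so pigz decompresses with one thread and uses a few helper threads for reading, writing, and checksum calculation.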
Figure 1: Running “systat -iostat 1” on a FreeBSD 7.2 machine running pigz
Using pigz, I could exploit more than 70% of my processing power. pigz also stays compatible with the standard gzip command-line parameters and supports all of its switches, while adding the “-p” option to specify the maximum number of compression threads.
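For anyone who wants to call pigz from a script, here is a small sketch of how the “-p” option fits into a command line. The function names pigz_command and run_pigz are hypothetical, the default of 8 threads is just my server's processor count, and the code assumes pigz is installed and on the PATH.

```python
import shutil
import subprocess


def pigz_command(files, threads=8, decompress=False):
    """Build a pigz command line (hypothetical helper).

    -p sets the maximum number of compression threads;
    -d decompresses, the same switch gzip uses.
    """
    cmd = ["pigz", "-p", str(threads)]
    if decompress:
        cmd.append("-d")
    return cmd + [str(f) for f in files]


def run_pigz(files, threads=8, decompress=False):
    """Run pigz on the given files, failing early if it is not installed."""
    if shutil.which("pigz") is None:
        raise FileNotFoundError("pigz not found on PATH")
    subprocess.run(pigz_command(files, threads, decompress), check=True)
```

Because pigz keeps gzip's switches, the same list with "gzip" in place of "pigz" (and without “-p”) would be a valid gzip invocation.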
15 Responses to Multi-threaded gzip
Thanks. This was very useful.
little but important point, thanx
Good job…
Can this support MPI, or is it only shared memory?
No MPI, as far as I know.
I had plenty of time on my hands while I was compressing a bunch of large files. It seemed like my Nehalem was doing about as well as a Core 2 Duo. My curiosity got me checking the monitor, wondering if gzip was smart enough to thread. Apparently not. The monitor showed I was using one core. I Googled and happened on your post. I appreciate you posting this. I will have to check it out.
Thanks, really good tool – I often compress 3-5GB files on 8-core machine and this tool speeds it up a lot!
I’m curious if pigz will utilize multiple cores when decompressing archives that were compressed using gzip (single core)…
That’s exactly what “pigz -d” or “unpigz” does.
Ok great, just want to be sure. I found another archive tool that was multithreaded but due to the nature of the way the archives were made it would only extract archives made via single thread in a single thread mode.
Here it is:
http://www.linux.com/archive/feature/126412
“One caveat with pbunzip2 is that it will only use multiple cores if the bzip2 compressed file was created with pbzip2”
So pigz is a drop-in replacement for tar? I’m an above-average novice with Linux. Is there any info anywhere to help me get it installed on CentOS 5 x64?
Excuse me… I meant gzip… not tar…
Speaking of tar, are you aware of any parallel implementation of tar?
Well, pigz will do that.
For CentOS 5.x x64 I believe you may use the RPM from here: http://rpmfind.net//linux/RPM/epel/5/x86_64/pigz-2.1.6-1.el5.x86_64.html
As for tar, I am not sure if any parallel implementation exists.
I am afraid it does not.