I'm getting much better compression when I make a compressed folder
(Windows XP) than I am using DeflateStream or GZipStream. I
thought these were the same algorithms used in PKZIP and for
compressing folders. Why such bad compression?
DeflateStream: 3544Kb -> 1261Kb
GZipStream: 3544Kb -> 1261Kb
Windows XP: 3544Kb -> 804Kb
So how can I get the same compression ratio as Windows XP?
Thanks,
Jeremy

System.IO.Compression not as good as compressed folder
Raudhah
njahamaca
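using System;
using System.IO;
using System.IO.Compression;

class Program
{
    static void Main()
    {
        // Random bytes are incompressible by design. The buffer size is an
        // assumption; the original declaration was lost from this post.
        byte[] file = new byte[1024 * 1024];
        Random random = new Random();
        random.NextBytes(file);

        // Raw Deflate output
        DeflateStream deflate = new DeflateStream(
            File.OpenWrite("data_zip.txt"),
            CompressionMode.Compress, false);
        deflate.Write(file, 0, file.Length);
        deflate.Close();

        // Same data with the gzip header and trailer
        GZipStream gzip = new GZipStream(
            File.OpenWrite("data_gzip.txt"),
            CompressionMode.Compress, false);
        gzip.Write(file, 0, file.Length);
        gzip.Close();
    }
}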
JimIT
Standalone compression utilities, like PKZIP, perform file-based compression, which is subtly different from stream compression. When compressing a file, more analysis is possible (since all bits are known at the start), and memory allocation (mainly dictionary size) can be optimized. File-based utilities can even pick the most efficient algorithm based on analysis of the input bits.
Stream compression, on the other hand, has to 'take each bit as it comes' and is more restricted memory-wise, mostly because the working set of the stream compressor has to be predictable (and small, especially for general-purpose classes like DeflateStream).
So, the short answer to your question is, possibly a bit disappointing, "don't use stream compression if you want the smallest possible output". Fortunately, there are several third-party compression toolkits (just Google "ZIP toolkit .NET"), many of which offer much better-performing compression algorithms than Deflate, which helps even when compressing streams.
'//mdb
P.S. It's perfectly normal for the data size to increase after compression if the input is random or otherwise unsuitable for the algorithm used. File-based compression utilities will opt to just store the original file in this case; pure stream compressors can't do that, because they have already emitted output before they know the final size. You can always compare the original data size with the compressed stream size yourself and decide which one to persist (setting a flag somewhere to indicate the format).
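For anyone who wants to try the "decide which one to persist" idea, here is a minimal sketch. The class name StoreOrDeflate and the one-byte flag layout are my own assumptions for illustration, not anything defined by System.IO.Compression or the ZIP format. It compresses into memory, keeps whichever version is smaller, and prefixes a flag byte so the reader knows what it got.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: not part of System.IO.Compression.
public static class StoreOrDeflate
{
    private const byte Stored = 0;    // payload is the raw input
    private const byte Deflated = 1;  // payload is a raw Deflate stream

    // Returns one flag byte followed by whichever payload is smaller.
    public static byte[] Pack(byte[] input)
    {
        byte[] compressed;
        using (MemoryStream buffer = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream survives disposing the DeflateStream.
            using (DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            compressed = buffer.ToArray();
        }

        bool useCompressed = compressed.Length < input.Length;
        byte[] payload = useCompressed ? compressed : input;

        byte[] result = new byte[payload.Length + 1];
        result[0] = useCompressed ? Deflated : Stored;
        Buffer.BlockCopy(payload, 0, result, 1, payload.Length);
        return result;
    }

    // Reverses Pack: strips the flag byte and inflates only if needed.
    public static byte[] Unpack(byte[] packed)
    {
        if (packed[0] == Stored)
        {
            byte[] raw = new byte[packed.Length - 1];
            Buffer.BlockCopy(packed, 1, raw, 0, raw.Length);
            return raw;
        }

        using (MemoryStream source = new MemoryStream(packed, 1, packed.Length - 1))
        using (DeflateStream inflate = new DeflateStream(source, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = inflate.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

With this, the output is never more than one byte larger than the input, at the cost of buffering the whole compressed result in memory before writing anything, which is exactly the trade-off a pure stream compressor avoids.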
TootPeep
-Jeremy
k-ichiro
Original: 372K
Deflate: 540K <--- Expanded!
Gzip: 540K
Windows XP: 353K
WinZip: 353K
On the off chance I'm doing something wrong, here's the code to save all three versions (file is a byte[]):
File.WriteAllBytes("c:\\data_txt.txt", file); // Save original file
// Save compressed version of file
DeflateStream deflate = new DeflateStream(File.OpenWrite("c:\\data_zip.txt"),
CompressionMode.Compress, false);
deflate.Write(file, 0, file.Length);
deflate.Close();
// Save compressed version of file
GZipStream gzip = new GZipStream(File.OpenWrite("c:\\data_gzip.txt"),
CompressionMode.Compress, false);
gzip.Write(file, 0, file.Length);
gzip.Close();
-Jeremy
Nels P. Olsen
Kiryl Hakhovich
I'm interested in compressing a folder to a file as well and would like to know if this is possible in .NET 3 and if the compression ratio issue has been fixed.
However, if XP's compression algorithm is better, and seeing as there doesn't (yet) seem to be a simple folder-compression command in the .NET API, couldn't we just call a Shell command and have XP itself compress a source folder to a zip file? Mind you, I don't actually know what Shell command would do this for us, but I would think it's worth looking into at least.
Ideas?
Gary W
mazen44
I don't buy it. The file was written with one function call, so it should compress the same as PKZIP, since it's supposed to be the same algorithm. Even if it were broken into chunks of (say) 256 bytes, the stream would only be expanded by about 1% (about 3 bytes per chunk), not a whopping 50%.
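The per-chunk estimate can be checked against the Deflate format itself. RFC 1951 defines "stored" (uncompressed) blocks that hold up to 65535 bytes each and cost at most 5 bytes of framing: one byte for the block header bits plus padding, then the 2-byte LEN and 2-byte NLEN fields. The sketch below only does that arithmetic. The class and method names are hypothetical, and it says nothing about whether a given encoder actually falls back to stored blocks.

```csharp
using System;

// Hypothetical helper: arithmetic on the RFC 1951 stored-block layout only.
public static class StoredBlockOverhead
{
    public const int HeaderBytes = 5;          // header byte + LEN (2) + NLEN (2)
    public const int MaxBlockPayload = 65535;  // LEN is a 16-bit field

    // Upper bound on the bytes added when the input is emitted as stored
    // blocks that each carry blockPayload bytes of data.
    public static long WorstCaseBytes(long inputLength, int blockPayload)
    {
        if (blockPayload < 1 || blockPayload > MaxBlockPayload)
        {
            throw new ArgumentOutOfRangeException("blockPayload");
        }
        // Round up; even an empty input needs one (final) block.
        long blocks = Math.Max(1, (inputLength + blockPayload - 1) / blockPayload);
        return blocks * HeaderBytes;
    }
}
```

Assuming the 372K file is 372 * 1024 = 380928 bytes, WorstCaseBytes(380928, 65535) gives 30 bytes, and even with 256-byte chunks WorstCaseBytes(380928, 256) gives 7440 bytes, which is roughly 2%. Either way, that is nowhere near the 45% growth reported in the numbers posted earlier.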
Re: System.IO.Compression broken.
-Jeremy
Najmunnisha
Bluesky_Jon
Pierre-Yves Troel
SqlShaun
PKZIP and other file-based compression utilities can and will store files using wildly different algorithms or compression parameters. For example, here's a header dump (with the CRC and Attribute columns removed to save space) of a test ZIP file I just created using WinZip:
Length  Method   Size   Ratio  Date        Time   Name
------  ------   -----  -----  ----        ----   ----
156000  DeflatN  81904  48%    12/17/2005  15:45  newcodes.txt
 82026  Stored   82026   0%    12/18/2005  14:47  newcodes.zip
156000  DeflatF  83125  47%    12/17/2005  15:45  newcodes2.txt
Newcodes.txt and newcodes2.txt are the exact same plaintext file, both compressed using the Deflate algorithm. Still, there is a noticeable difference in compression ratio between the default N(ormal) Deflate configuration and the F(ast) version I forced via the command line.
Results for a simple stream-based compressor (such as the one included in the .NET framework) will typically be in the 'DeflatF' range. This is a fact of life for stream compressors: to keep memory usage predictable and acceptable, they can't buffer too much of the stream, making look-ahead optimizations less effective.
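If you want to measure this kind of Normal-versus-Fast difference on your own data, note that the framework version discussed in this thread gives DeflateStream no tuning knob at all. Later releases (.NET Framework 4.5 and up) added a CompressionLevel enum and a matching constructor overload, so a rough comparison could look like the sketch below. The LevelComparison class and its method name are hypothetical, and the sizes you get will depend on the runtime and on the input.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper; requires .NET Framework 4.5 or later for CompressionLevel.
public static class LevelComparison
{
    // Deflates the input in memory at the given level and returns the output size in bytes.
    public static long DeflatedSize(byte[] input, CompressionLevel level)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // leaveOpen = true: disposing the DeflateStream flushes the final
            // block but leaves the MemoryStream readable.
            using (DeflateStream deflate = new DeflateStream(buffer, level, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return buffer.Length;
        }
    }
}
```

Calling DeflatedSize with CompressionLevel.Fastest and then CompressionLevel.Optimal on the same byte[] (for example, one loaded with File.ReadAllBytes) gives two numbers you can set side by side, much like the DeflatF and DeflatN rows in the header dump.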
You'll also see that the ZIP file I added was 'Stored' instead of compressed. In this case, WinZip noticed that the file expanded after running it through the compression algorithm, and decided to discard the compressed version and store the original file. Note that this is not a function of the Deflate (or any other) algorithm, but an explicit check the programmer of the ZIP utility put in place.
You can do the same with the System.IO.Compression streams: wrap them up in a class of your own, and persist either the plaintext or the compressed stream based on the final result. The fact that the (very basic) .NET stream compressor doesn't implement this functionality itself isn't a defect: you would need to do the exact same thing when using, say, zlib.
Of course you're free to petition Microsoft for more full-featured compression (even though just going the third-party route sounds a lot better to me...), but the behavior of the current System.IO.Compression classes has always been as expected for me (including being able to supply streams to other Deflate implementations...).
To prove there is a bug in DeflateStream, you would need to demonstrate significant differences in the output, for the same input file, of DeflateStream and another RFC 1951 (Deflate) implementation, e.g. zlib. However, since such a bug would also cause major interoperability issues, and MS most likely used an RFC 1951 reference implementation for DeflateStream anyway, I doubt there are any issues here.
'//mdb
Chris Rogeski
The problem does not exist in NetFX 3.0 because it didn't exist in version 2.0 either. The DeflateStream object applies the Deflate algorithm to data on a stream - the decision whether or not to use the output of the Deflate algorithm has to be made outside the DeflateStream object; for example, if you write code to read and write ZIP files (you can find an example of this at http://blogs.msdn.com/dotnetinterop/archive/2006/04/05/567402.aspx), it is up to you to determine what compression algorithm you will use for each stream inside the archive, including no compression at all (which the example code from the link above does NOT do). I repeat: DeflateStream is doing its job, and anyone complaining that it gives worse results than a ZIP utility doesn't understand the difference between the ZIP format itself, and the compression of data streams inside a ZIP file.
As it happens, the compression algorithm in DeflateStream does appear to have a bug, but it is something entirely different. Please don't make MS staff waste their time simply because you do not understand what you're talking about.