Class ParallelDeflateOutputStream
A class for compressing streams using the Deflate algorithm with multiple threads.
Namespace: OfficeOpenXml.Packaging.Ionic.Zlib
Assembly: EPPlus.dll
Syntax
public class ParallelDeflateOutputStream : Stream, IAsyncDisposable, IDisposable
Remarks
This class performs DEFLATE compression through writing. For more information on the Deflate algorithm, see IETF RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3."
This class is similar to DeflateStream, except that this class is for compression only, and this implementation uses an approach that employs multiple worker threads to perform the DEFLATE. On a multi-cpu or multi-core computer, the performance of this class can be significantly higher than the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10mb is a good candidate for parallel compression.
The tradeoff is that this class uses more memory and more CPU than the vanilla DeflateStream, and is also less efficient as a compressor. For large files, the compressed data stream can be less than 1% larger than the output of the vanilla DeflateStream; for smaller files the difference can be larger. The difference will also be larger if you set the BufferSize lower than the default value. Your mileage may vary. Finally, for small files, the ParallelDeflateOutputStream can be much slower than the vanilla DeflateStream, because of the overhead associated with using the thread pool.
Constructors
ParallelDeflateOutputStream(Stream)
Create a ParallelDeflateOutputStream.
Declaration
public ParallelDeflateOutputStream(Stream stream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The stream to which compressed data will be written. |
Remarks
This stream compresses data written into it via the DEFLATE algorithm (see RFC 1951), and writes out the compressed byte stream.
The instance will use the default compression level, the default buffer sizes and the default number of threads and buffers per thread.
This class is similar to DeflateStream, except that this implementation uses an approach that employs multiple worker threads to perform the DEFLATE. On a multi-cpu or multi-core computer, the performance of this class can be significantly higher than the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10mb is a good candidate for parallel compression.
Examples
This example shows how to use a ParallelDeflateOutputStream to compress data. It reads a file, compresses it, and writes the compressed data to a second, output file.
byte[] buffer = new byte[WORKING_BUFFER_SIZE];
int n = -1;
string outputFile = fileToCompress + ".compressed";
using (System.IO.Stream input = System.IO.File.OpenRead(fileToCompress))
{
    using (var raw = System.IO.File.Create(outputFile))
    {
        using (Stream compressor = new ParallelDeflateOutputStream(raw))
        {
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                compressor.Write(buffer, 0, n);
            }
        }
    }
}
Dim buffer As Byte() = New Byte(4096) {}
Dim n As Integer = -1
Dim outputFile As String = (fileToCompress & ".compressed")
Using input As Stream = File.OpenRead(fileToCompress)
    Using raw As FileStream = File.Create(outputFile)
        Using compressor As Stream = New ParallelDeflateOutputStream(raw)
            Do While (n <> 0)
                If (n > 0) Then
                    compressor.Write(buffer, 0, n)
                End If
                n = input.Read(buffer, 0, buffer.Length)
            Loop
        End Using
    End Using
End Using
ParallelDeflateOutputStream(Stream, CompressionLevel)
Create a ParallelDeflateOutputStream using the specified CompressionLevel.
Declaration
public ParallelDeflateOutputStream(Stream stream, CompressionLevel level)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The stream to which compressed data will be written. |
CompressionLevel | level | A tuning knob to trade speed for effectiveness. |
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
ParallelDeflateOutputStream(Stream, CompressionLevel, CompressionStrategy, Boolean)
Create a ParallelDeflateOutputStream using the specified CompressionLevel and CompressionStrategy, and specifying whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.
Declaration
public ParallelDeflateOutputStream(Stream stream, CompressionLevel level, CompressionStrategy strategy, bool leaveOpen)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The stream to which compressed data will be written. |
CompressionLevel | level | A tuning knob to trade speed for effectiveness. |
CompressionStrategy | strategy | By tweaking this parameter, you may be able to optimize the compression for data with particular characteristics. |
System.Boolean | leaveOpen | true if the application would like the stream to remain open after inflation/deflation. |
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
ParallelDeflateOutputStream(Stream, CompressionLevel, Boolean)
Create a ParallelDeflateOutputStream and specify whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.
Declaration
public ParallelDeflateOutputStream(Stream stream, CompressionLevel level, bool leaveOpen)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The stream to which compressed data will be written. |
CompressionLevel | level | A tuning knob to trade speed for effectiveness. |
System.Boolean | leaveOpen | true if the application would like the stream to remain open after inflation/deflation. |
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
ParallelDeflateOutputStream(Stream, Boolean)
Create a ParallelDeflateOutputStream and specify whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.
Declaration
public ParallelDeflateOutputStream(Stream stream, bool leaveOpen)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The stream to which compressed data will be written. |
System.Boolean | leaveOpen | true if the application would like the stream to remain open after inflation/deflation. |
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
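As a hedged sketch (the file name and trailer byte are illustrative, not part of the API), leaveOpen is useful when the output stream must outlive the compressor, for example to append application data after the compressed payload:

```csharp
// Hypothetical sketch: keep the underlying stream open after the compressor closes.
using (var output = System.IO.File.Create("payload.deflate"))
{
    var compressor = new ParallelDeflateOutputStream(output, leaveOpen: true);
    byte[] data = System.Text.Encoding.UTF8.GetBytes("example data");
    compressor.Write(data, 0, data.Length);
    compressor.Close();   // flushes compressed data, but leaves 'output' open

    // Because leaveOpen was true, 'output' is still usable here,
    // e.g. to append an application-specific trailer.
    output.WriteByte(0xFF);
}
```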
Properties
BufferSize
The size of the buffers used by the compressor threads.
Declaration
public int BufferSize { get; set; }
Property Value
Type | Description |
---|---|
System.Int32 |
Remarks
The default buffer size is 128k. The application can set this value at any time, but it is effective only before the first Write().
Larger buffer sizes imply larger memory consumption but allow more efficient compression. Smaller buffer sizes consume less memory but may result in less effective compression. For example, using the default buffer size of 128k, the compression delivered is within 1% of the compression delivered by the single-threaded DeflateStream. On the other hand, using a BufferSize of 8k can result in a compressed data stream that is 5% larger than that delivered by the single-threaded DeflateStream. Excessively small buffer sizes can also cause the speed of the ParallelDeflateOutputStream to drop, because of the thread-scheduling overhead of dealing with many small buffers.
The total amount of storage space allocated for buffering will be (N*S*2), where N is the number of buffer pairs, and S is the size of each buffer (this property). There are 2 buffers used by the compressor, one for input and one for output. By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, then the number of buffer pairs used will be 16. If you accept the default value of this property, 128k, then the ParallelDeflateOutputStream will use 16 * 2 * 128kb of buffer memory in total, or 4mb, in blocks of 128kb. If you set this property to 64kb, then the number will be 16 * 2 * 64kb of buffer memory, or 2mb.
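The timing constraint above can be sketched as follows (the file name and the 64k value are illustrative; the key point is that the assignment precedes the first Write()):

```csharp
// Hypothetical sketch: BufferSize is effective only before the first Write().
using (var raw = System.IO.File.Create("out.deflate"))
using (var compressor = new ParallelDeflateOutputStream(raw))
{
    compressor.BufferSize = 64 * 1024;   // 64k buffers instead of the 128k default

    byte[] data = System.Text.Encoding.UTF8.GetBytes("hello, deflate");
    compressor.Write(data, 0, data.Length);
    // Setting BufferSize here, after the first Write(), would have no effect.
}
```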
BytesProcessed
The total number of uncompressed bytes processed by the ParallelDeflateOutputStream.
Declaration
public long BytesProcessed { get; }
Property Value
Type | Description |
---|---|
System.Int64 |
Remarks
This value is meaningful only after a call to Close().
CanRead
Indicates whether the stream supports Read operations.
Declaration
public override bool CanRead { get; }
Property Value
Type | Description |
---|---|
System.Boolean |
Overrides
Remarks
Always returns false.
CanSeek
Indicates whether the stream supports Seek operations.
Declaration
public override bool CanSeek { get; }
Property Value
Type | Description |
---|---|
System.Boolean |
Overrides
Remarks
Always returns false.
CanWrite
Indicates whether the stream supports Write operations.
Declaration
public override bool CanWrite { get; }
Property Value
Type | Description |
---|---|
System.Boolean |
Overrides
Remarks
Returns true if the provided stream is writable.
Crc32
The CRC32 for the data that was written out, prior to compression.
Declaration
public int Crc32 { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Remarks
This value is meaningful only after a call to Close().
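A minimal sketch of reading these post-Close values (the MemoryStream and payload are illustrative; note the variable must be typed as ParallelDeflateOutputStream, since Crc32 and BytesProcessed are not members of Stream):

```csharp
// Hypothetical sketch: Crc32 and BytesProcessed are meaningful only after Close().
using (var output = new System.IO.MemoryStream())
{
    var compressor = new ParallelDeflateOutputStream(output, leaveOpen: true);
    byte[] data = System.Text.Encoding.UTF8.GetBytes("check me");
    compressor.Write(data, 0, data.Length);
    compressor.Close();

    // Both properties now describe the uncompressed input that was written.
    Console.WriteLine("CRC32: 0x{0:X8}", compressor.Crc32);
    Console.WriteLine("Bytes processed: {0}", compressor.BytesProcessed);
}
```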
Length
Reading this property always throws a NotSupportedException.
Declaration
public override long Length { get; }
Property Value
Type | Description |
---|---|
System.Int64 |
Overrides
MaxBufferPairs
The maximum number of buffer pairs to use.
Declaration
public int MaxBufferPairs { get; set; }
Property Value
Type | Description |
---|---|
System.Int32 |
Remarks
This property sets an upper limit on the number of memory buffer pairs to create. The implementation of this stream allocates multiple buffers to facilitate parallel compression. As each buffer fills up, this stream uses System.Threading.ThreadPool.QueueUserWorkItem(System.Threading.WaitCallback) to compress those buffers in a background threadpool thread. After a buffer is compressed, it is re-ordered and written to the output stream.
A higher number of buffer pairs enables a higher degree of parallelism, which tends to increase the speed of compression on multi-cpu computers. On the other hand, a higher number of buffer pairs also implies a larger memory consumption, more active worker threads, and a higher cpu utilization for any compression. This property enables the application to limit its memory consumption and CPU utilization behavior depending on requirements.
For each compression "task" that occurs in parallel, there are 2 buffers allocated: one for input and one for output. This property sets a limit for the number of pairs. The total amount of storage space allocated for buffering will then be (N*S*2), where N is the number of buffer pairs, S is the size of each buffer (BufferSize). By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, and you retain the default buffer size of 128k, then the ParallelDeflateOutputStream will use 4 * 4 * 2 * 128kb of buffer memory in total, or 4mb, in blocks of 128kb. If you then set this property to 8, then the number will be 8 * 2 * 128kb of buffer memory, or 2mb.
CPU utilization will also go up with additional buffers, because a larger number of buffer pairs allows a larger number of background threads to compress in parallel. If you find that parallel compression is consuming too much memory or CPU, you can adjust this value downward.
The default value is 16. Different values may deliver better or worse results, depending on your priorities and the dynamic performance characteristics of your storage and compute resources.
This property is not the number of buffer pairs to use; it is an upper limit. An illustration: Suppose you have an application that uses the default value of this property (which is 16), and it runs on a machine with 2 CPU cores. In that case, DotNetZip will allocate 4 buffer pairs per CPU core, for a total of 8 pairs. The upper limit specified by this property has no effect.
The application can set this value at any time, but it is effective only before the first call to Write(), which is when the buffers are allocated.
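The memory arithmetic above can be sketched as follows (the file name and payload are illustrative; like BufferSize, this property must be set before the first Write()):

```csharp
// Hypothetical sketch: cap memory by limiting buffer pairs before the first Write().
using (var raw = System.IO.File.Create("capped.deflate"))
using (var compressor = new ParallelDeflateOutputStream(raw))
{
    compressor.MaxBufferPairs = 8;       // at most 8 pairs, regardless of core count
    // Worst-case buffer memory: N * S * 2 = 8 * 128kb * 2 = 2mb at the default BufferSize.

    byte[] data = new byte[256 * 1024];  // pretend payload
    compressor.Write(data, 0, data.Length);
}
```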
Position
Returns the current position of the output stream.
Declaration
public override long Position { get; set; }
Property Value
Type | Description |
---|---|
System.Int64 |
Overrides
Remarks
Because the output gets written by a background thread, the value may change asynchronously. Setting this property always throws a NotSupportedException.
Strategy
The ZLIB strategy to be used during compression.
Declaration
public CompressionStrategy Strategy { get; }
Property Value
Type | Description |
---|---|
CompressionStrategy |
Methods
Close()
Close the stream.
Declaration
public override void Close()
Overrides
Remarks
You must call Close() on the stream to guarantee that all of the data written to it has been compressed, and that the compressed data has been written out.
Dispose()
Dispose the object
Declaration
public void Dispose()
Remarks
Because ParallelDeflateOutputStream is IDisposable, the application must call this method when finished using the instance. This method is generally called implicitly upon exit from a using scope in C# (Using in VB).
Dispose(Boolean)
The Dispose method
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing | indicates whether the Dispose method was invoked by user code. |
Overrides
Flush()
Flush the stream.
Declaration
public override void Flush()
Overrides
Read(Byte[], Int32, Int32)
This method always throws a NotSupportedException.
Declaration
public override int Read(byte[] buffer, int offset, int count)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | buffer | The buffer into which data would be read, IF THIS METHOD ACTUALLY DID ANYTHING. |
System.Int32 | offset | The offset within that data array at which to insert the data that is read, IF THIS METHOD ACTUALLY DID ANYTHING. |
System.Int32 | count | The number of bytes to read, IF THIS METHOD ACTUALLY DID ANYTHING. |
Returns
Type | Description |
---|---|
System.Int32 | nothing. |
Overrides
Reset(Stream)
Resets the stream for use with another stream.
Declaration
public void Reset(Stream stream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | The new output stream for this era. |
Remarks
Because the ParallelDeflateOutputStream is expensive to create, it has been designed so that it can be recycled and re-used. You have to call Close() on the stream first, then you can call Reset() on it, to use it again on another stream.
Examples
byte[] buffer = new byte[WORKING_BUFFER_SIZE];
int n;
ParallelDeflateOutputStream deflater = null;
foreach (var inputFile in listOfFiles)
{
    string outputFile = inputFile + ".compressed";
    using (System.IO.Stream input = System.IO.File.OpenRead(inputFile))
    {
        using (var outStream = System.IO.File.Create(outputFile))
        {
            if (deflater == null)
                deflater = new ParallelDeflateOutputStream(outStream,
                                                           CompressionLevel.Best,
                                                           CompressionStrategy.Default,
                                                           true);
            deflater.Reset(outStream);
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                deflater.Write(buffer, 0, n);
            }
            deflater.Close();   // must precede the next Reset(); leaveOpen keeps outStream usable
        }
    }
}
Seek(Int64, SeekOrigin)
This method always throws a NotSupportedException.
Declaration
public override long Seek(long offset, SeekOrigin origin)
Parameters
Type | Name | Description |
---|---|---|
System.Int64 | offset | The offset to seek to.... IF THIS METHOD ACTUALLY DID ANYTHING. |
System.IO.SeekOrigin | origin | The reference specifying how to apply the offset.... IF THIS METHOD ACTUALLY DID ANYTHING. |
Returns
Type | Description |
---|---|
System.Int64 | nothing. It always throws. |
Overrides
SetLength(Int64)
This method always throws a NotSupportedException.
Declaration
public override void SetLength(long value)
Parameters
Type | Name | Description |
---|---|---|
System.Int64 | value | The new value for the stream length.... IF THIS METHOD ACTUALLY DID ANYTHING. |
Overrides
Write(Byte[], Int32, Int32)
Write data to the stream.
Declaration
public override void Write(byte[] buffer, int offset, int count)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | buffer | The buffer holding data to write to the stream. |
System.Int32 | offset | the offset within that data array to find the first byte to write. |
System.Int32 | count | the number of bytes to write. |
Overrides
Remarks
To use the ParallelDeflateOutputStream to compress data, create a ParallelDeflateOutputStream, passing a writable output stream. Then call Write() on that ParallelDeflateOutputStream, providing uncompressed data as input. The data sent to the output stream will be the compressed form of the data written.
To decompress data, use the DeflateStream class.
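As a hedged round-trip sketch (the MemoryStream and payload are illustrative; System.IO.Compression.DeflateStream is shown here on the assumption that the output is a raw RFC 1951 DEFLATE stream, which either DeflateStream implementation can inflate):

```csharp
// Hypothetical sketch: compress with ParallelDeflateOutputStream,
// then decompress the raw DEFLATE stream with a DeflateStream.
var compressed = new System.IO.MemoryStream();
using (var compressor = new ParallelDeflateOutputStream(compressed, leaveOpen: true))
{
    byte[] data = System.Text.Encoding.UTF8.GetBytes("round trip");
    compressor.Write(data, 0, data.Length);
}   // disposing flushes the compressed data; 'compressed' stays open

compressed.Position = 0;
using (var decompressor = new System.IO.Compression.DeflateStream(
           compressed, System.IO.Compression.CompressionMode.Decompress))
using (var reader = new System.IO.StreamReader(decompressor))
{
    Console.WriteLine(reader.ReadToEnd());   // prints the original text
}
```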