The LZF filter is a stand-alone compression filter for HDF5, which can be used in place of the built-in DEFLATE (or SZIP) compressors to provide faster compression. The target performance point for LZF is very high-speed compression with an "acceptable" compression ratio.
In benchmark trials with floating-point data (below), a filter pipeline with LZF typically provides 3x-5x faster compression than DEFLATE, 2x faster decompression, and retains 50%-90% of the DEFLATE compression ratio.
Unlike SZIP, this filter works for all datatypes on which DEFLATE works, including compound, opaque, array and user-defined types. There are also no settings to adjust.
The LZF filter is written in C and may be included in C++ applications. No external libraries are required. HDF5 versions 1.6 and 1.8 are both supported. The license is 3-clause BSD.
This filter is maintained as part of the HDF5 for Python (h5py) project. The goal of h5py is to provide access to the majority of the HDF5 C API and feature set from Python. A stand-alone version of the LZF filter is packaged inside the UNIX tarball for h5py.
Based on LibLZF by Marc Lehmann.
Compression performance depends on many factors, including the storage datatype and the range of values used. LZF can be used on arbitrary HDF5 types, including strings, compound types and arrays, in addition to scalars. However, performance on multi-byte floating-point and integer data sets is of particular importance, because they are so commonly used.
Therefore, this simple benchmark compares the performance of several HDF5 compression techniques on single-precision floating-point data sets of varying complexity. For LZF, DEFLATE (listed as GZIP in the tables) and the PyTables LZO filter, the HDF5 SHUFFLE filter is also applied. The measured quantity for all filters is the performance of the entire pipeline.
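To illustrate what a byte-shuffle stage does before the compressor, here is a minimal sketch in pure Python. It is not the HDF5 SHUFFLE implementation: the function names `shuffle` and `unshuffle` are hypothetical, `zlib` stands in for the DEFLATE stage, the test signal is made up, and the buffer length is assumed to be a multiple of the element size.

```python
import math
import struct
import zlib


def shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on.

    Assumes len(buf) is a multiple of itemsize.
    """
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert shuffle(): put each byte plane back into its element slot."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        out[i::itemsize] = buf[i * n:(i + 1) * n]
    return bytes(out)


# Hypothetical test signal: a smooth single-precision (4-byte) sine wave.
values = [math.sin(i / 100.0) for i in range(20_000)]
raw = struct.pack(f"<{len(values)}f", *values)

shuffled = shuffle(raw, 4)
plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
print("DEFLATE only:     ", plain_size, "bytes")
print("shuffle + DEFLATE:", shuffled_size, "bytes")
```

Grouping the bytes this way places the slowly varying sign and exponent bytes next to each other, which often helps a byte-oriented compressor on smooth numeric data. The actual gain depends on the data.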
Compression ratio is measured as the percent reduction in file size; 0.0% is uncompressed while 100% would be perfect compression.
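The "percent reduction" definition above can be written as a short helper. This is only a sketch: the function name `percent_reduction` and the byte counts in the example are hypothetical and are not taken from the benchmark files.

```python
def percent_reduction(original_bytes: int, compressed_bytes: int) -> float:
    """Percent reduction in size: 0.0 means no savings, 100.0 would be perfect."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


# Hypothetical sizes: a 1,000,000-byte file that shrinks to 615,800 bytes.
print(f"{percent_reduction(1_000_000, 615_800):.2f}%")
```

Under this definition, a filter that leaves the file size unchanged scores 0%, and larger values mean better compression.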
Also keep in mind that even with a 200-round ensemble, these times are not precise to more than a few milliseconds. Additionally, only one platform (32-bit Intel Linux) was tested.
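A timing ensemble of this kind can be sketched as follows. The source does not say how the 200 rounds were combined, so taking the mean is an assumption. The helper `mean_time_ms` and the payload are hypothetical, and `zlib` stands in for the filter under test.

```python
import time
import zlib


def mean_time_ms(func, payload: bytes, rounds: int = 200) -> float:
    """Call func(payload) `rounds` times and return the mean wall time in ms."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(payload)
    return (time.perf_counter() - start) * 1000.0 / rounds


# Hypothetical payload: 192 KiB of repeating bytes (not the benchmark data).
payload = bytes(range(256)) * 768
compressed = zlib.compress(payload)

compress_ms = mean_time_ms(zlib.compress, payload)
decompress_ms = mean_time_ms(zlib.decompress, compressed)
print(f"compress:   {compress_ms:.3f} ms")
print(f"decompress: {decompress_ms:.3f} ms")
```

Averaging over many rounds reduces timer jitter, but, as noted above, the results are still only good to within a few milliseconds and are specific to the machine they were measured on.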
Compression type | Compress time (ms) | Decompress time (ms) | Compressed by |
NULL | 10.7 | 6.5 | 0.00% |
LZF | 18.6 | 17.8 | 96.66% |
LZO | 20.2 | 17.9 | 98.55% |
GZIP | 58.1 | 40.5 | 98.53% |
SZIP | 63.1 | 61.3 | 72.68% |
Compression type | Compress time (ms) | Decompress time (ms) | Compressed by |
NULL | 10.1 | 6.5 | 0.00% |
LZF | 54.5 | 22.2 | 38.42% |
LZO | 86.9 | 22.9 | 44.24% |
GZIP | 215.1 | 58.6 | 45.54% |
SZIP | 101.8 | 94.5 | 27.05% |
Compression type | Compress time (ms) | Decompress time (ms) | Compressed by |
NULL | 10.8 | 6.5 | 0.00% |
LZF | 65.5 | 24.4 | 15.54% |
LZO | 125.4 | 26.7 | 17.25% |
GZIP | 298.6 | 64.8 | 20.05% |
SZIP | 115.2 | 102.5 | 16.29% |
Compression type | Compress time (ms) | Decompress time (ms) | Compressed by |
NULL | 9.0 | 7.8 | 0.00% |
LZF | 67.8 | 24.9 | 8.95% |
LZO | 124.0 | 30.6 | 12.78% |
GZIP | 305.4 | 67.2 | 17.05% |
SZIP | 120.6 | 107.7 | 15.56% |
Chunk size | LZF compressed by | LZO compressed by | GZIP compressed by |
32k | 35.74% | 36.57% | 38.25% |
96k | 37.93% | 41.98% | 44.18% |
192k | 38.42% | 44.24% | 45.54% |
384k | 38.61% | 45.38% | 46.35% |
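Chunk size | LZF compress (ms) | LZF decompress (ms) | LZO compress (ms) | LZO decompress (ms) | GZIP compress (ms) | GZIP decompress (ms) |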
32k | 63.8 | 20.1 | 96.7 | 18.4 | 172.0 | 43.0 |
96k | 57.0 | 20.4 | 88.4 | 17.4 | 202.2 | 50.6 |
192k | 55.7 | 22.6 | 90.2 | 21.6 | 214.1 | 58.6 |
384k | 57.5 | 27.2 | 93.8 | 27.2 | 221.5 | 65.3 |
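HDF5 applies its filter pipeline to each chunk independently, so chunk size limits how much context the compressor sees. The sketch below imitates that by compressing fixed-size slices separately. It makes several assumptions: `zlib` stands in for the filter, the helper `chunked_reduction` and the test signal are hypothetical, and chunk sizes are taken to be in bytes, which the tables above do not state.

```python
import math
import struct
import zlib


def chunked_reduction(raw: bytes, chunk_size: int) -> float:
    """Compress each chunk on its own; return the overall percent reduction."""
    total = 0
    for start in range(0, len(raw), chunk_size):
        total += len(zlib.compress(raw[start:start + chunk_size]))
    return 100.0 * (1.0 - total / len(raw))


# Hypothetical data: 100,000 single-precision samples of a sine wave.
values = [math.sin(i / 100.0) for i in range(100_000)]
raw = struct.pack(f"<{len(values)}f", *values)

for kib in (32, 96, 192, 384):
    print(f"{kib}k: {chunked_reduction(raw, kib * 1024):.2f}%")
```

The numbers this prints are not comparable to the tables above, since both the data and the compressor differ. The sketch only shows how a per-chunk measurement can be set up.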