Constructs a comprehensive filter pipeline configuration to be passed
as the compress argument to h5_write(). This function allows fine-grained
control over chunking, pre-filters, compression algorithms, and data scaling.
Usage
h5_compression(
compress = "gzip",
chunk_size = 1024 * 1024,
checksum = FALSE,
int_packing = FALSE,
float_rounding = NULL,
blosc2_delta = FALSE,
blosc2_truncate = NULL
)
Arguments
- compress
A string specifying the compression algorithm and optional level (e.g., "none", "gzip", "zstd-7", "lz4", "blosc1-lz4-9", "blosc2-gzip-3", "blosc2-zstd"). See the Valid Compression Strings section below for an exhaustive list of supported formats. Default is "gzip".
- chunk_size
An integer specifying the target chunk size in bytes. Default is 1048576 (1 MB).
- checksum
A logical value indicating whether to apply the Fletcher32 checksum filter at the end of the pipeline to detect data corruption. Default is FALSE.
- int_packing
Control the HDF5 Scale-Offset filter for integer datasets. (Note: Incompatible with szip, zfp, bshuf, and Blosc2 pre-filters).
  - FALSE (Default): Disabled.
  - TRUE: Automatically calculates and applies the mathematically optimal minimum bit-width for each individual chunk.
  - Integer (e.g., 8): Forces packing into exactly that many bits.
- float_rounding
Control the HDF5 Scale-Offset filter for floating-point datasets. (Note: Incompatible with szip, zfp, bshuf, and Blosc2 pre-filters).
  - NULL (Default): Disabled.
  - Integer (e.g., 3): The number of base-10 decimal places of detail to preserve before truncating and packing the values (e.g., 3.141). Negative numbers round to powers of 10.
- blosc2_delta
A logical value. If TRUE and a blosc2 compressor is selected, applies the Blosc2 Delta pre-filter before compression. Default is FALSE.
- blosc2_truncate
An integer. If provided and a blosc2 compressor is selected, applies the Blosc2 Truncate Precision pre-filter to floating-point data, preserving exactly the specified number of uncompressed bits. Default is NULL.
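To make the int_packing = TRUE and chunk_size arithmetic concrete, here is a minimal base-R sketch. It assumes the Scale-Offset filter stores each integer as a non-negative offset from the chunk minimum, which is the general idea behind that HDF5 filter. The helpers min_bits() and chunk_elements() are hypothetical names used only for illustration; they are not part of this package and may not match its internal logic.

```r
# Hypothetical helper, not part of the package: bits needed to store each
# value of an integer chunk as a non-negative offset from the chunk minimum.
min_bits <- function(x) {
  span <- as.numeric(max(x)) - as.numeric(min(x))  # largest offset to represent
  if (span == 0) return(0L)                        # constant chunk: no payload bits
  as.integer(floor(log2(span)) + 1)
}

min_bits(c(1000L, 1003L, 1015L))  # offsets 0..15
#> [1] 4
min_bits(c(-5L, 250L))            # offsets 0..255
#> [1] 8

# Hypothetical helper: how many values of a given width fit in the
# chunk_size byte target (default 1024 * 1024 bytes).
chunk_elements <- function(chunk_size = 1024 * 1024, bytes_per_value = 4) {
  chunk_size %/% bytes_per_value
}

chunk_elements()                  # 32-bit integers
#> [1] 262144
```

Under these assumptions, a chunk of 32-bit integers spanning only 16 distinct offsets would need 4 bits per value instead of 32, which is why passing a fixed integer such as int_packing = 8 is only safe when every chunk's range fits in that many bits.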
Valid Compression Strings
The compress argument takes a string that names the codec, optionally
followed by a hyphen and a level or parameter (e.g., "zstd-7").
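As a rough illustration of this syntax, the sketch below splits a compression string into a codec name and a trailing numeric parameter, covering the codec families listed in the rest of this section. parse_compress() is a hypothetical helper written for this example; the package's real parser is not shown in this document and may behave differently. Defaults are filled in only for the native codecs whose defaults are documented here (gzip 5, zstd 3, lz4 0).

```r
# Hypothetical parser, for illustration only; not the package's implementation.
parse_compress <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  last  <- parts[length(parts)]

  # A trailing token made only of digits and dots is a level, tolerance, or bit count.
  has_param <- length(parts) > 1 && grepl("^[0-9.]+$", last)
  codec <- paste(if (has_param) parts[-length(parts)] else parts, collapse = "-")
  param <- if (has_param) as.numeric(last) else NA_real_

  # Documented defaults for the native codecs only.
  defaults <- c(gzip = 5, zstd = 3, lz4 = 0)
  if (is.na(param) && codec %in% names(defaults)) param <- defaults[[codec]]

  # The ZFP accuracy, precision, and rate modes require an explicit parameter.
  if (is.na(param) && grepl("^(blosc2-)?zfp-(acc|prec|rate)$", codec)) {
    stop("'", x, "' requires a tolerance or bit count")
  }

  list(codec = codec, param = param)
}

str(parse_compress("gzip"))          # default level filled in
str(parse_compress("blosc1-lz4-9"))  # codec "blosc1-lz4", level 9
str(parse_compress("zfp-acc-0.05"))  # codec "zfp-acc", tolerance 0.05
```

Splitting on the last hyphen-separated token keeps names such as "lz4" and "blosc2" intact, because neither ends in a token made purely of digits.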
Native / Core Codecs
- "none": No compression.
- "gzip-[level]": Levels 1 to 9. Default is 5. (e.g., "gzip" or "gzip-9").
- "zstd-[level]": Levels 1 to 22. Default is 3. (e.g., "zstd" or "zstd-7").
- "lz4-[level]": Levels 0 to 12. Default is 0. Level 0 is standard LZ4. Levels 1+ trigger LZ4-HC.
Bitshuffle Pre-filter
Forces the native Bitshuffle pre-filter before compression.
- "bshuf-lz4": Bitshuffle + LZ4.
- "bshuf-zstd-[level]": Bitshuffle + Zstd (Levels 1 to 22).
Blosc Meta-compressors
Blosc applies its own highly optimized bitshuffling and multi-threading.
- Blosc2 (Recommended): "blosc2" (blosclz), "blosc2-lz4-[level]", "blosc2-zstd-[level]", "blosc2-gzip-[level]", "blosc2-ndlz"
- Blosc1 (Legacy): "blosc1" (blosclz), "blosc1-lz4-[level]", "blosc1-zstd-[level]", "blosc1-gzip-[level]", "blosc1-snappy"
ZFP (Lossy Floating-Point Compression)
ZFP can be run standalone (for integers and floats) or inside Blosc2 (floats only). Unlike [level], [tolerance] and [bits] are required.
- Accuracy Mode (Absolute error tolerance): "zfp-acc-[tolerance]" or "blosc2-zfp-acc-[tolerance]" (e.g., "zfp-acc-0.001").
- Precision Mode (Bits of precision): "zfp-prec-[bits]" or "blosc2-zfp-prec-[bits]" (e.g., "zfp-prec-16").
- Rate Mode (Bits of storage per value): "zfp-rate-[bits]" or "blosc2-zfp-rate-[bits]" (e.g., "zfp-rate-8").
- Reversible Mode (Standalone Lossless): "zfp-rev".
Automatic Shuffling
To maximize compression ratios without requiring users to manually manage
complex pipeline interactions, h5_compression automatically configures the
optimal shuffling pre-filter based on the following strict hierarchy:
1. Blosc's Internal Bitshuffle (Preferred)
If a Blosc meta-compressor is selected (e.g., "blosc2-zstd"), the pipeline
automatically enables Blosc's highly optimized, internal bitshuffle routine.
This achieves peak compression performance without requiring the standalone
Bitshuffle plugin to be installed.
2. Explicit Bitshuffle Plugin
If a standard codec is explicitly prefixed with bshuf- (e.g., "bshuf-lz4"),
the pipeline delegates to the standalone Bitshuffle plugin.
3. Native HDF5 Byte Shuffle (Fallback)
If a standard compressor is selected (e.g., "zstd-5" or "gzip"),
the pipeline safely falls back to the core HDF5 library's native byte shuffle
filter. This guarantees improved compression while maintaining universal
compatibility.
4. Strict Mutual Exclusions (When Shuffling is Disabled)
To prevent data corruption or wasted CPU cycles, all shuffling is forcefully disabled in the following scenarios:
- Scale-Offset Active: If int_packing or float_rounding is applied, shuffling is disabled because scale-offset destroys the byte-alignment that shuffling relies on.
- ZFP & SZIP: These algorithms perform mathematical compression directly on numerical values, and the data will be corrupted if the bitstream is rearranged beforehand.
- 1-Byte Data: Characters, booleans, and 8-bit integers cannot be meaningfully shuffled, so the step is skipped.
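The hierarchy above can be sketched as a small decision function. This is an illustrative reading of the rules, not the package's code: choose_shuffle(), its bytes_per_value argument, and the returned labels are all hypothetical. The exclusions are checked first so that they override any codec-based choice, and treating "none" as unshuffled is an assumption the text does not state.

```r
# Hypothetical decision function mirroring the documented shuffle hierarchy.
choose_shuffle <- function(compress, int_packing = FALSE,
                           float_rounding = NULL, bytes_per_value = 4) {
  # 4. Mutual exclusions take priority over everything else.
  scale_offset <- !isFALSE(int_packing) || !is.null(float_rounding)
  if (scale_offset)                    return("none")
  if (grepl("zfp|szip", compress))     return("none")
  if (bytes_per_value == 1)            return("none")
  if (compress == "none")              return("none")  # assumption, not documented

  # 1. Blosc meta-compressors use their internal bitshuffle.
  if (grepl("^blosc[12]", compress))   return("blosc bitshuffle")
  # 2. An explicit bshuf- prefix selects the standalone plugin.
  if (startsWith(compress, "bshuf-"))  return("bitshuffle plugin")
  # 3. Everything else falls back to the native HDF5 byte shuffle.
  "byte shuffle"
}

choose_shuffle("zstd-7")
#> [1] "byte shuffle"
choose_shuffle("gzip-9", int_packing = TRUE)
#> [1] "none"
choose_shuffle("blosc2-zstd-5")
#> [1] "blosc bitshuffle"
```

The element width is passed in explicitly here because the 1-byte rule depends on the data being written, which a configuration built ahead of time cannot know.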
Examples
# 1. Simple fast compression (Zstd level 7)
h5_compression("zstd-7")
#> <HDF5 Compression Configuration>
#> Codec: zstd-7
#> Shuffle: Byte Shuffle (Native HDF5)
#> Chunk Size: 1.00 MB
#> Checksum: None
# 2. Optimal integer packing (Scale-Offset)
h5_compression("gzip-9", int_packing = TRUE)
#> <HDF5 Compression Configuration>
#> Codec: gzip-9
#> Shuffle: None (Disabled by Scale-Offset)
#> Chunk Size: 1.00 MB
#> Checksum: None
#> Int Packing: Optimal (Auto)
# 3. Complex Blosc2 Pipeline (Delta + Zstd)
h5_compression("blosc2-zstd-5", blosc2_delta = TRUE)
#> <HDF5 Compression Configuration>
#> Codec: blosc2-zstd-5
#> Shuffle: Bitshuffle (Blosc Internal)
#> Chunk Size: 1.00 MB
#> Checksum: None
#> Blosc2 Delta: TRUE
# 4. Lossy ZFP compression (Tolerance of 0.05)
h5_compression("zfp-acc-0.05")
#> <HDF5 Compression Configuration>
#> Codec: zfp-acc-0.05
#> Shuffle: None (Incompatible with zfp)
#> Chunk Size: 1.00 MB
#> Checksum: None
# Pass the compress object directly to h5_write
file <- tempfile(fileext = ".h5")
cmp <- h5_compression("gzip-9", checksum = TRUE)
h5_write(combn(1:10, 3), file, "sets", compress = cmp)
print(cmp)
#> <HDF5 Compression Configuration>
#> Codec: gzip-9
#> Shuffle: Byte Shuffle (Native HDF5)
#> Chunk Size: 1.00 MB
#> Checksum: Fletcher32
h5_inspect(file, "sets")
#> <HDF5 Dataset Properties>
#> Type: uint8 Size: 360.00 B
#> Layout: chunked Disk: 79.00 B
#> Chunks: [3 x 120] Ratio: 4.56x
#> Pipeline: gzip -> fletcher32
# Clean up
unlink(file)
