
For years, the R ecosystem has been missing a critical piece of infrastructure: a high-performance, hierarchical data storage API that is actually suitable for use within other CRAN packages. While HDF5 is the global standard for sharing massive, complex datasets across languages like Python, C++, and Julia, R developers have historically been forced to choose between heavy Bioconductor dependencies and unreliable system library configurations.
Today, we are excited to announce the release of h5lite and hdf5lib on CRAN. These packages provide a modern, zero-dependency interface to the HDF5 2.0 storage format, finally making high-performance data storage accessible for both interactive data science and robust R package development.
What is HDF5? The “Supercharged R List” Analogy
If you aren’t familiar with HDF5, the easiest way to think about it is as a persistent, cross-platform R list.
In R, a list is powerful because it is heterogeneous: you can store a matrix of numbers, a data frame of metadata, and a nested list of configuration settings all in one object. HDF5 brings this same flexibility to your hard drive. It allows you to:
- Store Heterogeneous Data Together: Keep your raw arrays, tidy data frames, and unstructured metadata (like character strings or enums) inside a single `.h5` file.
- Organize with Groups: Just as you nest lists in R, HDF5 uses a filesystem-like hierarchy of “groups” to organize your data.
- Share Across Languages: Because HDF5 is a universal standard, the “list” you saved in R can be opened directly in Python (for example, with h5py, where groups behave like dictionaries) or in Julia (with HDF5.jl), with all your metadata and dimensions preserved.
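To make the analogy concrete, here is a small base-R sketch that walks a named, nested list and prints the slash-separated paths its elements would occupy if every sub-list became an HDF5 group and every leaf became a dataset. The helper `list_to_paths()` is hypothetical and not part of h5lite. It only illustrates the mapping, and it assumes every list element is named.

```r
# Hypothetical helper, NOT part of h5lite: lists the HDF5-style paths that a
# named, nested R list would map to (sub-lists -> groups, leaves -> datasets).
list_to_paths <- function(x, prefix = "") {
  paths <- character(0)
  for (nm in names(x)) {
    path <- paste0(prefix, "/", nm)
    el <- x[[nm]]
    if (is.list(el) && !is.data.frame(el)) {
      # A plain sub-list becomes a group (trailing "/"); recurse into it
      paths <- c(paths, paste0(path, "/"), list_to_paths(el, path))
    } else {
      # Vectors, matrices and data frames become datasets
      paths <- c(paths, path)
    }
  }
  paths
}

experiment <- list(
  counts   = matrix(1:6, nrow = 2),                            # numeric array
  samples  = data.frame(id = 1:3, group = c("a", "b", "a")),   # tidy table
  settings = list(seed = 42L, method = "fast")                 # nested config
)

cat(list_to_paths(experiment), sep = "\n")
#> /counts
#> /samples
#> /settings/
#> /settings/seed
#> /settings/method
```

Because the type-mapping table below maps lists to groups recursively, a call such as `h5_write(experiment, file, "experiment")` would be expected to produce a similar layout under `/experiment`.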
The Motivation: Unlocking HDF5 for CRAN Packages
The primary goal of h5lite and hdf5lib is to end the “Installation Lottery” for R developers. Existing HDF5 interfaces in R present significant hurdles for anyone trying to build a stable, redistributable package:
- The Dependency Barrier: Relying on the `rhdf5` ecosystem requires your users to navigate `BiocManager`, which complicates the standard `install.packages()` workflow and can cause issues on CRAN check farms.
- The System Library Gamble: The `hdf5r` package typically requires users to manually install the HDF5 system library (e.g., `libhdf5-dev` on Debian/Ubuntu). This is notoriously difficult for Windows users and creates an unpredictable environment where your package might fail if the user has the “wrong” library version.
We built hdf5lib to solve this by bundling the complete HDF5 2.0.0 source code directly. It compiles natively during installation on macOS, Windows, and Linux with zero external dependencies. For the first time, R package developers can simply add LinkingTo: hdf5lib to their DESCRIPTION file and guarantee that their users have a high-performance, thread-safe HDF5 API ready to go.

The Interface: h5lite
h5lite is an opinionated, streamlined interface that handles the complexities of type mapping and interoperability so you can focus on your data. It manages the following default behaviors:
| R Data Type | HDF5 Equivalent | Description |
|---|---|---|
| Numeric | variable | Selects optimal type: uint8, float32, etc. |
| Logical | uint8 | Stored as 0 (FALSE) or 1 (TRUE). |
| Character | string | Variable or fixed-length; UTF-8 or ASCII. |
| Complex | complex | Native HDF5 2.0+ complex numbers. |
| Raw | opaque | Raw bytes / binary data. |
| Factor | enum | Integer indices with label mapping. |
| integer64 | int64 | 64-bit signed integers via bit64 package. |
| POSIXt | string | ISO 8601 string (YYYY-MM-DDTHH:MM:SSZ). |
| List | group | Recursive container structure. |
| Data Frame | compound | Table of mixed types. |
| NULL | null | Creates a placeholder. |
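The “Numeric / variable” row means that the stored type depends on the values themselves. The sketch below shows one plausible way to make such a choice. It picks the narrowest integer type whose range covers the data and falls back to `float64` for fractional, infinite, or `NA`-containing vectors. `pick_numeric_type()` is a hypothetical illustration, not h5lite's implementation. Its thresholds and fallbacks are assumptions, and it does not attempt to reproduce h5lite's selection of `float32`.

```r
# Hypothetical illustration, NOT h5lite's implementation: pick the narrowest
# HDF5 integer type that can hold every value in a numeric vector `x`.
pick_numeric_type <- function(x) {
  stopifnot(is.numeric(x))
  # Empty, missing, infinite, or fractional data stay floating point:
  # NA, NaN and Inf have no integer representation.
  if (length(x) == 0 || !all(is.finite(x)) || any(x != trunc(x))) {
    return("float64")
  }
  lo <- min(x)
  hi <- max(x)
  if (lo >= 0) {
    # Only non-negative values: try unsigned types, smallest first
    if (hi <= 255) return("uint8")
    if (hi <= 65535) return("uint16")
    if (hi <= 4294967295) return("uint32")
  } else {
    # Negative values present: try signed types, smallest first
    if (lo >= -128 && hi <= 127) return("int8")
    if (lo >= -32768 && hi <= 32767) return("int16")
    if (lo >= -2147483648 && hi <= 2147483647) return("int32")
  }
  # Beyond the 32-bit range this sketch simply keeps doubles
  "float64"
}

pick_numeric_type(1:10)
#> [1] "uint8"
pick_numeric_type(c(-3L, 100L))
#> [1] "int8"
pick_numeric_type(c(2.5, NA))
#> [1] "float64"
```

The `uint8` result for `1:10` happens to match the `ints <uint8 × 10>` entry in the `h5_str()` output of the Try It Out section. That agreement is illustrative only.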
Key Features
- Smart Defaults: The package automatically chooses efficient storage types, such as `int8` for small integers or `float64` for vectors containing `NA` values.
- Precision Control: Use the `as` argument to explicitly request types like `bfloat16`, `float32`, or specific fixed-length strings to match external schemas.
- Transparent Interoperability: R uses column-major order, while HDF5 uses row-major order. `h5lite` handles these permutations automatically, ensuring your data opens correctly in Python or Julia.
- Metadata Preservation: R `names`, `row.names`, and `dimnames` are preserved as HDF5 dimension scales, maintaining your metadata across platforms.
- Compression: Built-in zlib compression is supported. When compression is enabled, `h5lite` applies a “shuffle” filter that groups the corresponding bytes of each value together before compression, which typically compresses numerical data much better.
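You can illustrate the interoperability and compression points above in base R. The first part of the sketch converts an R array into row-major (C) order by reversing its axes with `aperm()`, then converts it back. After this round trip, element `[i, j]` refers to the same value in R and in a row-major language. The second part applies the byte-shuffle idea to `double` values and compares gzip-compressed sizes with `memCompress()`. Both parts are conceptual sketches. The helper names are hypothetical, and the code is not taken from h5lite or from HDF5's own shuffle filter.

```r
# --- 1. Column-major (R) <-> row-major (C/HDF5) -----------------------------
# Hypothetical helpers, not h5lite API.
to_row_major <- function(x) {
  # Reversing the axes, then reading in R's column-major order, yields the
  # elements in row-major order (last index varies fastest).
  as.vector(aperm(x, rev(seq_along(dim(x)))))
}
from_row_major <- function(buf, dims) {
  # Fill with reversed dims (column-major), then reverse the axes back.
  aperm(array(buf, dim = rev(dims)), rev(seq_along(dims)))
}

m <- matrix(1:6, nrow = 2)   # rows: 1 3 5 / 2 4 6
to_row_major(m)
#> [1] 1 3 5 2 4 6
all(from_row_major(to_row_major(m), dim(m)) == m)
#> [1] TRUE

# --- 2. Byte shuffle before compression -------------------------------------
shuffle_bytes <- function(x) {
  bytes <- writeBin(x, raw(), size = 8, endian = "little")  # 8 bytes per double
  # Column j holds the bytes of element j; transposing emits byte 1 of every
  # element, then byte 2 of every element, and so on.
  as.vector(t(matrix(bytes, nrow = 8)))
}
unshuffle_bytes <- function(shuffled, n) {
  bytes <- as.vector(t(matrix(shuffled, nrow = n)))  # undo the grouping
  readBin(bytes, what = "double", n = n, size = 8, endian = "little")
}

x <- seq(0, 1, length.out = 10000)
plain    <- writeBin(x, raw(), size = 8, endian = "little")
shuffled <- shuffle_bytes(x)
# gzip sizes in bytes; for smooth numeric data the shuffled stream is
# typically smaller
c(plain = length(memCompress(plain, "gzip")),
  shuffled = length(memCompress(shuffled, "gzip")))
identical(unshuffle_bytes(shuffled, length(x)), x)
#> [1] TRUE
```

The shuffle changes only the byte order, so it is lossless. It helps because the high-order (sign and exponent) bytes of similar numbers repeat, and grouping them together gives zlib long runs to match.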

The Engine: hdf5lib
Under the hood, h5lite is powered by hdf5lib. While most users won’t interact with it directly, it serves as a rock-solid foundation for R package developers.
- Bundled Source: `hdf5lib` bundles the complete HDF5 2.0.0 source code. No external libraries are needed; it compiles natively during installation on macOS, Windows, and Linux.
- Thread-Safety: Unlike many system builds, `hdf5lib` is compiled with thread-safety enabled, so HDF5 can be called safely from multi-threaded code such as `RcppParallel` workers or `OpenMP` regions.
- Versioning Guarantee: Developers can now specify `LinkingTo: hdf5lib (>= 2.0)` in their `DESCRIPTION` file. This guarantees that HDF5 2.0 features are available on the user’s system, finally ending the “Installation Lottery.” API versioning also safeguards against breaking changes in future HDF5 releases.
Stability and Minimal Footprint
Reliability is paramount for a storage interface. Both packages minimize their dependency footprint by not importing any other R packages.
To ensure a stable experience, h5lite maintains a test suite with 100% code coverage. Both packages have been rigorously validated across the standard CRAN platforms (Linux, Windows, macOS) and compilers (GCC, Clang). For developers, we have prioritized memory safety: both packages pass Valgrind, ASan, and UBSan checks, which helps ensure that packages linking to them do not inherit upstream memory issues.
Comparison at a Glance
| Feature | h5lite / hdf5lib | rhdf5 / Rhdf5lib | hdf5r |
|---|---|---|---|
| Repository | CRAN | Bioconductor | CRAN |
| HDF5 Version | 2.0.0 (Guaranteed) | 1.10.7 (or System) | System |
| API Philosophy | Streamlined | Comprehensive | Comprehensive |
| Install Friction | None | BiocManager | System Libs |
| Thread Safety | Yes (Default) | No (Default) | Varies |
Try It Out
```r
install.packages("h5lite")
library(h5lite)

file <- tempfile(fileext = ".h5")

# Write various R objects
h5_write(1:10, file, "ints")                           # Integer Vector
h5_write(I("example"), file, "/", attr = "id")         # Scalar Attribute
h5_write(matrix(rnorm(9), 3, 3), file, "mtx")          # Numeric Matrix
h5_write(factor(letters), file, "letters")             # Factor -> ENUM
h5_write(list(a = 1, b = 2), file, "config")           # List -> GROUP
h5_write(iris, file, "flowers/iris")                   # Data Frame -> COMPOUND

# Write with specific type coercions
h5_write(iris, file, "flowers/coerced",
         as = c(Sepal.Length = "bfloat16", .numeric = "float32"))

# Inspect the file structure
h5_str(file)
#> /
#> ├── @id <utf8[7] scalar>
#> ├── mtx <float64 × 3 × 3>
#> ├── letters <enum × 26>
#> ├── config/
#> │   ├── a <uint8 × 1>
#> │   └── b <uint8 × 1>
#> ├── ints <uint8 × 10>
#> └── flowers/
#>     ├── iris <compound[5] × 150>
#>     │   ├── $Sepal.Length <float64>
#>     │   ├── $Sepal.Width <float64>
#>     │   ├── $Petal.Length <float64>
#>     │   ├── $Petal.Width <float64>
#>     │   └── $Species <enum>
#>     └── coerced <compound[5] × 150>
#>         ├── $Sepal.Length <bfloat16>
#>         ├── $Sepal.Width <float32>
#>         ├── $Petal.Length <float32>
#>         ├── $Petal.Width <float32>
#>         └── $Species <enum>
```

For a deeper dive, explore the Getting Started with h5lite guide and the hdf5lib documentation.