h5lite is the pain-free way to work with HDF5 files in R.
It is designed for data scientists who want to read/write objects and move on, and for package developers who need a reliable, dependency-free storage backend.
Why h5lite?
If you’ve struggled with complex HDF5 bindings in the past, h5lite offers a fresh approach:
- It Just Works: No need to understand HDF5 dataspaces, hyperslabs, or property lists. h5lite maps R objects (numeric, character, factor, data.frame, and more) directly to their HDF5 equivalents.
- Zero System Dependencies: h5lite bundles the HDF5 library (via hdf5lib). Users do not need to install HDF5 system libraries manually.
- Smart Defaults, Full Control: It automatically selects the most efficient data types (e.g., saving space by storing small integers as int8), but gives you granular control when you need to conform to a strict spec.
Installation
Install the released version from CRAN:
install.packages("h5lite")
Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("cmmr/h5lite")
Quick Start
The API consists primarily of two functions: h5_write() and h5_read().
library(h5lite)
file <- tempfile(fileext = ".h5")
# 1. Write simple objects
h5_write(1:10, file, "my_vector")
h5_write(I(42), file, "my_vector", attr = "my_id")
h5_write(matrix(rnorm(9), 3, 3), file, "my_matrix")
# 2. Write a list (creates a group hierarchy)
config <- list(version = 1.0, params = list(a = 1, b = 2))
h5_write(config, file, "simulation_config")
# 3. Read it back
my_vec <- h5_read(file, "my_vector")
# 4. Inspect the file
h5_ls(file)
#> [1] "my_vector" "my_matrix" "simulation_config"
#> [4] "simulation_config/version" "simulation_config/params" "simulation_config/params/a"
#> [7] "simulation_config/params/b"
h5_str(file)
#> /
#> ├── my_vector <uint8 × 10>
#> │   └── @my_id <uint8 scalar>
#> ├── my_matrix <float64 × 3 × 3>
#> └── simulation_config/
#>     ├── version <uint8 × 1>
#>     └── params/
#>         ├── a <uint8 × 1>
#>         └── b <uint8 × 1>
Smart Data Typing
h5lite inspects your data and chooses the safest, most compact HDF5 data type automatically. You don’t need to know the specific HDF5 type codes; h5lite handles the translation.
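To see the automatic selection in action, here is a minimal sketch that writes a few vectors with different value ranges and prints the resulting file tree. It uses only h5_write(), h5_read(), and h5_str() as shown in the Quick Start; the dataset names are made up for illustration, and the exact types reported may depend on your h5lite version.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# Three inputs with different value ranges. No `as` argument is given,
# so h5lite picks each on-disk type itself.
h5_write(c(1L, 2L, 3L), file, "small_ints")
h5_write(c(-70000L, 70000L), file, "wide_ints")
h5_write(c(0.5, 1.5), file, "doubles")

# Print the file tree to see which storage type was chosen for each dataset.
h5_str(file)

# The values read back should match what was written,
# whatever storage type was used on disk.
small <- h5_read(file, "small_ints")
all(small == c(1, 2, 3))
```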
Power User Features
The as Argument: Precise Control
Need to conform to a specific file specification? The as argument allows you to override automatic behavior and explicitly define on-disk types.
# Force specific numeric types
h5_write(1:10, file, "dataset_a", as = "int32")
h5_write(rnorm(10), file, "dataset_b", as = "float32")
# Control string lengths (e.g., fixed-length ASCII for compatibility)
h5_write(c("A", "B"), file, "fixed_strs", as = "ascii[10]")
h5_str(file)
#> ...
#> ├── dataset_a <int32 × 10>
#> ├── dataset_b <float32 × 10>
#> └── fixed_strs <ascii[10] × 2>
When writing Data Frames, you can map types for specific columns using a named vector.
df <- data.frame(
  id    = 1:5,
  score = c(1.1, 2.2, 3.3, 4.4, 5.5),
  note  = c("a", "b", "c", "d", "e")
)
# Store 'id' as 16-bit integer, 'score' as 32-bit float, and coerce 'note' to ascii
h5_write(df, file, "experiment_data",
         as = c(id = "uint16", score = "float32", note = "ascii"))
h5_str(file)
#> ...
#> └── experiment_data <compound[3] × 5>
#>     ├── $id <uint16>
#>     ├── $score <float32>
#>     └── $note <ascii>
Data Compression (Szip & Gzip)
h5lite supports transparent data compression using the compress argument. While gzip is the universal standard, szip is available for high-performance scientific data.
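As a rough illustration, the sketch below writes the same vector with and without compression and compares the file sizes. The compress argument is named in the text above, but the values used here (TRUE and FALSE) are an assumption about what it accepts; check the Data Compression article for the supported settings, including how to select szip.

```r
library(h5lite)

file_plain <- tempfile(fileext = ".h5")
file_comp  <- tempfile(fileext = ".h5")

# Highly repetitive data compresses well, which makes any effect easy to see.
x <- rep(c(1.5, 2.5, 3.5, 4.5), times = 250000)

# Assumption: `compress` accepts a logical switch.
h5_write(x, file_plain, "x", compress = FALSE)
h5_write(x, file_comp,  "x", compress = TRUE)

# Compare the on-disk sizes in bytes.
file.size(file_plain)
file.size(file_comp)

# Compression is transparent: reading needs no extra arguments.
x_back <- h5_read(file_comp, "x")
all(x_back == x)
```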
Efficient Partial Reading
For large datasets that exceed system RAM, h5lite provides partial reading via start and count parameters. It automatically targets the most logical dimension (e.g., rows in a matrix or elements in a vector).
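A self-contained sketch of a partial read follows. It first creates a small stand-in for a large matrix (the Quick Start file has no "large_matrix" dataset), then reads a block of rows. The comparison at the end assumes start is 1-based, as is usual in R.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# A 1000 x 4 matrix stands in for a dataset too large to load at once.
m <- matrix(seq_len(4000), nrow = 1000, ncol = 4)
h5_write(m, file, "large_matrix")

# Request 100 rows beginning at row 500.
rows <- h5_read(file, "large_matrix", start = 500, count = 100)

# Only the requested rows are returned; all columns are kept.
dim(rows)

# Compare with the same slice taken in memory (assumes a 1-based `start`).
all(rows == m[500:599, ])
```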
# Read a 100-row slice starting from row 500
subset <- h5_read(file, "large_matrix", start = 500, count = 100)
Comparison
| Feature | h5lite | rhdf5 / hdf5r |
|---|---|---|
| Philosophy | “Opinionated” & Simple | Comprehensive Wrapper |
| API Style | Native R (read/write) | Low-level (Files, Dataspaces, Memspaces) |
| HDF5 Installation | Bundled (Zero-config) | System Requirement (Manual install often required) |
| Data Typing | Automatic (safe defaults) | Manual (user specified) |
| Partial I/O | Supported (Simplified) | Supported (Manual hyperslabs) |
| Learning Curve | Low (Minutes) | High (Days) |
Use rhdf5 or hdf5r if you need to:
- Work with complex or custom HDF5 data types not supported by h5lite (e.g., bitfields, references).
- Have fine-grained control over file properties, chunking shapes, or custom compression filters.
Use h5lite if you want to:
- Quickly and safely get data into or out of a file.
- Perform efficient partial reads without the complexity of low-level hyperslab math.
- Avoid thinking about low-level details.
Documentation
- Get Started: A general introduction.
- Atomic Vectors: Details on vectors and scalars.
- Data Types: Controlling storage types.
- Data Compression: Szip/Gzip compression details.
- Partial Reading: Efficiently reading data subsets with start and count.
- Matrices and Arrays: Handling multi-dimensional data.
- Data Frames: Using compound datasets.
- Data Organization: Using groups and lists to structure files.
- Attributes In-Depth: A deep dive into metadata handling.
- Object-Oriented Interface: A guide to the h5_open() handle for a streamlined workflow.
- Parallel Processing: Guide for multi-threaded and multi-process access.
