Writes an R object to an HDF5 file, creating the file if it does not exist.
This function can write atomic vectors, matrices, arrays, factors, data.frames,
and nested lists.
Arguments
- file
Path to the HDF5 file.
- name
Name of the dataset (e.g., "/data/matrix"). When data is a list, this is instead the name of the group that is created.
- data
The R object to write. Supported: numeric, integer, complex, logical, character, factor, raw, matrix, data.frame, NULL, and nested lists.
- dtype
The target HDF5 data type. Can be one of "auto", "float16", "float32", "float64", "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", or "uint64". The default, "auto", selects the most space-efficient type for the data. See details below.
- compress
A logical or an integer from 0 to 9. If TRUE, compression level 5 is used. If FALSE or 0, no compression is used. An integer from 1 to 9 specifies the zlib compression level directly.
- attrs
Controls which R attributes of data are written to the HDF5 object. Can be FALSE (the default), TRUE (all attributes except dim), a character vector of attribute names to include (e.g., c("info", "version")), or a character vector of names to exclude, each prefixed with - (e.g., c("-class")).
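The compress rules above can be summarised as a small normalisation step. The sketch below uses a hypothetical compress_level() helper that is not part of h5lite; rejecting values outside 0 to 9 with an error is an assumption about how invalid input is handled.

```r
# Hypothetical helper (not part of h5lite): map the `compress` argument to a
# zlib level. TRUE -> 5, FALSE or 0 -> 0 (no compression), 1-9 -> as given.
compress_level <- function(compress) {
  if (isTRUE(compress)) return(5L)
  if (isFALSE(compress)) return(0L)
  lvl <- suppressWarnings(as.integer(compress))
  # Assumption: anything that is not a single integer in 0-9 is an error
  if (length(lvl) != 1 || is.na(lvl) || lvl < 0L || lvl > 9L) {
    stop("`compress` must be TRUE, FALSE, or an integer from 0 to 9")
  }
  lvl
}

compress_level(TRUE)  # 5
compress_level(9)     # 9
```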
Writing Scalars
By default, h5_write saves single-element vectors as 1-dimensional arrays.
To write a true HDF5 scalar, wrap the value in I() to treat it "as-is."
For example, h5_write(file, "x", I(5)) will create a scalar dataset, while
h5_write(file, "x", 5) will create a 1D array of length 1.
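I() works by adding the "AsIs" class to its argument, which a writer can test for. The sketch below uses a hypothetical dataspace_for() helper (not part of h5lite) to show the distinction described above. Treating only length-1 AsIs values as scalars is an assumption, and dims are reported in R's order, matching what h5_dim() returns in the examples.

```r
# Hypothetical helper (not part of h5lite): choose the dataspace that the
# scalar rules above describe for an R value.
dataspace_for <- function(x) {
  # I() adds the "AsIs" class; a length-1 AsIs value becomes a true scalar
  if (inherits(x, "AsIs") && length(x) == 1) {
    return(list(type = "scalar", dims = integer(0)))
  }
  d <- dim(x)
  if (is.null(d)) d <- length(x)  # plain vectors, even of length 1, are 1-D
  list(type = "simple", dims = as.integer(d))
}

dataspace_for(I(5))$type            # "scalar"
dataspace_for(5)$dims               # 1
dataspace_for(matrix(0, 4, 3))$dims # 4 3
```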
Writing Lists
If data is a list (but not a data.frame), h5_write will write it
recursively, creating a corresponding group and dataset structure.
- R list objects are created as HDF5 groups.
- All other supported R objects (vectors, matrices, arrays, factors, data.frames) are written as HDF5 datasets.
- Attributes of a list are written as HDF5 attributes on the corresponding group, subject to the attrs argument.
Before writing, a "dry run" is performed to validate that all objects and attributes within the list are of a writeable type. If any part of the structure is invalid, the function will throw an error and no data will be written.
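The validate-then-write strategy can be sketched in base R. The helpers below (is_group(), is_writeable(), list_paths(), plan_list_write()) are hypothetical and not part of h5lite: they only compute the HDF5 paths that would be created, assume a named list, and, unlike h5lite, do not validate attributes.

```r
# Hypothetical helpers (not h5lite's code).
# A "group" is a plain list: not a data.frame and not a POSIXlt timestamp.
is_group <- function(x) is.list(x) && !is.data.frame(x) && !inherits(x, "POSIXt")

# Pass 1 (dry run): recursively check that every element is writeable
is_writeable <- function(x) {
  if (is_group(x)) return(all(vapply(x, is_writeable, logical(1))))
  is.null(x) || is.atomic(x) || is.data.frame(x) || inherits(x, "POSIXt")
}

# Pass 2: list the group and dataset paths a recursive write would create
list_paths <- function(x, prefix) {
  out <- character(0)
  for (nm in names(x)) {
    path <- paste0(prefix, "/", nm)
    el <- x[[nm]]
    if (is_group(el)) {
      out <- c(out, path, list_paths(el, path))  # sub-list -> group
    } else {
      out <- c(out, path)                        # anything else -> dataset
    }
  }
  out
}

plan_list_write <- function(x, name) {
  # Fail before anything is "written" if any element is unsupported
  if (!is_writeable(x)) stop("unsupported object in list; nothing written")
  c(name, list_paths(x, name))
}

my_list <- list(config = list(version = 1.2, user = "test"),
                data = matrix(1:4, 2))
plan_list_write(my_list, "session_data")
```

The paths match the session_data entries listed by h5_ls() in the examples, and a list containing, say, a function fails in the first pass before any path is produced.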
Writing NULL
If data is NULL, h5_write will create an HDF5 null dataset. This is
a dataset with a null dataspace, which contains no data.
Writing Data Frames
data.frame objects are written as HDF5 compound datasets, HDF5's native
table-like structure: each row is stored as one compound element, with one
member per column.
Writing Complex Numbers
h5lite writes R complex objects using the native HDF5 H5T_COMPLEX
datatype class, which was introduced in HDF5 version 2.0.0. As a result,
HDF5 files containing complex numbers written by h5lite can only be read
by other HDF5 tools that support HDF5 version 2.0.0 or later.
Writing Date-Time Objects
POSIXt objects are automatically converted to character strings in
ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ). This ensures that timestamps are
stored in a human-readable and unambiguous way. This conversion applies to
standalone POSIXt objects, as well as to columns within a data.frame.
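The conversion can be reproduced with base R's format() for POSIXct. The helpers to_iso8601() and convert_time_columns() below are hypothetical and not part of h5lite. They assume timestamps are first converted to UTC, which the trailing "Z" in the documented format implies.

```r
# Hypothetical helpers (not h5lite's code). Assumption: timestamps are
# rendered in UTC, as the trailing "Z" of the documented format implies.
to_iso8601 <- function(x) {
  format(as.POSIXct(x), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Apply the same conversion to every POSIXt column of a data.frame
convert_time_columns <- function(df) {
  df[] <- lapply(df, function(col) {
    if (inherits(col, "POSIXt")) to_iso8601(col) else col
  })
  df
}

to_iso8601(as.POSIXct("2024-03-01 12:34:56", tz = "UTC"))
# "2024-03-01T12:34:56Z"
```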
Data Type Selection (dtype)
The dtype argument controls the on-disk storage type and only applies to
integer, numeric, and logical vectors. For all other data types
(character, complex, factor, raw), the storage type is determined
automatically.
If dtype is set to "auto" (the default), h5lite will automatically
select the most space-efficient HDF5 type based on the following rules:
- If the data contains fractional values (e.g., 1.5), it is stored as float64.
- If the data contains NA, NaN, or Inf, it is stored using the smallest floating-point type (float16, float32, or float64) that can precisely represent all integer values in the vector.
- If the data contains only finite integers (this includes logical vectors, where FALSE is 0 and TRUE is 1), h5lite selects the smallest possible integer type (e.g., uint8, int16).
- If integer values exceed R's safe integer range (+/- 2^53), they are automatically stored as float64 to preserve precision.
To override this automatic behavior, you can specify an exact type. The full list of supported values is:
- "auto"
- "float16", "float32", "float64"
- "int8", "int16", "int32", "int64"
- "uint8", "uint16", "uint32", "uint64"
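The "auto" rules can be approximated in base R. The pick_dtype() helper below is hypothetical and is not h5lite's internal code. The exact-integer limits it uses for float16 (2^11) and float32 (2^24) follow from IEEE 754 significand widths, and the order in which the rules are checked is an assumption.

```r
# Hypothetical helper (not h5lite's code): a sketch of the "auto" rules.
pick_dtype <- function(x) {
  x <- as.numeric(x)               # logical -> 0/1, integer -> double
  fin <- x[is.finite(x)]           # ignore NA, NaN, Inf for range checks
  # Rule 1: any fractional value -> float64
  if (any(fin != round(fin))) return("float64")
  lo <- if (length(fin)) min(fin) else 0
  hi <- if (length(fin)) max(fin) else 0
  big <- max(abs(lo), abs(hi))
  # Rule 4: integers beyond +/- 2^53 -> float64
  if (big > 2^53) return("float64")
  # Rule 2: NA/NaN/Inf need a float wide enough to hold the integers exactly
  if (any(!is.finite(x))) {
    if (big <= 2^11) return("float16")
    if (big <= 2^24) return("float32")
    return("float64")
  }
  # Rule 3: smallest integer type covering [lo, hi]
  if (lo >= 0) {
    limits <- c(uint8 = 2^8 - 1, uint16 = 2^16 - 1,
                uint32 = 2^32 - 1, uint64 = 2^64 - 1)
    return(names(limits)[which(hi <= limits)[1]])
  }
  # Signed type with b bits covers -2^(b-1) .. 2^(b-1) - 1
  limits <- c(int8 = 2^7, int16 = 2^15, int32 = 2^31, int64 = 2^63)
  ok <- lo >= -limits & hi <= limits - 1
  names(limits)[which(ok)[1]]
}

pick_dtype(1:20)         # "uint8", as in the first example below
pick_dtype(c(1, NA))     # "float16"
pick_dtype(c(-1, 200))   # "int16"
```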
Attribute Round-tripping
To properly round-trip an R object, it is helpful to set attrs = TRUE. This
preserves important R metadata—such as the names of a named vector, row.names
of a data.frame, or the class of an object—as HDF5 attributes.
Limitation: HDF5 has no direct analog for R's dimnames.
Attempting to write an object that has dimnames (e.g., a named matrix)
with attrs = TRUE will result in an error. You must either remove the
dimnames or set attrs = FALSE.
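The attrs selection rules, including the dimnames limitation, can be sketched with a hypothetical select_attrs() helper that is not part of h5lite. Two behaviours here are assumptions: mixing included and excluded names is rejected, and the dimnames error is raised only when dimnames would actually be selected.

```r
# Hypothetical helper (not h5lite's code): return the names of the R
# attributes that would be written for a given `attrs` setting.
select_attrs <- function(x, attrs = FALSE) {
  # dim is never written as an attribute
  nms <- as.character(setdiff(names(attributes(x)), "dim"))
  if (isFALSE(attrs)) return(character(0))
  if (isTRUE(attrs)) {
    keep <- nms
  } else {
    excl <- startsWith(attrs, "-")
    if (all(excl)) {
      keep <- setdiff(nms, substring(attrs, 2))  # "-class" excludes "class"
    } else if (!any(excl)) {
      keep <- intersect(nms, attrs)              # explicit include list
    } else {
      stop("mixing included and excluded names is not supported here")
    }
  }
  if ("dimnames" %in% keep) stop("dimnames cannot be written as HDF5 attributes")
  keep
}

x <- structure(c(a = 1, b = 2), info = "note", class = "myclass")
select_attrs(x, TRUE)      # "names" "info" "class"
select_attrs(x, "-class")  # "names" "info"
```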
Examples
file <- tempfile(fileext = ".h5")
# Write a simple vector (dtype is auto-detected as uint8)
h5_write(file, "vec1", 1:20)
h5_typeof(file, "vec1") # "uint8"
#> [1] "uint8"
# Write a matrix, letting h5_write determine dimensions
mat <- matrix(rnorm(12), nrow = 4, ncol = 3)
h5_write(file, "group/mat", mat)
h5_dim(file, "group/mat") # c(4, 3)
#> [1] 4 3
# Overwrite the first vector, forcing a 32-bit integer type
h5_write(file, "vec1", 101:120, dtype = "int32")
h5_typeof(file, "vec1") # "int32"
#> [1] "int32"
# Write a scalar value
h5_write(file, "scalar", I(3.14))
# Write a named vector and preserve its names by setting attrs = TRUE
named_vec <- c(a = 1, b = 2)
h5_write(file, "named_vector", named_vec, attrs = TRUE)
# Write a nested list, which creates groups and datasets
my_list <- list(
config = list(version = 1.2, user = "test"),
data = matrix(1:4, 2)
)
attr(my_list, "info") <- "Session data"
h5_write(file, "session_data", my_list)
h5_ls(file, recursive = TRUE)
#> [1] "vec1" "scalar"
#> [3] "named_vector" "group"
#> [5] "group/mat" "session_data"
#> [7] "session_data/config" "session_data/config/version"
#> [9] "session_data/config/user" "session_data/data"
unlink(file)
