Writes an R object to an HDF5 file, creating the file if it does not exist. This function can write atomic vectors, matrices, arrays, factors, data.frames, and nested lists.

Usage

h5_write(file, name, data, dtype = "auto", compress = TRUE, attrs = FALSE)

Arguments

file

Path to the HDF5 file.

name

Name of the dataset (e.g., "/data/matrix").

data

The R object to write. Supported: numeric, integer, complex, logical, character, factor, raw, matrix, array, data.frame, NULL, and nested lists.

dtype

The target HDF5 data type. Can be one of "auto", "float16", "float32", "float64", "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", or "uint64". The default, "auto", selects the most space-efficient type for the data. See the Data Type Selection (dtype) section below.

compress

A logical or an integer from 0-9. If TRUE, compression level 5 is used. If FALSE or 0, no compression is used. An integer 1-9 specifies the zlib compression level directly.

attrs

Controls which R attributes of data are written to the HDF5 object. Can be FALSE (the default), TRUE (all attributes except dim), a character vector of attribute names to include (e.g., c("info", "version")), or a character vector of names to exclude, prefixed with - (e.g., c("-class")).

Value

Invisibly returns file. This function is called for its side effects.

Writing Scalars

By default, h5_write saves single-element vectors as 1-dimensional arrays. To write a true HDF5 scalar, wrap the value in I() to treat it "as-is." For example, h5_write(file, "x", I(5)) will create a scalar dataset, while h5_write(file, "x", 5) will create a 1D array of length 1.
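
The two calls differ only in the "AsIs" class that I() attaches to the value. The following is a minimal sketch, assuming h5lite is installed; the dataset names and the requireNamespace() guard are illustrative choices and are not taken from the package documentation.

```r
file <- tempfile(fileext = ".h5")

# I() only tags the value with the "AsIs" class; the value itself is unchanged.
scalar_value <- I(5)
class(scalar_value)

# Guard so the sketch also runs on systems where h5lite is not installed.
if (requireNamespace("h5lite", quietly = TRUE)) {
  h5lite::h5_write(file, "one_d", 5)              # 1D array of length 1
  h5lite::h5_write(file, "scalar", scalar_value)  # true HDF5 scalar
  print(h5lite::h5_ls(file))
}

unlink(file)
```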

Writing Lists

If data is a list (but not a data.frame), h5_write will write it recursively, creating a corresponding group and dataset structure.

  • R list objects are created as HDF5 groups.

  • All other supported R objects (vectors, matrices, arrays, factors, data.frames) are written as HDF5 datasets.

  • Attributes of a list are written as HDF5 attributes on the corresponding group.

  • Before writing, a "dry run" is performed to validate that all objects and attributes within the list are of a writeable type. If any part of the structure is invalid, the function will throw an error and no data will be written.
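
The "dry run" idea can be illustrated in base R. The helper below, find_unwriteable(), is hypothetical and is not part of h5lite; it is a simplified sketch that walks a list and reports the paths of elements whose type is not in the supported set. It does not check attributes, which the real validation is described as covering.

```r
# Hypothetical helper (not an h5lite function): returns the paths of
# list elements that are not of a supported type.
find_unwriteable <- function(x, path = "") {
  # NULL, atomic vectors (including matrices, arrays, factors) and
  # data.frames are treated as writeable leaves.
  if (is.null(x) || is.atomic(x) || is.factor(x) || is.data.frame(x)) {
    return(character(0))
  }
  # Plain lists are walked recursively, one level per group.
  if (is.list(x)) {
    nms <- names(x)
    if (is.null(nms)) nms <- as.character(seq_along(x))
    bad <- character(0)
    for (i in seq_along(x)) {
      bad <- c(bad, find_unwriteable(x[[i]], paste0(path, "/", nms[i])))
    }
    return(bad)
  }
  # Anything else (functions, environments, ...) is reported by path.
  path
}

ok  <- list(config = list(version = 1.2, user = "test"), data = matrix(1:4, 2))
bad <- list(config = list(fn = function(x) x), data = 1:3)

find_unwriteable(ok)
#> character(0)
find_unwriteable(bad)
#> [1] "/config/fn"
```

Validating the whole structure before touching the file is what makes the write all-or-nothing: a failure deep in the list is detected before any group or dataset has been created.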

Writing NULL

If data is NULL, h5_write will create an HDF5 null dataset. This is a dataset with a null dataspace, which contains no data.

Writing Data Frames

data.frame objects are written as HDF5 compound datasets. This is a native HDF5 table-like structure that is highly efficient and portable.

Writing Complex Numbers

h5lite writes R complex objects using the native HDF5 H5T_COMPLEX datatype class, which was introduced in HDF5 version 2.0.0. As a result, HDF5 files containing complex numbers written by h5lite can only be read by other HDF5 tools that support HDF5 version 2.0.0 or later.

Writing Date-Time Objects

POSIXt objects are automatically converted to character strings in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ). This ensures that timestamps are stored in a human-readable and unambiguous way. This conversion applies to standalone POSIXt objects, as well as to columns within a data.frame.
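
The resulting string format can be reproduced in base R. This is a sketch only: to_iso8601() is a hypothetical helper, and it assumes that timestamps are rendered in UTC, as the trailing "Z" suggests. It is not h5lite's internal conversion code.

```r
# Hypothetical helper: format a POSIXt value as YYYY-MM-DDTHH:MM:SSZ.
# Assumption: the conversion is done in UTC (implied by the "Z" suffix).
to_iso8601 <- function(x) {
  stopifnot(inherits(x, "POSIXt"))
  format(as.POSIXct(x), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

stamp <- as.POSIXct("2024-01-15 12:30:00", tz = "UTC")
to_iso8601(stamp)
#> [1] "2024-01-15T12:30:00Z"

# The same conversion applied to a data.frame column.
df <- data.frame(id = 1:2, when = c(stamp, stamp + 60))
df$when <- to_iso8601(df$when)
df$when
#> [1] "2024-01-15T12:30:00Z" "2024-01-15T12:31:00Z"
```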

Data Type Selection (dtype)

The dtype argument controls the on-disk storage type and only applies to integer, numeric, and logical vectors. For all other data types (character, complex, factor, raw), the storage type is determined automatically.

If dtype is set to "auto" (the default), h5lite will automatically select the most space-efficient HDF5 type based on the following rules:

  1. If the data contains fractional values (e.g., 1.5), it is stored as float64.

  2. If the data contains NA, NaN, or Inf, it is stored using the smallest floating-point type (float16, float32, or float64) that can precisely represent all integer values in the vector.

  3. If the data contains only finite integers (this includes logical vectors, where FALSE is 0 and TRUE is 1), h5lite selects the smallest possible integer type (e.g., uint8, int16).

  4. If integer values exceed the range in which R's double type represents every integer exactly (+/- 2^53), they are automatically stored as float64 to preserve precision.
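
The rules above can be sketched in base R. The function guess_dtype() is hypothetical and is not h5lite's implementation. The thresholds are assumptions derived from the exact-integer ranges of the IEEE floating-point types (2^11 for float16, 2^24 for float32, 2^53 for float64) and from the usual integer type limits. The preference for unsigned types when all values are non-negative is inferred from the 1:20 example, which is stored as uint8.

```r
# Hypothetical sketch of the "auto" rules; real h5lite behavior may differ.
guess_dtype <- function(x) {
  stopifnot(is.numeric(x) || is.logical(x))
  x <- as.numeric(x)              # logical: FALSE -> 0, TRUE -> 1
  finite <- x[is.finite(x)]

  # Rule 1: any fractional value forces float64.
  if (any(finite != trunc(finite))) return("float64")

  biggest <- if (length(finite)) max(abs(finite)) else 0

  # Rule 4: integers beyond +/- 2^53 are stored as float64.
  if (biggest > 2^53) return("float64")

  # Rule 2: NA, NaN, or Inf require a floating-point type; pick the
  # smallest one that still represents every integer value exactly.
  if (length(finite) < length(x)) {
    if (biggest <= 2^11) return("float16")
    if (biggest <= 2^24) return("float32")
    return("float64")
  }

  # Rule 3: only finite integers, so pick the smallest integer type.
  lo <- if (length(finite)) min(finite) else 0
  hi <- if (length(finite)) max(finite) else 0
  if (lo >= 0) {
    if (hi <= 255) return("uint8")
    if (hi <= 65535) return("uint16")
    if (hi <= 4294967295) return("uint32")
    return("uint64")
  }
  if (lo >= -128 && hi <= 127) return("int8")
  if (lo >= -32768 && hi <= 32767) return("int16")
  if (lo >= -2147483648 && hi <= 2147483647) return("int32")
  "int64"
}

guess_dtype(1:20)
#> [1] "uint8"
guess_dtype(c(1.5, 2))
#> [1] "float64"
guess_dtype(c(1, NA))
#> [1] "float16"
guess_dtype(c(-1L, 100L))
#> [1] "int8"
```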

To override this automatic behavior, you can specify an exact type. The full list of supported values is:

  • "auto"

  • "float16", "float32", "float64"

  • "int8", "int16", "int32", "int64"

  • "uint8", "uint16", "uint32", "uint64"

Attribute Round-tripping

To properly round-trip an R object, it is helpful to set attrs = TRUE. This preserves important R metadata (such as the names of a named vector, the row.names of a data.frame, or the class of an object) as HDF5 attributes.

Limitation: HDF5 has no direct analog for R's dimnames. Attempting to write an object that has dimnames (e.g., a named matrix) with attrs = TRUE will result in an error. You must either remove the dimnames or set attrs = FALSE.
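
Both workarounds can be sketched as follows. The matrix, the "units" attribute, and the dataset names are illustrative, and the requireNamespace() guard is only there so the sketch also runs where h5lite is not installed.

```r
file <- tempfile(fileext = ".h5")

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
attr(m, "units") <- "cm"   # illustrative custom attribute

# Option 1: drop the dimnames but keep the other attributes,
# then write with attrs = TRUE.
plain <- m
dimnames(plain) <- NULL

if (requireNamespace("h5lite", quietly = TRUE)) {
  h5lite::h5_write(file, "plain", plain, attrs = TRUE)

  # Option 2: keep the dimnames in R and write no attributes at all.
  h5lite::h5_write(file, "no_attrs", m, attrs = FALSE)
}

unlink(file)
```

With option 1 the row and column names are lost on disk; if they matter, they can be written as separate character datasets alongside the matrix.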

Examples

file <- tempfile(fileext = ".h5")

# Write a simple vector (dtype is auto-detected as uint8)
h5_write(file, "vec1", 1:20)
h5_typeof(file, "vec1") # "uint8"
#> [1] "uint8"

# Write a matrix, letting h5_write determine dimensions
mat <- matrix(rnorm(12), nrow = 4, ncol = 3)
h5_write(file, "group/mat", mat)
h5_dim(file, "group/mat") # c(4, 3)
#> [1] 4 3

# Overwrite the first vector, forcing a 32-bit integer type
h5_write(file, "vec1", 101:120, dtype = "int32")
h5_typeof(file, "vec1") # "int32"
#> [1] "int32"

# Write a scalar value
h5_write(file, "scalar", I(3.14))

# Write a named vector and preserve its names by setting attrs = TRUE
named_vec <- c(a = 1, b = 2)
h5_write(file, "named_vector", named_vec, attrs = TRUE)

# Write a nested list, which creates groups and datasets
my_list <- list(
  config = list(version = 1.2, user = "test"),
  data = matrix(1:4, 2)
)
attr(my_list, "info") <- "Session data"
h5_write(file, "session_data", my_list)

h5_ls(file, recursive = TRUE)
#>  [1] "vec1"                        "scalar"                     
#>  [3] "named_vector"                "group"                      
#>  [5] "group/mat"                   "session_data"               
#>  [7] "session_data/config"         "session_data/config/version"
#>  [9] "session_data/config/user"    "session_data/data"          

unlink(file)