library(h5lite)
# We'll use a temporary file for this guide.
file <- tempfile(fileext = ".h5")
# Create a dataset to work with
h5_write(file, "my_data", 1:10)Introduction
In HDF5, attributes are small, named pieces of metadata that can be attached to datasets or groups. They are distinct from datasets, which are designed to hold primary data.
This vignette provides a deep dive into working with attributes in
h5lite, covering: * Basic attribute I/O. * The powerful
attrs argument for round-tripping R object attributes. *
Important limitations and special cases.
For an introduction to writing datasets, see
vignette("h5lite").
Basic Attribute I/O
The core functions for direct attribute manipulation are
h5_write_attr(), h5_read_attr(), and
h5_ls_attr().
Let’s add some metadata to our my_data dataset.
# Write a scalar string attribute
h5_write_attr(file, "my_data", "units", I("meters/sec"))
# Write a numeric vector attribute
h5_write_attr(file, "my_data", "quality_flags", c(1, 1, 0, 1))
# List the attributes on the object
h5_ls_attr(file, "my_data")
#> [1] "units" "quality_flags"
# Read one of the attributes back
units <- h5_read_attr(file, "my_data", "units")
print(units)
#> [1] "meters/sec"Automatic Round-tripping with the attrs Argument
While h5_write_attr() is useful for manual metadata, the
real power comes from the attrs argument in
h5_write() and h5_read(). This allows for
high-fidelity round-trips of R objects by preserving their R-level
attributes.
Consider a named vector with a custom attribute.
named_vec <- c(a = 1, b = 2, c = 3)
attr(named_vec, "info") <- "My special vector"
# Write the vector, telling h5lite to save its attributes
h5_write(file, "named_vec", named_vec, attrs = TRUE)
# Inspect the HDF5 attributes that were created
h5_ls_attr(file, "named_vec")
#> [1] "names" "info"When we read it back with attrs = TRUE,
h5lite re-attaches the HDF5 attributes as R attributes.
read_vec <- h5_read(file, "named_vec", attrs = TRUE)
# The object is perfectly restored
all.equal(named_vec, read_vec)
#> [1] TRUE
str(read_vec)
#> Named num [1:3] 1 2 3
#> - attr(*, "names")= chr [1:3] "a" "b" "c"
#> - attr(*, "info")= chr "My special vector"Fine-Grained Control with Character Vectors
The attrs argument also accepts a character vector to
specify exactly which attributes to include or exclude.
-
Inclusion list:
attrs = c("names", "info")will only write/read those specific attributes. -
Exclusion list:
attrs = c("-class", "-dim")will write/read all attributes except the ones listed.
# Write the vector, but only include the 'names' attribute
h5_write(file, "selective_vec", named_vec, attrs = c("names"))
# Only the 'names' attribute was written
h5_ls_attr(file, "selective_vec")
#> [1] "names"Limitations and Special Cases
There are a few important rules and limitations to be aware of when working with attributes.
Limitation: List Attributes
More generally, h5lite cannot write any R attribute that
is a list. HDF5 attributes are designed to hold simple,
atomic data, not nested structures.
When using attrs = TRUE or h5_write_attr(),
you can write attributes that are: * Atomic vectors
(numeric, integer, character,
logical, raw) * factors *
data.frames (which become compound attributes) *
NULL
Attempting to write an object that has a list attribute
will result in an error.
my_vec <- 1:3
attr(my_vec, "bad_attr") <- list(a = 1, b = 2)
h5_write(file, "my_vec_list_attr", my_vec, attrs = TRUE)
#> Error in validate_attrs(data, attrs): Attribute 'bad_attr' cannot be written to HDF5 because its type ('list') is not supported. Only atomic vectors and factors can be written as attributes.Limitation: dimnames
R stores the dimnames of a matrix or array as a
list attribute. As explained above, h5lite
cannot write list-like attributes. Attempting to write
a named matrix with attrs = TRUE will fail.
See vignette("matrices") for more details on this
specific case.
named_matrix <- matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
# This fails because the 'dimnames' attribute is a list
h5_write(file, "named_matrix", named_matrix, attrs = TRUE)
#> Error in validate_attrs(data, attrs): Attribute 'dimnames' cannot be written to HDF5 because its type ('list') is not supported. Only atomic vectors and factors can be written as attributes.Workaround: Either remove the dimnames
before writing, or write with attrs = FALSE (the default),
which will save the matrix data but discard the names.
Data Frames:
Unlike matrices, the attributes of a data.frame can be
safely round-tripped with h5lite. This is because its key
metadata is stored in a format that h5lite can handle. See
vignette("data-frames") for more on working with data
frames.
- The column names are stored in the
namesattribute, which is a character vector. - The row names are stored in the
row.namesattribute, which is either a character or integer vector.
Since neither of these are list attributes (like the
dimnames of a matrix), they can be written to HDF5 without
issue. As a result, you can reliably use attrs = TRUE to
preserve the structure of a data.frame during a write/read
cycle.
# Create a data.frame with non-default row names
df <- data.frame(
x = 1:3,
y = c("a", "b", "c"),
row.names = c("row1", "row2", "row3")
)
# Write with attrs = TRUE to preserve names
h5_write(file, "my_df", df, attrs = TRUE)
# Read it back
read_df <- h5_read(file, "my_df", attrs = TRUE)
# The data.frame is perfectly restored
all.equal(df, read_df)
#> [1] TRUE
# Clean up the temporary file
unlink(file)