Sub-sample OTU observations such that all samples have an equal number.
If called on data with non-integer abundances, values will be re-scaled to
integers between 1 and depth such that they sum to depth.
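
As a rough illustration of what sub-sampling to an equal depth means, the base R sketch below rarefies a single sample by drawing observations without replacement. The helper name rarefy_one is hypothetical and this is not the package's implementation. It only shows the general idea for integer counts.

```r
# Hypothetical helper (not part of the package): rarefy one sample's counts
# by drawing `depth` observations without replacement.
rarefy_one <- function(x, depth) {
  stopifnot(sum(x) >= depth)
  # One entry per observation, labelled by its feature index.
  obs <- rep.int(seq_along(x), times = x)
  # sample.int() avoids the length-1 surprise of sample().
  kept <- obs[sample.int(length(obs), size = depth)]
  # Count the kept observations per feature, including zero-count features.
  out <- tabulate(kept, nbins = length(x))
  names(out) <- names(x)
  out
}

set.seed(1)
a <- c(OTU1 = 0, OTU2 = 8, OTU3 = 5, OTU4 = 0, OTU5 = 5)
r <- rarefy_one(a, depth = 13)
sum(r)       # 13
all(r <= a)  # TRUE: a feature never gains observations
```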
Usage
rarefy(
  counts,
  depth = 0.1,
  n_samples = NULL,
  seed = 0,
  times = NULL,
  drop = TRUE,
  margin = 1L,
  cpus = n_cpus()
)
Arguments
- counts
A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects. For optimal performance with very large datasets, see the guide in vignette('performance').
- depth
How many observations to keep per sample. When 0 < depth < 1, it is taken as the minimum percentage of the dataset's observations to keep. Ignored when n_samples is specified. Default: 0.1
- n_samples
The number of samples to keep. When 0 < n_samples < 1, it is taken as the percentage of samples to keep. If negative, that number of samples is dropped. If 0, all samples are kept. If NULL, then depth is used instead. Default: NULL
- seed
An integer seed for randomizing which observations to keep or drop. If you need to create different random rarefactions of the same data, set the seed to a different number each time. Default: 0
- times
How many independent rarefactions to perform. If set, rarefy() will return a list of matrices. The seeds for each matrix will be sequential, starting from seed. Default: NULL
- drop
Drop rows and columns with zero observations after rarefying. Default: TRUE
- margin
If your samples are in the matrix's rows, set to 1L. If your samples are in columns, set to 2L. Ignored when counts is a phyloseq, rbiom, SummarizedExperiment, or TreeSummarizedExperiment object. Default: 1L
- cpus
How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
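
The relationship between n_samples and the resulting depth can be sketched in base R. The helper depth_for_n_samples below is hypothetical and only covers whole-number values. It assumes that keeping n samples means rarefying to the depth of the n-th most abundant sample, and that a negative value drops that many of the least abundant samples. The fractional case is not covered.

```r
# Same 4-sample x 5-OTU matrix used in the Examples section.
counts <- matrix(c(0,0,0,0, 0,8,9,10, 5,5,5,5, 2,0,0,0, 6,5,7,0), 4, 5,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)))

# Hypothetical helper (an assumption, not the package's code):
# the depth implied by keeping `n` samples.
depth_for_n_samples <- function(counts, n) {
  totals <- sort(rowSums(counts), decreasing = TRUE)
  if (n < 0) n <- length(totals) + n  # negative: drop that many samples
  unname(totals[n])
}

depth_for_n_samples(counts, 2)   # 18 (samples B and C are kept)
depth_for_n_samples(counts, -1)  # 15 (sample A is dropped)
```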
Value
A rarefied matrix. Matrix and slam objects will be returned with
the same type; otherwise a base R matrix will be returned.
Examples
# A 4-sample x 5-OTU matrix with samples in rows.
counts <- matrix(c(0,0,0,0,0,8,9,10,5,5,5,5,2,0,0,0,6,5,7,0), 4, 5,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)))
counts
#>   OTU1 OTU2 OTU3 OTU4 OTU5
#> A    0    0    5    2    6
#> B    0    8    5    0    5
#> C    0    9    5    0    7
#> D    0   10    5    0    0
rowSums(counts)
#>  A  B  C  D
#> 13 18 21 15
# Rarefy all samples to a depth of 13.
# Note that 'OTU1' has 0 counts and is dropped.
r_mtx <- rarefy(counts, depth = 13, seed = 1)
r_mtx
#>   OTU2 OTU3 OTU4 OTU5
#> A    0    5    2    6
#> B    4    4    0    5
#> C    3    5    0    5
#> D   10    3    0    0
rowSums(r_mtx)
#>  A  B  C  D
#> 13 13 13 13
# Keep zero-sum rows and columns by setting `drop = FALSE`.
rarefy(counts, depth = 13, drop = FALSE, seed = 1)
#>   OTU1 OTU2 OTU3 OTU4 OTU5
#> A    0    0    5    2    6
#> B    0    4    4    0    5
#> C    0    3    5    0    5
#> D    0   10    3    0    0
# Rarefy to the depth of the 2nd most abundant sample (B, depth=18).
rarefy(counts, n_samples = 2, seed = 1)
#>   OTU2 OTU3 OTU5
#> B    8    5    5
#> C    6    5    7
# Perform 3 independent rarefactions.
r_list <- rarefy(counts, depth = 13, times = 3, seed = 1)
length(r_list)
#> [1] 3
r_list[[1]]
#>   OTU2 OTU3 OTU4 OTU5
#> A    0    5    2    6
#> B    4    4    0    5
#> C    3    5    0    5
#> D   10    3    0    0
# The class of the input matrix is preserved.
if (requireNamespace('Matrix', quietly = TRUE)) {
  counts_dgC <- Matrix::Matrix(counts, sparse = TRUE)
  class(counts_dgC)
  r_dgC <- rarefy(counts_dgC, depth = 13, seed = 1)
  class(r_dgC)
}
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
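
The effect of drop = TRUE in the examples above can be approximated in base R by removing zero-sum rows and columns. This sketch starts from the matrix printed for drop = FALSE. It illustrates the documented behavior and is not the package's internal code.

```r
# Rarefied matrix as printed above for `drop = FALSE`.
m <- matrix(c(0,0,0,0, 0,4,3,10, 5,4,5,3, 2,0,0,0, 6,5,5,0), 4, 5,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)))

# Keep only rows and columns with at least one observation.
# `drop = FALSE` here is base R's indexing argument: it keeps the result
# a matrix even if only one row or column survives.
trimmed <- m[rowSums(m) > 0, colSums(m) > 0, drop = FALSE]

colnames(trimmed)  # "OTU2" "OTU3" "OTU4" "OTU5"
rownames(trimmed)  # "A" "B" "C" "D"
```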
