
Sub-sample OTU observations such that all samples have an equal number of observations. If called on data with non-integer abundances, values are re-scaled to integers between 1 and depth such that they sum to depth.
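As a rough illustration of the sub-sampling described above, the following base R sketch draws observations without replacement from each sample. `rarefy_sketch` is a hypothetical helper written for this example, not the package's implementation. It assumes whole-number counts with samples in rows, and it zeroes samples that have fewer than `depth` observations, as the output in the Examples section suggests.

```r
# Hypothetical helper (not part of the package): rarefy each row (sample)
# of a whole-number count matrix to `depth` observations.
rarefy_sketch <- function(counts, depth, seed = 0) {
  set.seed(seed)
  out <- matrix(0L, nrow(counts), ncol(counts), dimnames = dimnames(counts))
  for (i in seq_len(nrow(counts))) {
    row <- counts[i, ]
    if (sum(row) < depth) next   # too shallow: leave the row as zeros
    # One feature index per observation, then draw `depth` of them
    # without replacement and count how many of each feature survive.
    pool <- rep(seq_along(row), times = row)
    kept <- pool[sample.int(length(pool), depth)]
    out[i, ] <- tabulate(kept, nbins = length(row))
  }
  out
}

# Four samples (rows A-D) with totals 15, 13, 16, and 11.
sketch_counts <- matrix(
  data     = c(4,0,3,2,6, 0,8,0,0,5, 0,9,0,0,7, 0,10,0,0,1),
  nrow     = 4,
  byrow    = TRUE,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)) )

sketch_rare <- rarefy_sketch(sketch_counts, depth = 13)
rowSums(sketch_rare)   # A, B, and C sum to 13; D (only 11 observations) is all zeros
```

Drawing without replacement guarantees that no feature ends up with more observations than it started with.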

Usage

rarefy(
  counts,
  depth = 0.1,
  n_samples = NULL,
  seed = 0,
  times = NULL,
  cpus = n_cpus()
)

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

depth

How many observations to keep per sample. When 0 < depth < 1, it is taken as the minimum fraction of the dataset's observations to keep. Ignored when n_samples is specified. Default: 0.1

n_samples

The number of samples to keep. When 0 < n_samples < 1, it is taken as the fraction of samples to keep. If negative, that number of samples is dropped. If 0, all samples are kept. If NULL, then depth is used instead. Default: NULL
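One way to picture how a sample count could translate into a rarefaction depth is sketched below. `depth_for_n_samples` is a hypothetical helper, and two of its choices are assumptions rather than documented behavior: that the deepest samples are the ones kept, and that a fractional value is rounded up.

```r
# Hypothetical helper: the depth that retains the `n` deepest samples (rows).
depth_for_n_samples <- function(counts, n) {
  totals <- sort(rowSums(counts), decreasing = TRUE)
  N <- length(totals)
  n <- if (n == 0) N                      # keep every sample
       else if (n < 0) N + n              # drop |n| samples
       else if (n < 1) ceiling(n * N)     # fraction of samples (rounding is assumed)
       else n
  stopifnot(n >= 1, n <= N)
  unname(totals[n])                       # total of the shallowest kept sample
}

# Four samples (rows A-D) with totals 15, 13, 16, and 11.
n_counts <- matrix(
  data     = c(4,0,3,2,6, 0,8,0,0,5, 0,9,0,0,7, 0,10,0,0,1),
  nrow     = 4,
  byrow    = TRUE,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)) )

depth_for_n_samples(n_counts, 2)     # keep the 2 deepest samples
depth_for_n_samples(n_counts, 0.5)   # keep half of the samples
depth_for_n_samples(n_counts, -1)    # drop the shallowest sample
depth_for_n_samples(n_counts, 0)     # keep all samples
```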

seed

An integer seed for randomizing which observations to keep or drop. If you need to create different random rarefactions of the same data, set the seed to a different number each time.

times

How many independent rarefactions to perform. If set, rarefy() will return a list of matrices. The seeds for each matrix will be sequential, starting from seed.
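The sequential seeding described here can be sketched in base R. `subsample_once` is a hypothetical stand-in for a single rarefaction of one sample, not the package's code; the point is only that repetition i uses seed + i - 1, so any single repetition can be reproduced on its own.

```r
# Hypothetical stand-in for one rarefaction of a single sample's counts.
subsample_once <- function(x, depth, seed) {
  set.seed(seed)
  pool <- rep(seq_along(x), times = x)   # one feature index per observation
  tabulate(pool[sample.int(length(pool), depth)], nbins = length(x))
}

x     <- c(4, 0, 3, 2, 6)   # 15 observations across 5 features
seed  <- 0
times <- 3

# Sequential seeds starting from `seed`: 0, 1, 2.
seeds <- seed + seq_len(times) - 1

# One rarefied vector per seed, collected in a list.
reps <- lapply(seeds, function(s) subsample_once(x, depth = 10, seed = s))

# Re-running with the same seed reproduces the matching repetition.
identical(reps[[2]], subsample_once(x, depth = 10, seed = 1))
```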

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

An integer matrix, or a list of integer matrices when times is set.

Examples

    # Create an OTU matrix with 4 samples (A-D) and 5 OTUs.
    counts <- matrix(
      data     = c(4,0,3,2,6,0,8,0,0,5,0,9,0,0,7,0,10,0,0,1),
      nrow     = 5,
      dimnames = list(paste0('OTU', 1:5), LETTERS[1:4]) )
    counts
#>      A B C  D
#> OTU1 4 0 0  0
#> OTU2 0 8 9 10
#> OTU3 3 0 0  0
#> OTU4 2 0 0  0
#> OTU5 6 5 7  1
    colSums(counts)
#>  A  B  C  D 
#> 15 13 16 11 
    
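    # Per the `counts` argument, rarefy() treats each row as a sample, so
    # depth = 14 is applied to the rows of this matrix. Rows with fewer
    # than 14 observations come back as all zeros.
    counts <- rarefy(counts, depth = 14)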
    counts
#>      A B C D
#> OTU1 0 0 0 0
#> OTU2 0 5 5 4
#> OTU3 0 0 0 0
#> OTU4 0 0 0 0
#> OTU5 4 2 7 1
    # Row totals: each row is either rarefied to 14 or zeroed.
    rowSums(counts)
#> OTU1 OTU2 OTU3 OTU4 OTU5
#>    0   14    0    0   14