
A non-parametric estimator of species richness that separates features into abundant and rare groups.

Usage

ace(counts, cutoff = 10L, margin = 1L, cpus = n_cpus())

Arguments

counts

A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.

cutoff

The maximum number of observations to consider "rare". Default: 10.

margin

The margin containing samples: 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Details

The ACE metric separates features into "abundant" and "rare" groups based on a cutoff (usually 10 counts). It assumes that the presence of abundant species is certain, while the true number of rare species must be estimated.

Equations:

$$C_{ace} = 1 - \frac{F_1}{X_{rare}}$$

$$\gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]$$

$$D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2$$

Where:

  • \(r\) : Rare cutoff (default 10). Features with \(\le r\) counts are considered rare.

  • \(F_i\) : Number of features with exactly \(i\) counts.

  • \(F_1\) : Number of singletons (features with exactly 1 count).

  • \(F_{rare}\) : Number of rare features (between 1 and \(r\) counts), i.e. \(\sum_{i=1}^{r}F_i\).

  • \(F_{abund}\) : Number of abundant features (more than \(r\) counts).

  • \(X_{rare}\) : Total counts belonging to rare features, i.e. \(\sum_{i=1}^{r}iF_i\).

  • \(C_{ace}\) : The sample abundance coverage estimator.

  • \(\gamma_{ace}^2\) : The estimated squared coefficient of variation of the rare features' abundances.
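
The equations above can be sketched in base R for a single sample. This is only an illustration of the formulas, not the package's implementation: ace_sketch is a hypothetical helper name, it takes one sample's counts as a plain numeric vector, and it assumes that features with zero counts are ignored.

```r
# Hypothetical helper (not part of the package): ACE for one sample's counts.
ace_sketch <- function(x, cutoff = 10L) {

  x    <- x[x > 0]            # assumption: unobserved features are dropped
  rare <- x[x <= cutoff]      # counts of the rare features

  f_abund <- sum(x > cutoff)  # F_abund
  f_rare  <- length(rare)     # F_rare
  x_rare  <- sum(rare)        # X_rare
  f_1     <- sum(rare == 1)   # F_1 (singletons)

  # C_ace: sample coverage of the rare group
  c_ace <- 1 - f_1 / x_rare

  # sum_{i=1}^{r} i(i-1)F_i equals the sum of x(x-1) over the rare features
  gamma2 <- max(
    f_rare * sum(rare * (rare - 1)) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )

  # D_ace
  f_abund + f_rare / c_ace + (f_1 / c_ace) * gamma2
}

ace_sketch(c(1, 1, 2, 3, 4, 20, 30))
#> [1] 8.382716
```

In this example \(F_{abund} = 2\), \(F_{rare} = 5\), \(X_{rare} = 11\) and \(F_1 = 2\), so \(C_{ace} = 9/11\), \(\gamma_{ace}^2 = 1/9\) and \(D_{ace} = 679/81\). When every rare feature is a singleton, \(C_{ace}\) is 0 and this sketch returns NaN.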

Parameter: cutoff

The integer threshold distinguishing rare from abundant species. Standard practice is to use 10.

Input Types

The counts parameter accepts a plain numeric matrix, as well as objects from the following biological data packages:

  • phyloseq

  • rbiom

  • SummarizedExperiment

  • TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.

References

Chao, A., & Lee, S. M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87(417), 210-217. doi:10.1080/01621459.1992.10475194

Examples

    ace(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    5.0    8.9    6.0    NaN