The Abundance-based Coverage Estimator (ACE), a non-parametric estimator of species richness that separates features into abundant and rare groups.
Usage
ace(counts, cutoff = 10L, margin = 1L, cpus = n_cpus())
Arguments
- counts
A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
- cutoff
The maximum number of observations to consider "rare". Default: 10.
- margin
The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1.
- cpus
How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
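The margin convention can be illustrated with a small matrix. This is a sketch with made-up sample and feature names; the ace() calls are left as comments because they assume the package providing ace() is attached, and the return value is not shown because it is not described here.

```r
# Hypothetical toy data: 2 samples (rows) by 4 features (columns).
counts <- matrix(
  c(1, 0, 12, 3,
    2, 5,  0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("f1", "f2", "f3", "f4"))
)

# Samples are rows, so the default margin applies:
#   ace(counts, cutoff = 10L, margin = 1L)

# The same data stored with samples as columns needs margin = 2L:
counts_t <- t(counts)
#   ace(counts_t, cutoff = 10L, margin = 2L)

dim(counts)    # 2 4
dim(counts_t)  # 4 2
```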
Details
The ACE metric separates features into "abundant" and "rare" groups based on a cutoff (usually 10 counts). It assumes that the presence of abundant species is certain, while the true number of rare species must be estimated.
Equations:
$$C_{ace} = 1 - \frac{F_1}{X_{rare}}$$
$$\gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]$$
$$D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2$$
Where:
\(r\) : Rare cutoff (default 10). Features with \(\le r\) counts are considered rare.
\(F_i\) : Number of features with exactly \(i\) counts.
\(F_1\) : Number of features where \(X_i = 1\) (singletons).
\(F_{rare}\) : Number of rare features where \(1 \le X_i \le r\), i.e. \(\sum_{i=1}^{r} F_i\).
\(F_{abund}\) : Number of abundant features where \(X_i > r\).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : The sample abundance coverage estimator.
\(\gamma_{ace}^2\) : The estimated squared coefficient of variation of the rare features' abundances.
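The equations above can be sketched in base R for a single sample. The helper name ace_sketch is hypothetical and is not the package's implementation. It assumes integer counts, and returning NA when every rare feature is a singleton (so that \(C_{ace} = 0\)) is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical helper: ACE for one sample, written directly from the equations.
ace_sketch <- function(x, cutoff = 10L) {
  x <- x[x > 0]                         # keep observed features only
  rare    <- x[x <= cutoff]             # rare features: 1 <= X_i <= r
  f_abund <- sum(x > cutoff)            # F_abund
  f_rare  <- length(rare)               # F_rare
  x_rare  <- sum(rare)                  # X_rare
  f_1     <- sum(rare == 1)             # F_1 (singletons)

  if (f_rare == 0) return(f_abund)      # nothing to estimate
  if (f_1 == x_rare) return(NA_real_)   # C_ace = 0: undefined in this sketch

  i   <- seq_len(cutoff)
  f_i <- tabulate(rare, nbins = cutoff) # F_i for i = 1..r

  c_ace  <- 1 - f_1 / x_rare
  gamma2 <- max(
    f_rare * sum(i * (i - 1) * f_i) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )
  f_abund + f_rare / c_ace + (f_1 / c_ace) * gamma2
}

# Worked example: F_rare = 5, X_rare = 12, F_1 = 2, F_abund = 2,
# so C_ace = 5/6 and gamma^2 = 3/11.
x <- c(1, 1, 2, 3, 5, 12, 40)
ace_sketch(x)  # approximately 8.6545 (exactly 476/55)
```

In this example the estimate exceeds the 7 observed features because the two singletons suggest that some rare features went unobserved.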
Parameter: cutoff
The integer threshold distinguishing rare from abundant species. Standard practice is to use 10.
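To see how the cutoff changes the partition, the following base R sketch counts rare and abundant features for one sample under two thresholds. The function name split_by_cutoff and the example vector are hypothetical and exist only for illustration.

```r
# Hypothetical helper: summarize the rare/abundant split for one sample.
split_by_cutoff <- function(x, cutoff) {
  x <- x[x > 0]                     # ignore unobserved features
  c(
    F_rare  = sum(x <= cutoff),     # features with at most `cutoff` counts
    F_abund = sum(x > cutoff),      # features with more than `cutoff` counts
    X_rare  = sum(x[x <= cutoff])   # total counts among rare features
  )
}

x <- c(1, 1, 2, 3, 5, 12, 40)
split_by_cutoff(x, 10)  # F_rare = 5, F_abund = 2, X_rare = 12
split_by_cutoff(x, 4)   # F_rare = 4, F_abund = 3, X_rare = 7
```

A lower cutoff moves features into the abundant group, whose presence is treated as certain, so fewer counts inform the estimate of unseen rare species.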
Input Types
The counts parameter is designed to accept a simple numeric matrix, but
seamlessly supports objects from the following biological data packages:
- phyloseq
- rbiom
- SummarizedExperiment
- TreeSummarizedExperiment
For large datasets, standard matrix operations may be slow. See
vignette('performance') for details on using optimized formats
(e.g. sparse matrices) and parallel processing.
References
Chao, A., & Lee, S. M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87(417), 210-217. doi:10.1080/01621459.1992.10475194
See also
Other Richness metrics:
chao1(),
margalef(),
menhinick(),
observed(),
squares()
