Alpha Diversity Metrics
Usage
ace(counts, cutoff = 10, cpus = n_cpus())
berger(counts, rescale = TRUE, cpus = n_cpus())
brillouin(counts, cpus = n_cpus())
chao1(counts, cpus = n_cpus())
faith(counts, tree = NULL, cpus = n_cpus())
fisher(counts, digits = 3L, cpus = n_cpus())
inv_simpson(counts, rescale = TRUE, cpus = n_cpus())
margalef(counts, cpus = n_cpus())
mcintosh(counts, cpus = n_cpus())
menhinick(counts, cpus = n_cpus())
observed(counts, cpus = n_cpus())
shannon(counts, rescale = TRUE, cpus = n_cpus())
simpson(counts, rescale = TRUE, cpus = n_cpus())
squares(counts, cpus = n_cpus())
Arguments
- counts
A numeric matrix of count data where each column is a feature and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.
- cutoff
The maximum number of observations for a feature to be considered "rare". Default: 10.
- cpus
How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
- rescale
Normalize each sample's counts so they sum to 1. Default: TRUE
- tree
A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').
- digits
Precision of the returned values, in number of decimal places. E.g. the default digits=3 could return 6.392.
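To illustrate what the rescale description means, the following base R sketch normalizes a small matrix so that each sample (row) sums to 1. It is not the package's implementation; the values are copied from the Examples section and the variable names are arbitrary.

```r
# Two samples in rows, six features in columns (values from the Examples section).
counts <- rbind(
  Saliva = c(162, 2, 0, 180, 1, 0),
  Gums   = c(793, 4, 0,  87, 1, 1)
)

# Dividing a matrix by a vector of length nrow() recycles down the columns,
# so every entry in row i is divided by that row's total.
rescaled <- counts / rowSums(counts)

rowSums(rescaled)  # each sample now sums to 1
```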
Formulas
Prerequisite: all counts are whole numbers.
Given:
\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(X_i\) : Integer count of the \(i\)-th feature.
\(X_T\) : Total of all counts (i.e. sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)
\(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_2\) : Number of features where \(X_i = 2\) (i.e. doubletons).
Metric | Function | Formula |
Abundance-based Coverage Estimator (ACE) | ace() | See below. |
Berger-Parker Index | berger() | \(\max(P_i)\) |
Brillouin Index | brillouin() | \(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\) |
Chao1 | chao1() | \(\displaystyle n + \frac{(F_1)^2}{2 F_2}\) |
Faith's Phylogenetic Diversity | faith() | See below. |
Fisher's Alpha (\(\alpha\)) | fisher() | \(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) The value of \(\alpha\) must be solved for iteratively. |
Gini-Simpson Index | simpson() | \(1 - \sum_{i = 1}^{n} P_i^2\) |
Inverse Simpson Index | inv_simpson() | \(1 / \sum_{i = 1}^{n} P_i^2\) |
Margalef's Richness Index | margalef() | \(\displaystyle \frac{n - 1}{\ln{X_T}}\) |
McIntosh Index | mcintosh() | \(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\) |
Menhinick's Richness Index | menhinick() | \(\displaystyle \frac{n}{\sqrt{X_T}}\) |
Observed Features | observed() | \(n\) |
Shannon Diversity Index | shannon() | \(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\) |
Squares Richness Estimator | squares() | \(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\) |
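The closed-form entries in the table can be evaluated directly in base R. The sketch below does so for the Saliva sample from the Examples section. It is written from the formulas, not taken from the package, and all variable names are illustrative. Fisher's alpha is solved with stats::uniroot(), which is an assumption made for this sketch and not necessarily how fisher() works internally. Zero counts are dropped first so that \(\ln(P_i)\) is defined.

```r
# Saliva sample from the Examples section, as a plain count vector.
x <- c(Streptococcus = 162, Bacteroides = 2, Corynebacterium = 0,
       Haemophilus = 180, Propionibacterium = 1, Staphylococcus = 0)

x   <- x[x > 0]   # keep only features that were observed
n   <- length(x)  # number of features
X_T <- sum(x)     # total counts (sequencing depth)
P   <- x / X_T    # proportional abundances

indices <- c(
  observed    = n,
  berger      = max(P),
  simpson     = 1 - sum(P^2),
  inv_simpson = 1 / sum(P^2),
  shannon     = -sum(P * log(P)),
  margalef    = (n - 1) / log(X_T),
  menhinick   = n / sqrt(X_T),
  mcintosh    = (X_T - sqrt(sum(x^2))) / (X_T - sqrt(X_T)),
  # ln(k!) is computed as lgamma(k + 1) to avoid overflow in factorial().
  brillouin   = (lgamma(X_T + 1) - sum(lgamma(x + 1))) / X_T
)

# Fisher's alpha: root of n/alpha - ln(1 + X_T/alpha), found numerically.
# The search interval is an arbitrary choice that brackets the root here.
fisher_eq <- function(alpha) n / alpha - log(1 + X_T / alpha)
alpha <- uniroot(fisher_eq, interval = c(1e-6, 1e6), tol = 1e-10)$root

indices
alpha
```

Values returned by the package functions may differ from these depending on arguments such as rescale and digits.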
Abundance-based Coverage Estimator (ACE)
Given:
\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(r\) : Rare cutoff. Features with \(\le r\) counts are considered rare.
\(X_i\) : Integer count of the \(i\)-th feature.
\(F_i\) : Number of features with exactly \(i\) counts.
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_{rare}\) : Number of rare features where \(X_i \le r\).
\(F_{abund}\) : Number of abundant features where \(X_i > r\).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : The sample abundance coverage estimator, defined below.
\(\gamma_{ace}^2\) : The estimated coefficient of variation, defined below.
\(D_{ace}\) : Estimated number of features in the sample.
\(\displaystyle C_{ace} = 1 - \frac{F_1}{X_{rare}}\)
\(\displaystyle \gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\)
\(\displaystyle D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2 \)
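The three ACE equations can be traced step by step in base R. The function below, ace_sketch(), is a hypothetical helper written from the definitions above, not the package's ace() implementation. Its argument r plays the role of the rare cutoff.

```r
# Hypothetical helper: ACE for a single sample, following the equations above.
ace_sketch <- function(x, r = 10) {
  x       <- x[x > 0]       # ignore features that were not observed
  rare    <- x[x <= r]      # counts of the rare features
  F_rare  <- length(rare)   # number of rare features
  F_abund <- sum(x > r)     # number of abundant features
  F_1     <- sum(x == 1)    # singletons
  X_rare  <- sum(rare)      # total counts belonging to rare features

  C_ace <- 1 - F_1 / X_rare

  # sum_{i=1}^{r} i (i - 1) F_i equals the sum of X_i (X_i - 1) over rare features.
  cv_sum <- sum(rare * (rare - 1))
  gamma2 <- max(F_rare * cv_sum / (C_ace * X_rare * (X_rare - 1)) - 1, 0)

  F_abund + F_rare / C_ace + (F_1 / C_ace) * gamma2
}

gums <- c(793, 4, 0, 87, 1, 1)   # Gums sample from the Examples section
ace_sketch(gums)
```

For the Gums sample this gives \(F_{rare} = 3\), \(X_{rare} = 6\), \(C_{ace} = 2/3\) and \(\gamma_{ace}^2 = 0.8\), so \(D_{ace} = 2 + 4.5 + 2.4 = 8.9\), in agreement with the ace() output in the Examples. When every rare feature is a singleton, as in the Stool sample, \(C_{ace} = 0\) and the sketch returns NaN.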
Examples
# Example counts matrix
t(ex_counts)
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1

ace(ex_counts)
#> Saliva   Gums   Nose  Stool
#>    5.0    8.9    6.0    NaN

chao1(ex_counts)
#> Saliva   Gums   Nose  Stool
#>    4.5    Inf    6.0    Inf

squares(ex_counts)
#>    Saliva      Gums      Nose     Stool
#>  4.492762  8.243044  6.000000 20.793551
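The chao1() and squares() results above can be reproduced by hand from the formulas in the table. The following base R sketch rebuilds the example matrix from the printed values and applies two hypothetical helpers, chao1_sketch() and squares_sketch(). They illustrate the formulas only and are not the package's code.

```r
# Counts copied from the printed example (samples in rows, features in columns).
counts <- rbind(
  Saliva = c(162,   2,   0, 180,   1,   0),
  Gums   = c(793,   4,   0,  87,   1,   1),
  Nose   = c( 22,   2, 498,   2, 251, 236),
  Stool  = c(  1, 611,   1,   1,   0,   1)
)
colnames(counts) <- c("Streptococcus", "Bacteroides", "Corynebacterium",
                      "Haemophilus", "Propionibacterium", "Staphylococcus")

# Chao1: n + F_1^2 / (2 F_2). Division by F_2 = 0 yields Inf when F_1 > 0.
chao1_sketch <- function(x) {
  n   <- sum(x > 0)
  F_1 <- sum(x == 1)
  F_2 <- sum(x == 2)
  n + F_1^2 / (2 * F_2)
}

# Squares: n + F_1^2 * sum(X_i^2) / (X_T^2 - n F_1).
squares_sketch <- function(x) {
  n   <- sum(x > 0)
  F_1 <- sum(x == 1)
  X_T <- sum(x)
  n + (F_1^2 * sum(x^2)) / (X_T^2 - n * F_1)
}

chao1_by_hand   <- apply(counts, 1, chao1_sketch)
squares_by_hand <- apply(counts, 1, squares_sketch)

chao1_by_hand
squares_by_hand
```

The Inf values for Gums and Stool arise because both samples contain singletons but no doubletons, so the Chao1 denominator \(2 F_2\) is zero.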