Alpha Diversity Metrics

Usage

ace(counts, cutoff = 10, cpus = n_cpus())

berger(counts, norm = "percent", cpus = n_cpus())

brillouin(counts, cpus = n_cpus())

chao1(counts, cpus = n_cpus())

faith(counts, tree = NULL, cpus = n_cpus())

fisher(counts, digits = 3L, cpus = n_cpus())

inv_simpson(counts, norm = "percent", cpus = n_cpus())

margalef(counts, cpus = n_cpus())

mcintosh(counts, cpus = n_cpus())

menhinick(counts, cpus = n_cpus())

observed(counts, cpus = n_cpus())

shannon(counts, norm = "percent", cpus = n_cpus())

simpson(counts, norm = "percent", cpus = n_cpus())

squares(counts, cpus = n_cpus())

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

cutoff

The maximum number of observations to consider "rare". Default: 10.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

norm

Normalize the incoming counts. Options are:

norm = "percent" -: Relative abundance (sample abundances sum to 1).
norm = "binary" -: Unweighted presence/absence (each count is either 0 or 1).
norm = "clr" -: Centered log ratio.
norm = "none" -: No transformation.

Default: 'percent', which is the expected input for these formulas.

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

digits

Precision of the returned values, in number of decimal places. E.g. the default digits=3 could return 6.392.

Value

A numeric vector.

Formulas

Prerequisite: all counts are whole numbers.

Given:

\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(X_i\) : Integer count of the \(i\)-th feature.
\(X_T\) : Total of all counts (i.e. sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)
\(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_2\) : Number of features where \(X_i = 2\) (i.e. doubletons).


Abundance-based Coverage Estimator (ACE) `ace()`	See below.
Berger-Parker Index `berger()`	\(\max(P_i)\)
Brillouin Index `brillouin()`	\(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\)
Chao1 `chao1()`	\(\displaystyle n + \frac{(F_1)^2}{2 F_2}\)
Faith's Phylogenetic Diversity `faith()`	See below.
Fisher's Alpha (\(\alpha\)) `fisher()`	\(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) The value of \(\alpha\) must be solved for iteratively.
Gini-Simpson Index `simpson()`	\(1 - \sum_{i = 1}^{n} P_i^2\)
Inverse Simpson Index `inv_simpson()`	\(1 / \sum_{i = 1}^{n} P_i^2\)
Margalef's Richness Index `margalef()`	\(\displaystyle \frac{n - 1}{\ln{X_T}}\)
McIntosh Index `mcintosh()`	\(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\)
Menhinick's Richness Index `menhinick()`	\(\displaystyle \frac{n}{\sqrt{X_T}}\)
Observed Features `observed()`	\(n\)
Shannon Diversity Index `shannon()`	\(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\)
Squares Richness Estimator `squares()`	\(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\)

Abundance-based Coverage Estimator (ACE)

Given:

\(n\) : The number of features (e.g. species, OTUs, ASVs, etc).
\(r\) : Rare cutoff. Features with \(\le r\) counts are considered rare.
\(X_i\) : Integer count of the \(i\)-th feature.
\(F_i\) : Number of features with exactly \(i\) counts.
\(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).
\(F_{rare}\) : Number of rare features where \(X_i \le r\).
\(F_{abund}\) : Number of abundant features where \(X_i > r\).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : The sample abundance coverage estimator, defined below.
\(\gamma_{ace}^2\) : The estimated coefficient of variation, defined below.
\(D_{ace}\) : Estimated number of features in the sample.

\(\displaystyle C_{ace} = 1 - \frac{F_1}{X_{rare}}\)

\(\displaystyle \gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\)

\(\displaystyle D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2 \)

Faith's Phylogenetic Diversity (Faith's PD)

Given \(n\) branches with lengths \(L\) and a sample's abundances \(A\) on each of those branches coded as 1 for present or 0 for absent:

\(\sum_{i = 1}^{n} L_i A_i\)

Examples

    # Example counts matrix
    t(ex_counts)
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    ace(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    5.0    8.9    6.0    NaN 
    
    chao1(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    4.5    Inf    6.0    Inf 
    
    squares(ex_counts)
#>    Saliva      Gums      Nose     Stool 
#>  4.492762  8.243044  6.000000 20.793551