Alpha Diversity Metrics

Usage

ace(counts, cutoff = 10, cpus = n_cpus())

berger(counts, rescale = TRUE, cpus = n_cpus())

brillouin(counts, cpus = n_cpus())

chao1(counts, cpus = n_cpus())

faith(counts, tree = NULL, cpus = n_cpus())

fisher(counts, digits = 3L, cpus = n_cpus())

inv_simpson(counts, rescale = TRUE, cpus = n_cpus())

margalef(counts, cpus = n_cpus())

mcintosh(counts, cpus = n_cpus())

menhinick(counts, cpus = n_cpus())

observed(counts, cpus = n_cpus())

shannon(counts, rescale = TRUE, cpus = n_cpus())

simpson(counts, rescale = TRUE, cpus = n_cpus())

squares(counts, cpus = n_cpus())

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

cutoff

The maximum count at which a feature is still considered "rare". Used only by ace(). Default: 10.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

rescale

Whether to normalize each sample's counts so that they sum to 1. Default: TRUE.

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

digits

Precision of the returned values, in number of decimal places. E.g. the default digits=3 could return 6.392.

Value

A numeric vector with one value per sample.

Formulas

Prerequisite: all counts are whole numbers.

Given:

  • \(n\) : The number of features (e.g. species, OTUs, ASVs, etc).

  • \(X_i\) : Integer count of the \(i\)-th feature.

  • \(X_T\) : Total of all counts (i.e. sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)

  • \(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)

  • \(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).

  • \(F_2\) : Number of features where \(X_i = 2\) (i.e. doubletons).
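The quantities above can be computed for every sample at once with base R. The sketch below is illustrative only: it retypes the example data from the Examples section into a hypothetical `counts` matrix, and it assumes that \(n\) counts only features with a non-zero count in the sample, which is what the example output implies.

```r
# Example data from the Examples section: rows are samples, columns are features.
counts <- rbind(
  Saliva = c(162,   2,   0, 180,   1,   0),
  Gums   = c(793,   4,   0,  87,   1,   1),
  Nose   = c( 22,   2, 498,   2, 251, 236),
  Stool  = c(  1, 611,   1,   1,   0,   1)
)
colnames(counts) <- c("Streptococcus", "Bacteroides", "Corynebacterium",
                      "Haemophilus", "Propionibacterium", "Staphylococcus")

n   <- rowSums(counts > 0)   # features observed in each sample (assumption: zeros excluded)
x_t <- rowSums(counts)       # X_T: total counts per sample (sequencing depth)
p   <- counts / x_t          # P_i: each row is divided by its own total
f1  <- rowSums(counts == 1)  # F_1: singletons per sample
f2  <- rowSums(counts == 2)  # F_2: doubletons per sample

rbind(n, x_t, f1, f2)
```

Gums and Stool have no doubletons (\(F_2 = 0\)), which is why a formula that divides by \(F_2\), such as Chao1, yields Inf for those two samples in the Examples section.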

  • Abundance-based Coverage Estimator (ACE), ace() : See below.

  • Berger-Parker Index, berger() : \(\max(P_i)\)

  • Brillouin Index, brillouin() : \(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\)

  • Chao1, chao1() : \(\displaystyle n + \frac{(F_1)^2}{2 F_2}\)

  • Faith's Phylogenetic Diversity, faith() : See below.

  • Fisher's Alpha (\(\alpha\)), fisher() : \(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\). The value of \(\alpha\) must be solved for iteratively.

  • Gini-Simpson Index, simpson() : \(1 - \sum_{i = 1}^{n} P_i^2\)

  • Inverse Simpson Index, inv_simpson() : \(1 / \sum_{i = 1}^{n} P_i^2\)

  • Margalef's Richness Index, margalef() : \(\displaystyle \frac{n - 1}{\ln{X_T}}\)

  • McIntosh Index, mcintosh() : \(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\)

  • Menhinick's Richness Index, menhinick() : \(\displaystyle \frac{n}{\sqrt{X_T}}\)

  • Observed Features, observed() : \(n\)

  • Shannon Diversity Index, shannon() : \(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\)

  • Squares Richness Estimator, squares() : \(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\)
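As a rough check of the formulas above, the following base R sketch evaluates several of them for a single sample (the Saliva counts from the Examples section). It is not the package's implementation: the variable names are made up for illustration, zero counts are dropped on the assumption that \(n\) counts observed features only, and `uniroot()` stands in for whatever iterative solver fisher() actually uses.

```r
# One sample's integer counts (Saliva, from the Examples section).
x <- c(162, 2, 0, 180, 1, 0)
x <- x[x > 0]            # assumption: absent features do not contribute

n   <- length(x)         # observed features
x_t <- sum(x)            # X_T, sequencing depth
p   <- x / x_t           # P_i, proportional abundances
f1  <- sum(x == 1)       # F_1, singletons
f2  <- sum(x == 2)       # F_2, doubletons

berger_d    <- max(p)
shannon_h   <- -sum(p * log(p))
simpson_d   <- 1 - sum(p^2)
inv_simpson <- 1 / sum(p^2)
chao1_s     <- n + f1^2 / (2 * f2)
squares_s   <- n + (f1^2 * sum(x^2)) / (x_t^2 - n * f1)
# lgamma(k + 1) equals log(k!), which avoids overflow in the factorials.
brillouin_h <- (lgamma(x_t + 1) - sum(lgamma(x + 1))) / x_t

# Fisher's alpha: find the root of alpha * log(1 + X_T / alpha) - n = 0.
fisher_alpha <- uniroot(
  function(a) a * log1p(x_t / a) - n,
  interval = c(1e-8, 1e8), tol = 1e-10
)$root

round(c(chao1 = chao1_s, squares = squares_s, fisher = fisher_alpha), 3)
```

The Chao1 and squares values obtained this way agree with the Saliva entries shown in the Examples section (4.5 and 4.492762).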

Abundance-based Coverage Estimator (ACE)

Given:

  • \(n\) : The number of features (e.g. species, OTUs, ASVs, etc).

  • \(r\) : Rare cutoff (the cutoff argument). Features with \(\le r\) counts are considered rare.

  • \(X_i\) : Integer count of the \(i\)-th feature.

  • \(F_i\) : Number of features with exactly \(i\) counts.

  • \(F_1\) : Number of features where \(X_i = 1\) (i.e. singletons).

  • \(F_{rare}\) : Number of rare features where \(X_i \le r\).

  • \(F_{abund}\) : Number of abundant features where \(X_i > r\).

  • \(X_{rare}\) : Total counts belonging to rare features.

  • \(C_{ace}\) : The sample abundance coverage estimator, defined below.

  • \(\gamma_{ace}^2\) : The estimated coefficient of variation, defined below.

  • \(D_{ace}\) : Estimated number of features in the sample.

\(\displaystyle C_{ace} = 1 - \frac{F_1}{X_{rare}}\)

\(\displaystyle \gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\)

\(\displaystyle D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2 \)
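The three ACE equations translate fairly directly into base R. The function below, `ace_sketch()`, is a hypothetical helper written for illustration and is not the package's ace(). It assumes that features with zero counts are ignored, and it does not handle edge cases such as a sample with no rare features.

```r
# Sketch of the ACE formulas for one sample; `cutoff` plays the role of r.
ace_sketch <- function(x, cutoff = 10) {
  x <- x[x > 0]                  # assumption: only observed features count
  rare    <- x[x <= cutoff]      # counts of the rare features
  f_rare  <- length(rare)        # F_rare
  f_abund <- sum(x > cutoff)     # F_abund
  f1      <- sum(x == 1)         # F_1
  x_rare  <- sum(rare)           # X_rare

  c_ace <- 1 - f1 / x_rare
  # sum_{i=1}^{r} i(i-1)F_i is the same as summing X(X-1) over rare features.
  gamma2 <- max(
    f_rare * sum(rare * (rare - 1)) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )
  f_abund + f_rare / c_ace + (f1 / c_ace) * gamma2
}

# Example data from the Examples section: rows are samples, columns are features.
counts <- rbind(
  Saliva = c(162,   2,   0, 180,   1,   0),
  Gums   = c(793,   4,   0,  87,   1,   1),
  Nose   = c( 22,   2, 498,   2, 251, 236),
  Stool  = c(  1, 611,   1,   1,   0,   1)
)
res <- apply(counts, 1, ace_sketch)
res
```

For Stool every rare feature is a singleton, so \(F_1 = X_{rare}\) and \(C_{ace} = 0\). The divisions by \(C_{ace}\) are then undefined, which matches the NaN reported for that sample in the Examples section.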

Faith's Phylogenetic Diversity (Faith's PD)

Given a tree with \(n\) branches (here \(n\) counts branches, not features), where \(L_i\) is the length of the \(i\)-th branch and \(A_i\) is 1 if that branch is present in the sample and 0 if it is absent:

\(\sum_{i = 1}^{n} L_i A_i\)
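To make the sum concrete, here is a minimal base R sketch on a made-up four-tip tree. Everything in it is hypothetical: the tree, the branch lengths, and the names `edge`, `edge_length`, and `tip_present`. It treats a branch as present when at least one tip beneath it is present and includes every branch on the path to the root, which may differ from how faith() treats the root.

```r
# Hypothetical rooted tree with 4 tips, stored as one row per branch:
# (parent node, child node). Tips are nodes 1..4; node 5 is the root.
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
edge_length <- c(0.5, 1.0, 2.0, 0.3, 1.5, 0.7)   # L_i for each branch
tip_present <- c(TRUE, FALSE, TRUE, FALSE)       # sample contains tips 1 and 3

# A_i: mark every branch on the path from each present tip up to the root.
a <- logical(nrow(edge))
for (tip in which(tip_present)) {
  node <- tip
  repeat {
    i <- match(node, edge[, 2])  # the branch whose child is this node
    if (is.na(i)) break          # no parent branch: the root was reached
    a[i] <- TRUE
    node <- edge[i, 1]           # move up to the parent node
  }
}

faith_pd <- sum(edge_length * a)  # sum of L_i * A_i
faith_pd
```

Here tips 1 and 3 contribute branches of length 1.0 + 0.5 and 1.5 + 0.3, so the sketch gives 3.3.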

Examples

    # Example counts matrix
    t(ex_counts)
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    ace(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    5.0    8.9    6.0    NaN 
    
    chao1(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    4.5    Inf    6.0    Inf 
    
    squares(ex_counts)
#>    Saliva      Gums      Nose     Stool 
#>  4.492762  8.243044  6.000000 20.793551