Input Matrix
We will use the ex_counts dataset included with ecodive.
This feature table contains counts of bacterial genera across various
samples.
Alpha Diversity
Alpha diversity measures diversity within a single sample. In
ecodive, metrics are grouped into four categories based on
the aspect of diversity they quantify.
Richness Metrics
Richness metrics estimate the number of distinct features (e.g.,
genera) in a sample. The simplest metric, observed(),
counts features with non-zero abundance.
# Equivalent to rowSums(counts > 0)
observed(counts)
#> Saliva   Gums   Nose  Stool
#>      4      3      4      5
The Chao1 estimator extends this by inferring the number of unobserved, low-abundance features based on the ratio of singletons (counts == 1) to doubletons (counts == 2).
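As a rough illustration of that idea, the Chao1 formula listed in the Formulas section can be written out in base R. The count vector and variable names below are hypothetical, and this sketch is not ecodive's own implementation, which may treat edge cases (such as a sample with no doubletons) differently.

```r
# Hypothetical counts for a single sample (not taken from ex_counts).
x <- c(10, 4, 2, 2, 1, 1, 1, 0)

n  <- sum(x > 0)   # observed features: 7
f1 <- sum(x == 1)  # singletons: 3
f2 <- sum(x == 2)  # doubletons: 2

# Chao1 = n + F_1^2 / (2 * F_2); this is Inf or NaN when f2 is 0.
chao1_sketch <- n + f1^2 / (2 * f2)
chao1_sketch
#> [1] 9.25
```

The estimate exceeds the observed count of 7 because the three singletons suggest that further rare features went undetected.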
Diversity Metrics
Diversity metrics account for both richness and evenness (how equally abundances are distributed).
Simpson’s index is often used as a measure of evenness. In the form computed below (the Gini-Simpson index, 1 - \sum P_i^2), it is the probability that two randomly selected individuals belong to different species.
# High evenness (0.8) vs. low evenness (about 0.075)
simpson(c(20, 20, 20, 20, 20))
#> [1] 0.8
simpson(c(100, 1, 1, 1, 1))
#> [1] 0.07507396
# Stool < Gums < Saliva < Nose
sort(simpson(counts))
#>      Stool       Gums     Saliva       Nose
#> 0.02302037 0.18806133 0.50725478 0.63539593
The Shannon diversity index (entropy) is another common metric that weights both richness and evenness.
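To make the Shannon index concrete, here is a minimal base R sketch of the formula from the Formulas section, applied to the same two vectors used in the Simpson example. The helper name shannon_sketch is hypothetical and is not part of ecodive.

```r
# Shannon index: -sum(P_i * ln(P_i)), skipping zero counts so that
# log(0) never enters the sum.
shannon_sketch <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

h_even   <- shannon_sketch(c(20, 20, 20, 20, 20))  # equals log(5), about 1.609
h_skewed <- shannon_sketch(c(100, 1, 1, 1, 1))     # lower: one feature dominates
```

For a perfectly even sample the index reduces to the natural log of the number of features, which is why richness and evenness both raise it.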
Dominance Metrics
Dominance metrics focus on the abundance of the most common species. The Berger-Parker index is the proportional abundance of the single most abundant feature.
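The definition translates directly into one line of base R. The sketch below reuses the two vectors from the Simpson example; the helper name berger_parker_sketch is hypothetical and is not ecodive's function.

```r
# Berger-Parker index: proportional abundance of the most abundant feature.
berger_parker_sketch <- function(x) max(x) / sum(x)

bp_even   <- berger_parker_sketch(c(20, 20, 20, 20, 20))  # 20 / 100 = 0.2
bp_skewed <- berger_parker_sketch(c(100, 1, 1, 1, 1))     # 100 / 104, about 0.96
```

Unlike the diversity metrics, higher values here indicate a less diverse sample.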
Phylogenetic Metrics
Phylogenetic metrics use a phylogenetic tree to incorporate evolutionary distance. Faith’s Phylogenetic Diversity (PD) calculates the total branch length spanned by the features present in a sample.
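The arithmetic behind "total branch length spanned" can be sketched in base R by hand-encoding the branches of ex_tree, whose diagram appears in this section. The internal node labels (HB, SSCP, SS, CP), the edges table, and the helper pd_sketch are hypothetical names introduced for illustration. This is not how ecodive stores or traverses trees, and it assumes that branches back to the root are always counted.

```r
# Each row is one branch, named by the node or tip it leads to.
# Lengths are copied by hand from the ex_tree diagram.
edges <- data.frame(
  child  = c("HB", "Haemophilus", "Bacteroides",
             "SSCP", "SS", "Streptococcus", "Staphylococcus",
             "CP", "Corynebacterium", "Propionibacterium"),
  parent = c("root", "HB", "HB",
             "root", "SSCP", "SS", "SS",
             "SSCP", "CP", "CP"),
  length = c(2, 44, 68, 11, 12, 18, 11, 12, 24, 13),
  stringsAsFactors = FALSE
)

pd_sketch <- function(tips) {
  used <- character(0)
  for (tip in tips) {
    node <- tip
    # Walk from the tip up to the root, collecting each branch once.
    while (node != "root") {
      used <- union(used, node)
      node <- edges$parent[match(node, edges$child)]
    }
  }
  sum(edges$length[edges$child %in% used])
}

pd_two_close <- pd_sketch(c("Propionibacterium", "Corynebacterium"))  # 13 + 24 + 12 + 11 = 60
pd_two_far   <- pd_sketch(c("Propionibacterium", "Haemophilus"))      # 13 + 12 + 11 + 2 + 44 = 82
```

Shared branches are counted only once, so two closely related genera add less branch length than two distant ones.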
# ex_tree:
#
#       +----------44---------- Haemophilus
#   +-2-|
#   |   +----------------68---------------- Bacteroides
#   |
#   |             +---18---- Streptococcus
#   |      +--12--|
#   |      |      +--11-- Staphylococcus
#   +--11--|
#          |      +-----24----- Corynebacterium
#          +--12--|
#                 +--13-- Propionibacterium
faith(c(Propionibacterium = 1, Corynebacterium = 1), tree = ex_tree)
#> [1] 60
faith(c(Propionibacterium = 1, Haemophilus = 1), tree = ex_tree)
#> [1] 82
# Nose < Gums < Saliva < Stool
sort(faith(counts, tree = ex_tree))
#>   Nose   Gums Saliva  Stool
#>    101    155    180    202
Formulas
Given:
- n : Number of features (e.g. species, OTUs, ASVs) observed in the sample.
- X_i : Integer count of the i-th feature.
- X_T : Total of all counts (sequencing depth). X_T = \sum_{i=1}^{n} X_i
- P_i : Proportional abundance of the i-th feature. P_i = X_i / X_T
- F_1 : Number of singletons (X_i = 1).
- F_2 : Number of doubletons (X_i = 2).
| Metric | Formula |
|---|---|
| Abundance-based Coverage Estimator (ACE) | See below. |
| Berger-Parker Index | \max(P_i) |
| Brillouin Index | \displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i} |
| Chao1 | \displaystyle n + \frac{(F_1)^2}{2 F_2} |
| Faith’s Phylogenetic Diversity | See below. |
| Fisher’s Alpha (\alpha) | \displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)} (\alpha is solved for iteratively) |
| Gini-Simpson Index | 1 - \sum_{i = 1}^{n} P_i^2 |
| Inverse Simpson Index | 1 / \sum_{i = 1}^{n} P_i^2 |
| Margalef’s Richness Index | \displaystyle \frac{n - 1}{\ln{X_T}} |
| McIntosh Index | \displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}} |
| Menhinick’s Richness Index | \displaystyle \frac{n}{\sqrt{X_T}} |
| Observed Features | n |
| Shannon Diversity Index | -\sum_{i = 1}^{n} P_i \times \ln(P_i) |
| Squares Richness Estimator | \displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1} |
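Several of the formulas in the table can be checked by hand in base R. The sketch below uses a hypothetical count vector and hypothetical variable names, and it solves Fisher's alpha with uniroot() as one possible iterative approach. None of this is taken from ecodive's source, so treat it as a reading aid for the table rather than a reference implementation.

```r
# Hypothetical counts for a single sample (not taken from ex_counts).
x <- c(10, 4, 2, 2, 1, 1, 1)

n   <- sum(x > 0)   # number of observed features
x_t <- sum(x)       # X_T, total counts
p   <- x / x_t      # P_i, proportional abundances
f1  <- sum(x == 1)  # F_1, singletons

gini_simpson <- 1 - sum(p^2)
inv_simpson  <- 1 / sum(p^2)
margalef     <- (n - 1) / log(x_t)
menhinick    <- n / sqrt(x_t)
mcintosh     <- (x_t - sqrt(sum(x^2))) / (x_t - sqrt(x_t))

# lgamma(k + 1) equals log(k!) and avoids overflow for large counts.
brillouin <- (lgamma(x_t + 1) - sum(lgamma(x + 1))) / x_t

squares <- n + (f1^2 * sum(x^2)) / (x_t^2 - n * f1)

# Fisher's alpha has no closed form. Rearranging the table's equation to
# alpha * log(1 + X_T / alpha) - n = 0 gives a function with a single
# sign change when X_T > n, which uniroot() can bracket.
fisher_fn    <- function(a) a * log1p(x_t / a) - n
fisher_alpha <- uniroot(fisher_fn, interval = c(1e-8, 1e8), tol = 1e-10)$root
```

The root-finding interval is deliberately wide. A sample in which every feature is a singleton (X_T equal to n) has no finite solution and would need separate handling.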
Abundance-based Coverage Estimator (ACE)
Given:
- r : Rare cutoff (features with \le r counts are considered rare).
- F_i : Number of features with exactly i counts (generalizing F_1 and F_2 above).
- F_{rare} : Number of rare features.
- F_{abund} : Number of abundant features (> r counts).
- X_{rare} : Total counts belonging to rare features.
- C_{ace} : Sample abundance coverage estimator.
- \gamma_{ace}^2 : Estimated coefficient of variation.
C_{ace} = 1 - \frac{F_1}{X_{rare}}
\gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]
D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2
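The three ACE equations can be traced step by step in base R. The function name ace_sketch, the example counts, and the default cutoff r = 10 are assumptions made for this illustration. The sketch is not ecodive's implementation and does not guard against edge cases such as every rare feature being a singleton, where C_{ace} becomes zero.

```r
ace_sketch <- function(x, r = 10) {
  x       <- x[x > 0]
  rare    <- x[x <= r]
  f_rare  <- length(rare)   # F_rare
  f_abund <- sum(x > r)     # F_abund
  x_rare  <- sum(rare)      # X_rare
  f1      <- sum(rare == 1) # F_1

  c_ace <- 1 - f1 / x_rare

  # F_i for i = 1..r: how many rare features have exactly i counts.
  i   <- seq_len(r)
  f_i <- tabulate(rare, nbins = r)

  gamma2 <- max(
    f_rare * sum(i * (i - 1) * f_i) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )

  f_abund + f_rare / c_ace + (f1 / c_ace) * gamma2
}

# Hypothetical sample: 2 abundant and 7 rare features, 3 of them singletons.
ace_example <- ace_sketch(c(25, 12, 5, 3, 2, 2, 1, 1, 1))
ace_example
#> [1] 11.6875
```

Working through the example: X_{rare} = 15 and F_1 = 3 give C_{ace} = 0.8, the summation is 30 so \gamma_{ace}^2 = 210 / 168 - 1 = 0.25, and D_{ace} = 2 + 8.75 + 0.9375.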
