Simpson alpha diversity metric.

Usage

simpson(counts, cpus = n_cpus())

Arguments

counts

An OTU abundance matrix where each column is a sample, and each row is an OTU. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A numeric vector.

Details

The Simpson index is a popular metric that incorporates both richness and evenness to describe community diversity. The most common version, the Gini-Simpson index (implemented here), measures the probability that two individuals selected randomly from the community will belong to different species. The value ranges from 0 to 1, where higher values indicate greater diversity. Because the calculation squares the proportional abundance of each species, the index is heavily weighted by the most abundant (dominant) taxa and is less sensitive to the presence of rare species. A low value indicates that the community is dominated by one or a few species, so the index also serves as a measure of community dominance.
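The weighting toward dominant taxa can be illustrated with a minimal base R sketch. The helper `gini_simpson()` below is a hypothetical stand-in written for this illustration, not the package's implementation, and the abundance vectors are made up.

```r
# Hypothetical helper for illustration only; not the package's internal code.
gini_simpson <- function(x) {
  x <- x[x > 0]      # drop zero-abundance taxa
  p <- x / sum(x)    # relative abundances
  1 - sum(p^2)       # chance that two random draws are different taxa
}

base_comm <- c(50, 30, 20)        # three taxa (made-up counts)
rare_comm <- c(50, 30, 20, 1, 1)  # same community plus two rare taxa
dom_comm  <- c(90, 5, 5)          # same richness, one dominant taxon

d_base <- gini_simpson(base_comm)
d_rare <- gini_simpson(rare_comm)
d_dom  <- gini_simpson(dom_comm)

# Compare how far each change moves the index from the baseline.
c(base = d_base, rare_added = d_rare, dominated = d_dom)
```

In this sketch, adding two rare taxa shifts the value only slightly, while concentrating abundance in a single taxon lowers it substantially.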

Calculation

Pre-transformation: drop all OTUs with zero abundance.

In the formulas below, \(x\) is a single column (sample) from counts. \(p\) are the relative abundances.

$$p_{i} = \displaystyle \frac{x_i}{\sum x}$$
$$D = \displaystyle 1 - \sum_{i = 1}^{n} p_{i}^{2}$$

  x <- c(4, 0, 3, 2, 6)[-2]
  p <- x / sum(x)
  1 - sum(p^2)
  #>  0.7111111
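The same calculation can be applied to every sample (column) of a counts matrix. The sketch below uses only base R; `simpson_by_sample()` is a hypothetical name chosen for this illustration and makes no claim about how the package computes the result or handles parallelism. The matrix is typed in by hand from the `ex_counts` table shown under Examples.

```r
# Counts copied by hand from the ex_counts table on this page.
counts <- matrix(
  c(162, 793,  22,   1,
      2,   4,   2, 611,
      0,   0, 498,   1,
    180,  87,   2,   1,
      1,   1, 251,   0,
      0,   1, 236,   1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Streptococcus", "Bacteroides", "Corynebacterium",
      "Haemophilus", "Propionibacterium", "Staphylococcus"),
    c("Saliva", "Gums", "Nose", "Stool")))

# Hypothetical helper (illustration only): Gini-Simpson index per column.
simpson_by_sample <- function(m) {
  m <- as.matrix(m)
  apply(m, 2, function(x) {
    x <- x[x > 0]    # pre-transformation: drop zero-abundance OTUs
    p <- x / sum(x)  # relative abundances within the sample
    1 - sum(p^2)     # Gini-Simpson index
  })
}

res <- simpson_by_sample(counts)
round(res, 4)
```

The result is a named numeric vector with one value per sample, which can be compared against the `simpson(ex_counts)` output in the Examples section.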

References

Simpson EH 1949. Measurement of diversity. Nature, 163. doi:10.1038/163688a0

See also

Other alpha_diversity: alpha_div(), beta_div(), chao1(), faith(), inv_simpson(), observed(), shannon()

Examples

    # Example counts matrix
    ex_counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    # Simpson diversity values
    simpson(ex_counts)
#>     Saliva       Gums       Nose      Stool 
#> 0.50725478 0.18924937 0.64075388 0.01295525 
    
    # Low diversity
    simpson(c(100, 1, 1, 1, 1)) # 0.075
#> [1] 0.07507396
    
    # High diversity
    simpson(c(20, 20, 20, 20, 20)) # 0.8
#> [1] 0.8
    
    # Low richness
    simpson(1:3) # 0.61
#> [1] 0.6111111
    
    # High richness
    simpson(1:100) # 0.99
#> [1] 0.9867327