Shannon alpha diversity metric.

Usage

shannon(counts, cpus = n_cpus())

Arguments

counts

An OTU abundance matrix where each column is a sample, and each row is an OTU. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A numeric vector with one Shannon index value per sample.

Details

The Shannon index is a widely used metric that quantifies diversity by considering both the number of species (richness) and their abundance distribution (evenness). Borrowed from information theory, it measures the "uncertainty" or entropy in predicting the identity of a microbe drawn randomly from the sample. A community with many different species that are present in similar proportions will have high uncertainty and thus a high Shannon index value. Compared to the Simpson index, the Shannon index gives more equitable weight to both rare and abundant species, making it more sensitive to changes in richness. Higher values indicate greater community diversity.
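
To make the roles of richness and evenness concrete, here is a small base-R sketch. The helper `H()` and the three example vectors are illustrative assumptions, not part of the package; `H()` simply computes the natural-log entropy of one sample.

```r
# Hypothetical helper (illustration only): Shannon index of one sample.
H <- function(x) {
  p <- x[x > 0] / sum(x)   # proportions of the non-zero OTUs
  -sum(p * log(p))         # natural-log entropy
}

even   <- rep(20, 5)          # 5 OTUs, perfectly even
uneven <- c(96, 1, 1, 1, 1)   # 5 OTUs, one dominant
richer <- rep(10, 10)         # 10 OTUs, perfectly even

H(even)    # equals log(5), the maximum possible with 5 OTUs
H(uneven)  # lower: same richness, less evenness
H(richer)  # equals log(10): more OTUs raise the attainable maximum
```

For a perfectly even community of n OTUs the index reduces to log(n), so richness sets the ceiling and evenness determines how close a sample gets to it.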

Calculation

Pre-transformation: drop all OTUs with zero abundance.

In the formulas below, \(x\) is a single column (sample) from counts, \(n\) is the number of OTUs remaining in \(x\) after the pre-transformation, and \(p_i\) is the proportion of the \(i\)-th OTU in the total community.

$$p_{i} = \displaystyle \frac{x_i}{\sum x}$$

$$D = \displaystyle -\sum_{i = 1}^{n} p_{i}\times\ln(p_{i})$$

  x <- c(4, 0, 3, 2, 6)[-2]  # drop the zero-abundance OTU
  p <- x / sum(x)            # proportions p_i
  -sum(p * log(p))           # Shannon index D
  #> [1] 1.309526
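
The same steps extend to a whole matrix by applying them to each column. The following is a minimal base-R sketch under stated assumptions, not the package's implementation: `shannon_manual()` is a hypothetical helper name, and the matrix `m` is made up for illustration.

```r
# Hypothetical helper (not part of the package): Shannon index for each
# column of an OTU-by-sample matrix, following the formulas above.
shannon_manual <- function(counts) {
  counts <- as.matrix(counts)
  apply(counts, 2, function(x) {
    x <- x[x > 0]      # pre-transformation: drop zero-abundance OTUs
    p <- x / sum(x)    # proportions p_i
    -sum(p * log(p))   # natural-log entropy
  })
}

# Illustrative data: rows are OTUs, columns are samples.
m <- cbind(A = c(4, 0, 3, 2, 6), B = c(20, 20, 20, 20, 20))
shannon_manual(m)
```

Column A reproduces the worked value above (about 1.309526), and the perfectly even column B equals log(5).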

References

Shannon CE, Weaver W 1949. The Mathematical Theory of Communication. University of Illinois Press.

See also

Other alpha_diversity: alpha_div(), beta_div(), chao1(), faith(), inv_simpson(), observed(), simpson()

Examples

    # Example counts matrix
    ex_counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    # Shannon diversity values
    shannon(ex_counts)
#>     Saliva       Gums       Nose      Stool 
#> 0.74119910 0.36684449 1.14222899 0.04824952 
    
    # Low diversity
    shannon(c(100, 1, 1, 1, 1)) # 0.22
#> [1] 0.2163426
    
    # High diversity
    shannon(c(20, 20, 20, 20, 20)) # 1.61
#> [1] 1.609438
    
    # Low richness
    shannon(1:3) # 1.01
#> [1] 1.011404
    
    # High richness
    shannon(1:100) # 4.42
#> [1] 4.416898