Chao1 alpha diversity metric.

A non-parametric estimator of total species richness, including species that were present but not observed in a sample. The Chao1 index bases this estimate on the number of species that occur exactly once (singletons) and exactly twice (doubletons) in the sample.

Usage

chao1(counts, cpus = n_cpus())

Arguments

counts

An OTU abundance matrix where each column is a sample, and each row is an OTU. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A numeric vector with one value per sample (column of counts).

Calculation

Prerequisite: all counts are whole numbers.
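
The prerequisite matters because singletons and doubletons are found by exact comparison with 1 and 2, so fractional values such as relative abundances would never match. A minimal base R sketch of a pre-check follows; `is_whole_counts()` is a hypothetical helper written for illustration, not a function from this package, and treating missing or non-finite values as a failure is an assumption.

```r
# Hypothetical helper (not part of the package): TRUE when every value
# in `counts` is a finite whole number.
is_whole_counts <- function(counts) {
  m <- as.matrix(counts)
  # Assumption: NA, NaN, and Inf entries fail the check.
  is.numeric(m) && all(is.finite(m)) && all(m == round(m))
}

is_whole_counts(c(1, 0, 3, 2, 6))      # TRUE
is_whole_counts(c(0.1, 0, 0.3, 0.6))   # FALSE: relative abundances
```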

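In the formula below, x is a single column (sample) from counts, and \(D\) is the Chao1 estimate for that sample. \(n\) is the number of OTUs with a non-zero count in x, \(a\) is the number of singletons (OTUs with a count of 1), and \(b\) is the number of doubletons (OTUs with a count of 2).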

$$D = \displaystyle n + \frac{a^{2}}{2b}$$

  x <- c(1, 0, 3, 2, 6)
  sum(x > 0) + (sum(x == 1)^2) / (2 * sum(x == 2))
  #> [1] 4.5

Note that when \(x\) does not have any singletons or doubletons (\(a = 0, b = 0\)), the result will be NaN. When \(x\) has singletons but no doubletons (\(a > 0, b = 0\)), the result will be Inf.
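
The three cases described above can be reproduced with a minimal base R sketch that applies the formula to each column of a matrix. `chao1_sketch()` and the matrix `m` are hypothetical names made up for this illustration. The sketch is not the package's implementation and does no parallel processing.

```r
# Hypothetical helper (not part of the package): the formula
# D = n + a^2 / (2b), applied to each column (sample).
chao1_sketch <- function(counts) {
  m <- as.matrix(counts)
  apply(m, 2L, function(x) {
    n <- sum(x > 0)   # OTUs with a non-zero count
    a <- sum(x == 1)  # singletons
    b <- sum(x == 2)  # doubletons
    n + a^2 / (2 * b)
  })
}

# Made-up counts covering each case
m <- cbind(
  both    = c(1, 0, 3, 2, 6),  # a = 1, b = 1
  neither = c(5, 0, 3, 4, 6),  # a = 0, b = 0
  single  = c(1, 0, 3, 4, 6)   # a = 1, b = 0
)

res <- chao1_sketch(m)
res[["both"]]     # 4.5
res[["neither"]]  # NaN, from 0 / 0
res[["single"]]   # Inf, from 1 / 0
```

Downstream code that averages or plots these values may want to test them with `is.finite()` first, since both NaN and Inf are returned without a warning.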

References

Chao A 1984. Non-parametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11:265-270.

See also

Other alpha_diversity: faith(), inv_simpson(), shannon(), simpson()

Examples

    # Example counts matrix
    ex_counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    # Chao1 diversity values
    chao1(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    4.5    Inf    6.0    Inf 
    
    # Low diversity
    chao1(c(100, 1, 1, 1, 1)) # Inf
#> [1] Inf
    
    # High diversity
    chao1(c(20, 20, 20, 20, 20)) # NaN
#> [1] NaN
    
    # Low richness
    chao1(1:3) # 3.5
#> [1] 3.5
    
    # High richness
    chao1(1:100) # 100.5
#> [1] 100.5