A non-parametric estimator of the lower bound of species richness.

Usage

chao1(counts, margin = 1L, cpus = n_cpus())

Arguments

counts

A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.

margin

The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1

cpus

Number of parallel processing threads to use. The default, n_cpus(), uses all logical CPU cores.
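To make the margin argument concrete, the following base R sketch mimics the two orientations with apply(). The helper chao1_sketch() and the matrix m are hypothetical, made up for illustration; this is not the package's implementation, only the formula from the Details section applied per sample.

```r
# Hypothetical helper (not part of the package): Chao1 for one sample vector.
chao1_sketch <- function(x) {
  sum(x > 0) + sum(x == 1)^2 / (2 * sum(x == 2))
}

# Made-up counts: 2 samples (rows) x 5 features (columns).
m <- matrix(
  c(5, 1, 1, 2, 0,
    3, 1, 2, 2, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A", "B"), paste0("f", 1:5))
)

# Samples in rows: analogous to margin = 1.
by_rows <- apply(m, 1, chao1_sketch)

# Same data with samples in columns: analogous to margin = 2.
by_cols <- apply(t(m), 2, chao1_sketch)

by_rows
#>    A    B
#> 6.00 4.25
```

Both orientations give the same per-sample values here, which is the behavior one would expect from switching margin to match a transposed matrix.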

Details

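The Chao1 estimator uses the numbers of singletons and doubletons to estimate how many species were missed by sampling, and adds that to the observed count. If a sample has no doubletons (\(F_2 = 0\)), the denominator is zero and the estimate can be Inf (see Examples): $$n + \frac{(F_1)^2}{2 F_2}$$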

Where:

  • \(n\) : The number of observed features.

  • \(F_1\) : Number of features observed once (singletons).

  • \(F_2\) : Number of features observed twice (doubletons).

Base R Equivalent:

x <- ex_counts[1, ]
sum(x > 0) + (sum(x == 1)^2) / (2 * sum(x == 2))
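As a worked check of the formula, here is a small sketch on hand-made count vectors, so the arithmetic can be followed without the ex_counts dataset. The function name chao1_sketch() and the vectors x and y are assumptions for illustration only, not part of the package.

```r
# Hypothetical helper (not part of the package): Chao1 for one sample vector.
chao1_sketch <- function(x) {
  n  <- sum(x > 0)   # observed features
  f1 <- sum(x == 1)  # singletons
  f2 <- sum(x == 2)  # doubletons
  n + f1^2 / (2 * f2)
}

# Made-up sample: 7 observed features, 3 singletons, 2 doubletons.
x <- c(5, 1, 1, 1, 2, 2, 0, 9)
chao1_sketch(x)  # 7 + 3^2 / (2 * 2)
#> [1] 9.25

# Made-up sample with a singleton but no doubletons: division by zero.
y <- c(4, 1, 3, 0)
chao1_sketch(y)  # 3 + 1^2 / 0
#> [1] Inf
```

In this base R sketch, a sample with neither singletons nor doubletons would evaluate 0 / 0 and return NaN; whether chao1() treats that case differently is not stated here.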

Input Types

The counts parameter is designed to accept a simple numeric matrix, but also accepts objects from the following biological data packages:

  • phyloseq

  • rbiom

  • SummarizedExperiment

  • TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.

References

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11, 265-270.

Examples

chao1(ex_counts)
#> Saliva   Gums   Nose  Stool 
#>    4.5    Inf    6.0    Inf