Squares Richness Estimator

A richness estimator that extrapolates from the number of observed features using the sum of their squared counts and the number of singletons (features observed exactly once).

Usage

squares(counts, margin = 1L, cpus = n_cpus())

Arguments

counts: A numeric matrix of count data (samples $\times$ features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
margin: The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1
cpus: The number of parallel processing threads to use. The default, n_cpus(), uses all logical CPU cores.

Details

The Squares estimator is defined as: $$n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}$$

Where:

$n$ : The number of observed features.
$X_T$ : Total of all counts.
$F_1$ : Number of features observed once (singletons).
$X_i$ : Integer count of the $i$-th feature.
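As a worked check of these definitions, the sketch below evaluates the formula step by step for a small, made-up count vector (not taken from `ex_counts`). Variable names mirror the symbols above; `SSQ` is an extra name introduced here for $\sum X_i^2$.

```r
# Made-up toy sample: six features, one of them unobserved (count 0)
x <- c(5, 3, 1, 1, 0, 2)

n   <- sum(x > 0)   # observed features:         5
X_T <- sum(x)       # total of all counts:      12
F_1 <- sum(x == 1)  # singletons:                2
SSQ <- sum(x^2)     # sum of squared counts:    40 (zeros contribute nothing)

# n + F_1^2 * SSQ / (X_T^2 - n * F_1)
#   = 5 + (4 * 40) / (144 - 10) = 5 + 160 / 134, about 6.194
est <- n + (F_1^2 * SSQ) / (X_T^2 - n * F_1)
est
```

When a sample has no singletons ($F_1 = 0$), the correction term is zero and the estimate equals $n$ exactly.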

Base R Equivalent:

x  <- ex_counts[1,]
N  <- sum(x)      # sampling depth
S  <- sum(x > 0)  # observed features
F1 <- sum(x == 1) # singletons
S + ((sum(x^2) * (F1^2)) / ((N^2) - F1 * S))
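The snippet above handles a single sample. A minimal sketch of extending it to a whole matrix, honouring the `margin` convention from the Arguments section, is below. `squares_base()` is a hypothetical helper written for illustration only (it is not part of the package), and the toy matrix is made up to show two edge cases.

```r
# Hypothetical helper: applies the base R formula to every sample of a matrix.
# margin = 1 means samples are rows; margin = 2 means samples are columns.
squares_base <- function(counts, margin = 1L) {
  one_sample <- function(x) {
    N  <- sum(x)        # sampling depth
    S  <- sum(x > 0)    # observed features
    F1 <- sum(x == 1)   # singletons
    S + ((sum(x^2) * (F1^2)) / ((N^2) - F1 * S))
  }
  apply(counts, margin, one_sample)
}

# Made-up data (samples x features)
m <- rbind(
  A = c(5, 3, 1, 1, 0, 2),  # ordinary sample
  B = c(4, 4, 2, 0, 0, 3),  # no singletons: estimate equals observed richness
  C = c(1, 1, 1, 0, 0, 0)   # only singletons: denominator is zero
)

by_row <- squares_base(m)                  # samples as rows
by_col <- squares_base(t(m), margin = 2L)  # same samples stored as columns
# A: 5 + 160/134 (about 6.194); B: 4; C: Inf (27 / 0)
identical(by_row, by_col)
```

Sample C shows a degenerate case of the formula: when every observed feature is a singleton, $X_T = n = F_1$, so the denominator $X_T^2 - nF_1$ is zero and base R returns `Inf`. How `squares()` itself reports this case is not covered by this sketch.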

Input Types

The counts parameter is designed to accept a simple numeric matrix, but it also accepts objects from the following biological data packages:

phyloseq
rbiom
SummarizedExperiment
TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.
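As one illustration of why the data layout matters, every quantity in the formula can be computed for all samples at once with `rowSums()` instead of visiting samples one at a time. The sketch below is a hypothetical helper (`squares_rowsums()`, samples as rows only), not the package's implementation. The optional sparse check assumes the recommended Matrix package is installed; it uses `Matrix()` to build a sparse matrix and relies on the `rowSums()` methods Matrix provides for sparse classes once it is attached.

```r
# Hypothetical vectorized helper; assumes samples are rows (margin = 1)
squares_rowsums <- function(counts) {
  N  <- rowSums(counts)        # sampling depth per sample
  S  <- rowSums(counts > 0)    # observed features per sample
  F1 <- rowSums(counts == 1)   # singletons per sample
  SQ <- rowSums(counts^2)      # sum of squared counts per sample
  S + (SQ * F1^2) / (N^2 - F1 * S)
}

# Made-up data (samples x features)
m <- rbind(
  A = c(5, 3, 1, 1, 0, 2),
  B = c(4, 4, 2, 0, 0, 3)
)
dense <- squares_rowsums(m)

# Same expressions on a sparse matrix, if the Matrix package is available.
# Attaching Matrix makes its rowSums() methods visible to the helper above.
if (requireNamespace("Matrix", quietly = TRUE)) {
  library(Matrix)
  sparse <- squares_rowsums(Matrix(m, sparse = TRUE))
  all.equal(as.numeric(dense), as.numeric(sparse))
}
```

For samples stored as columns, the same idea applies with `colSums()` or by transposing first.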

References

Alroy, J. (2018). Limits to species richness estimates based on subsampling. Paleobiology, 44(2), 177-194. doi:10.1017/pab.2017.38

Examples

    squares(ex_counts)
#>    Saliva      Gums      Nose     Stool 
#>  4.492762  8.243044  6.000000 20.793551