Gini-Simpson Index

The probability that two entities drawn at random (with replacement) from a sample belong to different types.

Usage

simpson(counts, margin = 1L, cpus = n_cpus())

Arguments

counts: A numeric matrix of count data (samples $\times$ features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
margin: The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1
cpus: How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Details

The Gini-Simpson index is defined as: $$1 - \sum_{i = 1}^{n} P_i^2$$

Where:

$n$ : The number of features.
$P_i$ : Proportional abundance of the $i$-th feature.
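
As a worked illustration, assume a hypothetical sample (not taken from ex_counts) with $n = 4$ features and counts $(4, 3, 2, 1)$, which sum to 10. The proportional abundances are then $P = (0.4, 0.3, 0.2, 0.1)$, and:

$$1 - \sum_{i = 1}^{4} P_i^2 = 1 - (0.16 + 0.09 + 0.04 + 0.01) = 1 - 0.30 = 0.70$$

A sample dominated by a single feature scores near 0, while $n$ equally abundant features give the maximum of $1 - 1/n$.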

Base R Equivalent:

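x <- ex_counts[1,]  # counts for the first sample (samples are rows)
p <- x / sum(x)     # proportional abundance of each feature
1 - sum(p^2)        # Gini-Simpson index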
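
The single-sample calculation above can be extended to every sample at once. The following is a minimal base R sketch, assuming samples are rows (margin = 1); the `counts` matrix and its sample and feature names are hypothetical, and this is not necessarily how simpson() is implemented internally.

```r
# Hypothetical counts: 3 samples (rows) x 4 features (columns).
counts <- matrix(
  c( 4, 3, 2, 1,
    10, 0, 0, 0,
     5, 5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("f", 1:4))
)

# Divide each row by its total to get proportional abundances.
# rowSums() returns one total per row, which R recycles down the columns,
# so element [i, j] is divided by the total of row i.
p <- counts / rowSums(counts)

# Apply 1 - sum(p^2) to each row.
gini_simpson <- 1 - rowSums(p^2)
gini_simpson
```

Here sample A evaluates to 0.70, sample B (a single feature) to 0, and sample C (four equally abundant features) to 0.75. If samples were columns instead, the same sketch would apply after transposing with t(counts).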

Input Types

The counts parameter accepts a plain numeric matrix, and also supports objects from the following biological data packages:

phyloseq
rbiom
SummarizedExperiment
TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.

References

Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688. doi:10.1038/163688a0

Examples

    simpson(ex_counts)
#>     Saliva       Gums       Nose      Stool 
#> 0.50725478 0.18924937 0.64075388 0.01295525