A statistic used for comparing the similarity of two samples.
Usage
sorensen(counts, margin = 1L, pairs = NULL, cpus = n_cpus())
Arguments
- counts
A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
- margin
The margin containing samples.
1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1
- pairs
Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.
- cpus
How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
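To make "positions in the distance matrix" concrete, here is a minimal base R sketch. It assumes, without confirmation from this page, that positions follow the ordering of a stats::dist object (lower triangle, column by column). The sample names are hypothetical.

```r
# Hypothetical sample names; any four samples would do.
samples <- c("A", "B", "C", "D")

# Enumerate unordered sample pairs in the order a stats::dist object
# stores them (assumed here to match the meaning of `pairs`).
idx <- combn(length(samples), 2)
pair_labels <- paste(samples[idx[1, ]], samples[idx[2, ]], sep = "-")
print(pair_labels)

# Under that assumption, pairs = 1:3 would select these comparisons:
print(pair_labels[1:3])
```

If the assumption holds, a logical vector of length 6 would select the same positions by TRUE/FALSE.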
Details
The Dice-Sorensen coefficient is defined as: $$\frac{2J}{(A + B)}$$
Where:
\(A\), \(B\) : Number of features in each sample.
\(J\) : Number of features in common (intersection).
Base R Equivalent:
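A minimal base R sketch of the formula above, assuming two hypothetical count vectors x and y (illustrative values, not the package's example data) and treating a feature as present when its count is greater than zero:

```r
# Hypothetical counts for two samples, with features in the same order.
x <- c(4, 0, 3, 2, 0, 6)
y <- c(0, 0, 5, 1, 2, 8)

A <- sum(x > 0)          # number of features present in x
B <- sum(y > 0)          # number of features present in y
J <- sum(x > 0 & y > 0)  # number of features present in both

dice <- 2 * J / (A + B)
print(dice)
```

Only presence or absence enters the calculation, so rescaling the counts to proportions leaves the result unchanged.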
Input Types
The counts parameter is designed to accept a simple numeric matrix, but
seamlessly supports objects from the following biological data packages:
- phyloseq
- rbiom
- SummarizedExperiment
- TreeSummarizedExperiment
For large datasets, standard matrix operations may be slow. See
vignette('performance') for details on using optimized formats
(e.g. sparse matrices) and parallel processing.
References
Sørensen, T. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Kongelige Danske Videnskabernes Selskab, Biologiske Skrifter, 5, 1–34.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302. doi:10.2307/1932409
See also
beta_div(), vignette('bdiv'), vignette('bdiv_guide')
Other Presence/Absence metrics:
hamming(),
jaccard(),
ochiai()
