A distance metric related to the Bhattacharyya distance, often used for community data with many zeros.
Usage
hellinger(counts, margin = 1L, pairs = NULL, cpus = n_cpus())
Arguments
- counts
A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
- margin
The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1
- pairs
Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.
- cpus
How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
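To make "positions in the distance matrix" concrete, the sketch below lists which sample pair sits at each position of a base R dist object, which stores the lower triangle column by column. It assumes, without confirmation from this page, that pairs follows the same ordering. The sample names are made up for illustration.

```r
# Map positions in a base R `dist` vector to sample pairs.
# ASSUMPTION: `pairs` indexes the same column-wise lower-triangle order.
n <- 4
sample_names <- paste0("S", seq_len(n))  # hypothetical sample names

idx <- combn(n, 2)  # column k holds the two sample indices at position k

pair_table <- data.frame(
  position = seq_len(ncol(idx)),
  a        = sample_names[idx[1, ]],
  b        = sample_names[idx[2, ]]
)
print(pair_table)

# A logical selector for every pair that involves the first sample:
involves_s1 <- idx[1, ] == 1 | idx[2, ] == 1
```

Under that assumption, pairs = 1:3 would request only the distances between S1 and each of the other three samples, and involves_s1 is the equivalent logical vector.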
Details
The Hellinger distance is defined as: $$\sqrt{\sum_{i=1}^{n}(\sqrt{P_i} - \sqrt{Q_i})^{2}}$$
Where:
\(P_i\), \(Q_i\) : Proportional abundances of the \(i\)-th feature.
\(n\) : The number of features.
Base R Equivalent:
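The following is a minimal sketch derived from the formula above, not code taken from the package. The helper name hellinger_base and the example matrix are hypothetical. It assumes samples are in rows (margin = 1) and that every sample has a non-zero total.

```r
# Hellinger distance between two count vectors, following the formula above.
# `hellinger_base` is an illustrative name, not part of the package API.
hellinger_base <- function(x, y) {
  p <- x / sum(x)  # proportional abundances of the first sample
  q <- y / sum(y)  # proportional abundances of the second sample
  sqrt(sum((sqrt(p) - sqrt(q))^2))
}

# Example matrix: 3 samples (rows) by 4 features (columns).
counts <- matrix(
  c(10, 0, 5, 5,
     0, 8, 2, 10,
     3, 3, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("f", 1:4))
)

# All-vs-all: Euclidean distance on square-rooted row proportions.
d <- dist(sqrt(counts / rowSums(counts)))

# Pairwise check against the two-vector form.
hellinger_base(counts["A", ], counts["B", ])
```

The all-vs-all form works because the formula is the Euclidean distance between the square-rooted proportion vectors. A sample whose counts sum to zero would produce NaN here and needs separate handling.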
Input Types
The counts parameter is designed to accept a simple numeric matrix, but
seamlessly supports objects from the following biological data packages:
- phyloseq
- rbiom
- SummarizedExperiment
- TreeSummarizedExperiment
For large datasets, standard matrix operations may be slow. See
vignette('performance') for details on using optimized formats
(e.g. sparse matrices) and parallel processing.
References
Rao, C. R. (1995). A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiió, 19, 23–63.
Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136, 210–271. doi:10.1515/crll.1909.136.210
See also
beta_div(), vignette('bdiv'), vignette('bdiv_guide')
Other Abundance metrics:
aitchison(),
bhattacharyya(),
bray(),
canberra(),
chebyshev(),
chord(),
clark(),
divergence(),
euclidean(),
gower(),
horn(),
jensen(),
jsd(),
lorentzian(),
manhattan(),
matusita(),
minkowski(),
morisita(),
motyka(),
psym_chisq(),
soergel(),
squared_chisq(),
squared_chord(),
squared_euclidean(),
topsoe(),
wave_hedges()
