Skip to contents

A distance metric related to the Bhattacharyya distance, often used for community data with many zeros.

Usage

hellinger(counts, margin = 1L, pairs = NULL, cpus = n_cpus())

Arguments

counts

A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.

margin

The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1

pairs

Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Details

The Hellinger distance is defined as: $$\sqrt{\sum_{i=1}^{n}(\sqrt{P_i} - \sqrt{Q_i})^{2}}$$

Where:

  • \(P_i\), \(Q_i\) : Proportional abundances of the \(i\)-th feature.

  • \(n\) : The number of features.

Base R Equivalent:

x <- ex_counts[1,]; p <- x / sum(x)
y <- ex_counts[2,]; q <- y / sum(y)
sqrt(sum((sqrt(p) - sqrt(q)) ^ 2))

Input Types

The counts parameter is designed to accept a simple numeric matrix, but seamlessly supports objects from the following biological data packages:

  • phyloseq

  • rbiom

  • SummarizedExperiment

  • TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.

References

Rao, C. R. (1995). A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiió, 19, 23-63.

Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136, 210–271. doi:10.1515/crll.1909.136.210

Examples

    hellinger(ex_counts)
#>          Saliva      Gums      Nose
#> Gums  0.4867106                    
#> Nose  1.2935044 1.2732200          
#> Stool 1.3170808 1.3273191 1.3417468