Skip to contents

Beta Diversity Wrapper Function

Usage

beta_div(
  counts,
  metric,
  norm = "percent",
  power = 1.5,
  pseudocount = NULL,
  alpha = 0.5,
  tree = NULL,
  pairs = NULL,
  cpus = n_cpus()
)

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

metric

The name of a beta diversity metric. One of c('aitchison', 'bhattacharyya', 'bray', 'canberra', 'chebyshev', 'chord', 'clark', 'divergence', 'euclidean', 'generalized_unifrac', 'gower', 'hamming', 'hellinger', 'horn', 'jaccard', 'jensen', 'jsd', 'lorentzian', 'manhattan', 'matusita', 'minkowski', 'morisita', 'motyka', 'normalized_unifrac', 'ochiai', 'psym_chisq', 'soergel', 'sorensen', 'squared_chisq', 'squared_chord', 'squared_euclidean', 'topsoe', 'unweighted_unifrac', 'variance_adjusted_unifrac', 'wave_hedges', 'weighted_unifrac'). Flexible matching is supported (see below). Programmatic access via list_metrics('beta').

norm

Normalize the incoming counts. Options are:

norm = "percent" -

Relative abundance (sample abundances sum to 1).

norm = "binary" -

Unweighted presence/absence (each count is either 0 or 1).

norm = "clr" -

Centered log ratio.

norm = "none" -

No transformation.

Default: 'percent', which is the expected input for these formulas.

power

Scaling factor for the magnitude of differences between communities (\(p\)). Default: 1.5

pseudocount

The value to add to all counts in counts to prevent taking log(0) for unobserved features. The default, NULL, selects the smallest non-zero value in counts.

alpha

How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to normalized_unifrac().

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

pairs

Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A numeric vector.

Details

List of Beta Diversity Metrics

Option / Function NameMetric Name
aitchisonAitchison distance
bhattacharyyaBhattacharyya distance
brayBray-Curtis dissimilarity
canberraCanberra distance
chebyshevChebyshev distance
chordChord distance
clarkClark's divergence distance
divergenceDivergence
euclideanEuclidean distance
generalized_unifracGeneralized UniFrac (GUniFrac)
gowerGower distance
hammingHamming distance
hellingerHellinger distance
hornHorn-Morisita dissimilarity
jaccardJaccard distance
jensenJensen-Shannon distance
jsdJesen-Shannon divergence (JSD)
lorentzianLorentzian distance
manhattanManhattan distance
matusitaMatusita distance
minkowskiMinkowski distance
morisitaMorisita dissimilarity
motykaMotyka dissimilarity
normalized_unifracNormalized Weighted UniFrac
ochiaiOtsuka-Ochiai dissimilarity
psym_chisqProbabilistic Symmetric Chi-Squared distance
soergelSoergel distance
sorensenDice-Sorensen dissimilarity
squared_chisqSquared Chi-Squared distance
squared_chordSquared Chord distance
squared_euclideanSquared Euclidean distance
topsoeTopsoe distance
unweighted_unifracUnweighted UniFrac
variance_adjusted_unifracVariance-Adjusted Weighted UniFrac (VAW-UniFrac)
wave_hedgesWave Hedges distance
weighted_unifracWeighted UniFrac

Flexible name matching

Case insensitive and partial matching. Any runs of non-alpha characters are converted to underscores. E.g. metric = 'Weighted UniFrac selects weighted_unifrac.

UniFrac names can be shortened to the first letter plus "unifrac". E.g. uunifrac, w_unifrac, or V UniFrac. These also support partial matching.

Finished code should always use the full primary option name to avoid ambiguity with future additions to the metrics list.

Examples

    # Example counts matrix
    ex_counts
#>        Streptococcus Bacteroides Corynebacterium Haemophilus Propionibacterium
#> Saliva           162           2               0         180                 1
#> Gums             793           4               0          87                 1
#> Nose              22           2             498           2               251
#> Stool              1         611               1           1                 0
#>        Staphylococcus
#> Saliva              0
#> Gums                1
#> Nose              236
#> Stool               1
    
    # Bray-Curtis distances
    beta_div(ex_counts, 'bray')
#>          Saliva      Gums      Nose
#> Gums  0.4265973                    
#> Nose  0.9713843 0.9720256          
#> Stool 0.9909509 0.9911046 0.9915177
    
    # Generalized UniFrac distances
    beta_div(ex_counts, 'GUniFrac', tree = ex_tree)
#>          Saliva      Gums      Nose
#> Gums  0.4471644                    
#> Nose  0.8215129 0.7607876          
#> Stool 0.9727827 0.9784242 0.9730332