Skip to contents

Beta Diversity Wrapper Function

Usage

beta_div(
  counts,
  metric,
  norm = "percent",
  power = 1.5,
  pseudocount = NULL,
  alpha = 0.5,
  tree = NULL,
  pairs = NULL,
  margin = 1L,
  cpus = n_cpus()
)

Arguments

counts

A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects. For optimal performance with very large datasets, see the guide in vignette('performance').

metric

The name of a beta diversity metric. One of c('aitchison', 'bhattacharyya', 'bray', 'canberra', 'chebyshev', 'chord', 'clark', 'divergence', 'euclidean', 'generalized_unifrac', 'gower', 'hamming', 'hellinger', 'horn', 'jaccard', 'jensen', 'jsd', 'lorentzian', 'manhattan', 'matusita', 'minkowski', 'morisita', 'motyka', 'normalized_unifrac', 'ochiai', 'psym_chisq', 'soergel', 'sorensen', 'squared_chisq', 'squared_chord', 'squared_euclidean', 'topsoe', 'unweighted_unifrac', 'variance_adjusted_unifrac', 'wave_hedges', 'weighted_unifrac'). Flexible matching is supported (see below). Programmatic access via list_metrics('beta').

norm

Normalize the incoming counts. Options are:

norm = "percent" -

Relative abundance (sample abundances sum to 1).

norm = "binary" -

Unweighted presence/absence (each count is either 0 or 1).

norm = "clr" -

Centered log ratio.

norm = "none" -

No transformation.

Default: 'percent', which is the expected input for these formulas.

power

Scaling factor for the magnitude of differences between communities (\(p\)). Default: 1.5

pseudocount

The value to add to all counts in counts to prevent taking log(0) for unobserved features. The default, NULL, selects the smallest non-zero value in counts.

alpha

How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to normalized_unifrac().

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

pairs

Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.

margin

If your samples are in the matrix's rows, set to 1L. If your samples are in columns, set to 2L. Ignored when counts is a phyloseq, rbiom, SummarizedExperiment, or TreeSummarizedExperiment object. Default: 1L

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A numeric vector.

Details

List of Beta Diversity Metrics

Option / Function NameMetric Name
aitchisonAitchison distance
bhattacharyyaBhattacharyya distance
brayBray-Curtis dissimilarity
canberraCanberra distance
chebyshevChebyshev distance
chordChord distance
clarkClark's divergence distance
divergenceDivergence
euclideanEuclidean distance
generalized_unifracGeneralized UniFrac (GUniFrac)
gowerGower distance
hammingHamming distance
hellingerHellinger distance
hornHorn-Morisita dissimilarity
jaccardJaccard distance
jensenJensen-Shannon distance
jsdJesen-Shannon divergence (JSD)
lorentzianLorentzian distance
manhattanManhattan distance
matusitaMatusita distance
minkowskiMinkowski distance
morisitaMorisita dissimilarity
motykaMotyka dissimilarity
normalized_unifracNormalized Weighted UniFrac
ochiaiOtsuka-Ochiai dissimilarity
psym_chisqProbabilistic Symmetric Chi-Squared distance
soergelSoergel distance
sorensenDice-Sorensen dissimilarity
squared_chisqSquared Chi-Squared distance
squared_chordSquared Chord distance
squared_euclideanSquared Euclidean distance
topsoeTopsoe distance
unweighted_unifracUnweighted UniFrac
variance_adjusted_unifracVariance-Adjusted Weighted UniFrac (VAW-UniFrac)
wave_hedgesWave Hedges distance
weighted_unifracWeighted UniFrac

Flexible name matching

Case insensitive and partial matching. Any runs of non-alpha characters are converted to underscores. E.g. metric = 'Weighted UniFrac selects weighted_unifrac.

UniFrac names can be shortened to the first letter plus "unifrac". E.g. uunifrac, w_unifrac, or V UniFrac. These also support partial matching.

Finished code should always use the full primary option name to avoid ambiguity with future additions to the metrics list.

Examples

    # Example counts matrix
    ex_counts
#>        Streptococcus Bacteroides Corynebacterium Haemophilus Propionibacterium
#> Saliva           162           2               0         180                 1
#> Gums             793           4               0          87                 1
#> Nose              22           2             498           2               251
#> Stool              1         611               1           1                 0
#>        Staphylococcus
#> Saliva              0
#> Gums                1
#> Nose              236
#> Stool               1
    
    # Bray-Curtis distances
    beta_div(ex_counts, 'bray')
#>          Saliva      Gums      Nose
#> Gums  0.4265973                    
#> Nose  0.9713843 0.9720256          
#> Stool 0.9909509 0.9911046 0.9915177
    
    # Generalized UniFrac distances
    beta_div(ex_counts, 'GUniFrac', tree = ex_tree)
#>          Saliva      Gums      Nose
#> Gums  0.4471644                    
#> Nose  0.8215129 0.7607876          
#> Stool 0.9727827 0.9784242 0.9730332