Skip to contents

A weighted UniFrac that adjusts for the expected variance of the metric.

Usage

variance_adjusted_unifrac(
  counts,
  tree = NULL,
  margin = 1L,
  pairs = NULL,
  cpus = n_cpus()
)

Arguments

counts

A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

margin

The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1

pairs

Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Details

The Variance-Adjusted Weighted UniFrac distance is defined as: $$\frac{\sum_{i=1}^{n} L_i\frac{|P_i - Q_i|}{\sqrt{(P_i + Q_i)(2 - P_i - Q_i)}} }{\sum_{i=1}^{n} L_i\frac{P_i + Q_i}{\sqrt{(P_i + Q_i)(2 - P_i - Q_i)}} }$$

Where:

  • \(n\) : The number of branches in the tree.

  • \(L_i\) : The length of the \(i\)-th branch.

  • \(P_i\), \(Q_i\) : The proportion of the community descending from branch \(i\) in sample P and Q.

Input Types

The counts parameter is designed to accept a simple numeric matrix, but seamlessly supports objects from the following biological data packages:

  • phyloseq

  • rbiom

  • SummarizedExperiment

  • TreeSummarizedExperiment

For large datasets, standard matrix operations may be slow. See vignette('performance') for details on using optimized formats (e.g. sparse matrices) and parallel processing.

References

Chang, Q., Luan, Y., & Sun, F. (2011). Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics, 12, 118. doi:10.1186/1471-2105-12-118

Examples

    variance_adjusted_unifrac(ex_counts, tree = ex_tree)
#>          Saliva      Gums      Nose
#> Gums  0.4242631                    
#> Nose  0.7753369 0.5565010          
#> Stool 0.9655749 0.9807634 0.9785147