Skip to contents

Generalized UniFrac beta diversity metric.

Usage

generalized_unifrac(
  counts,
  tree = NULL,
  alpha = 0.5,
  pairs = NULL,
  cpus = n_cpus()
)

Arguments

counts

An OTU abundance matrix where each column is a sample, and each row is an OTU. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects.

tree

A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').

alpha

How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to weighted_normalized_unifrac().

pairs

Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.

cpus

How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.

Value

A dist object.

Calculation

Given \(n\) branches with lengths \(L\), a pair of samples' abundances (\(A\) and \(B\)) on each of those branches, and abundance weighting \(0 \le \alpha \le 1\):

$$D = \displaystyle \frac{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}|\displaystyle \frac{\frac{A_i}{A_T} - \frac{B_i}{B_T}}{\frac{A_i}{A_T} + \frac{B_i}{B_T}} |}{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}}$$

See vignette('unifrac') for details and a worked example.

References

Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, Collman RG, Bushman FD, Li H 2012. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16). doi:10.1093/bioinformatics/bts342

Examples

    # Example counts matrix
    ex_counts
#>                   Saliva Gums Nose Stool
#> Streptococcus        162  793   22     1
#> Bacteroides            2    4    2   611
#> Corynebacterium        0    0  498     1
#> Haemophilus          180   87    2     1
#> Propionibacterium      1    1  251     0
#> Staphylococcus         0    1  236     1
    
    # Generalized UniFrac distance matrix
    generalized_unifrac(ex_counts, tree = ex_tree)
#>          Saliva      Gums      Nose
#> Gums  0.4471644                    
#> Nose  0.8215129 0.7607876          
#> Stool 0.9727827 0.9784242 0.9730332
    
    # Only calculate distances for A vs all.
    generalized_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
#>          Saliva      Gums      Nose
#> Gums  0.4471644                    
#> Nose  0.8215129        NA          
#> Stool 0.9727827        NA        NA