Generalized UniFrac beta diversity metric.
Usage
generalized_unifrac(
counts,
tree = NULL,
alpha = 0.5,
pairs = NULL,
cpus = n_cpus()
)
Arguments
- counts
An OTU abundance matrix where each column is a sample, and each row is an OTU. Any object coercible with
as.matrix()
can be given here, as well asphyloseq
,rbiom
,SummarizedExperiment
, andTreeSummarizedExperiment
objects.- tree
A
phylo
-class object representing the phylogenetic tree for the OTUs incounts
. The OTU identifiers given bycolnames(counts)
must be present intree
. Can be omitted if a tree is embedded with thecounts
object or asattr(counts, 'tree')
.- alpha
How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting
alpha=1
is equivalent toweighted_normalized_unifrac()
.- pairs
Which combinations of samples should distances be calculated for? The default value (
NULL
) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.- cpus
How many parallel processing threads should be used. The default,
n_cpus()
, will use all logical CPU cores.
Calculation
Given \(n\) branches with lengths \(L\), a pair of samples' abundances (\(A\) and \(B\)) on each of those branches, and abundance weighting \(0 \le \alpha \le 1\):
$$D = \displaystyle \frac{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}|\displaystyle \frac{\frac{A_i}{A_T} - \frac{B_i}{B_T}}{\frac{A_i}{A_T} + \frac{B_i}{B_T}} |}{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}}$$
See vignette('unifrac')
for details and a worked example.
References
Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, Collman RG, Bushman FD, Li H 2012. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16). doi:10.1093/bioinformatics/bts342
See also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
#> Saliva Gums Nose Stool
#> Streptococcus 162 793 22 1
#> Bacteroides 2 4 2 611
#> Corynebacterium 0 0 498 1
#> Haemophilus 180 87 2 1
#> Propionibacterium 1 1 251 0
#> Staphylococcus 0 1 236 1
# Generalized UniFrac distance matrix
generalized_unifrac(ex_counts, tree = ex_tree)
#> Saliva Gums Nose
#> Gums 0.4471644
#> Nose 0.8215129 0.7607876
#> Stool 0.9727827 0.9784242 0.9730332
# Only calculate distances for A vs all.
generalized_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
#> Saliva Gums Nose
#> Gums 0.4471644
#> Nose 0.8215129 NA
#> Stool 0.9727827 NA NA