A unified UniFrac distance that balances the weight of abundant and rare lineages.
Usage
generalized_unifrac(
  counts,
  tree = NULL,
  alpha = 0.5,
  margin = 1L,
  pairs = NULL,
  cpus = n_cpus()
)

Arguments
- counts
  A numeric matrix of count data (samples \(\times\) features). Typically contains absolute abundances (integer counts), though proportions are also accepted.
- tree
  A phylo-class object representing the phylogenetic tree for the OTUs in counts. The OTU identifiers given by colnames(counts) must be present in tree. Can be omitted if a tree is embedded with the counts object or as attr(counts, 'tree').
- alpha
  How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to normalized_unifrac().
- margin
  The margin containing samples. 1 if samples are rows, 2 if samples are columns. Ignored when counts is a special object class (e.g. phyloseq). Default: 1
- pairs
  Which combinations of samples should distances be calculated for? The default value (NULL) calculates all-vs-all. Provide a numeric or logical vector specifying positions in the distance matrix to calculate. See examples.
- cpus
  How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
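The pairs argument refers to positions in the distance matrix, so it can help to see how positions map to sample pairs. The sketch below uses only base R and assumes that positions follow the ordering of a base R dist object (lower triangle, column by column); that ordering is an assumption here, not something this page states. The sample names and the variables idx, pair_names, pairs_numeric and pairs_logical are made up for illustration.

```r
# Base R stores a `dist` object as the lower triangle, column by column.
# combn() enumerates index pairs in that same order.
samples <- c("A", "B", "C", "D")            # made-up sample names
idx     <- utils::combn(length(samples), 2) # 2 x 6 matrix of index pairs
pair_names <- paste(samples[idx[1, ]], samples[idx[2, ]], sep = "-")

# Assumption: `pairs` positions follow the ordering above.
# Select only the comparisons that involve sample "A" (index 1).
pairs_numeric <- which(idx[1, ] == 1 | idx[2, ] == 1)
pairs_logical <- seq_len(ncol(idx)) %in% pairs_numeric

print(pair_names[pairs_numeric])
```

Under that assumption, either pairs_numeric or pairs_logical could be passed as pairs to restrict the calculation to the selected comparisons.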
Details
The Generalized UniFrac distance is defined as: $$\frac{\sum_{i=1}^{n} L_i(P_i + Q_i)^{\alpha}\left|\frac{P_i - Q_i}{P_i + Q_i}\right|}{\sum_{i=1}^{n} L_i(P_i + Q_i)^{\alpha}}$$
Where:
\(n\) : The number of branches in the tree.
\(L_i\) : The length of the \(i\)-th branch.
\(P_i\), \(Q_i\) : The proportion of the community descending from branch \(i\) in samples P and Q, respectively.
\(\alpha\) : A scalable weighting factor.
Parameter: alpha
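The alpha parameter controls the weight given to abundant lineages: each branch is weighted by \(L_i(P_i + Q_i)^{\alpha}\). At \(\alpha = 1\) the distance reduces to normalized Weighted UniFrac (see normalized_unifrac()), while at \(\alpha = 0\) each branch is weighted by its length alone, which corresponds to Unweighted UniFrac.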
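To make the formula concrete, here is a minimal base R sketch that evaluates it directly from per-branch vectors. The helper gunifrac_by_hand() is hypothetical and is not part of the package. The inputs describe a made-up three-tip tree ((A,B),C) with branches A, B, the internal branch above A and B, and C. It shows only the arithmetic, not how generalized_unifrac() is implemented.

```r
# Hypothetical helper (not part of the package): evaluates the
# Generalized UniFrac formula from per-branch vectors.
gunifrac_by_hand <- function(L, P, Q, alpha = 0.5) {
  keep <- (P + Q) > 0      # branches absent from both samples contribute nothing
  L <- L[keep]; P <- P[keep]; Q <- Q[keep]
  w <- L * (P + Q)^alpha   # branch weight: L_i * (P_i + Q_i)^alpha
  sum(w * abs(P - Q) / (P + Q)) / sum(w)
}

# Made-up inputs. Branch order: A, B, internal (A+B), C.
L <- c(1.0, 0.5, 2.0, 0.8)   # branch lengths
P <- c(0.6, 0.4, 1.0, 0.0)   # sample P: A = 0.6, B = 0.4, C = 0
Q <- c(0.2, 0.3, 0.5, 0.5)   # sample Q: A = 0.2, B = 0.3, C = 0.5

d_half <- gunifrac_by_hand(L, P, Q, alpha = 0.5)
d_one  <- gunifrac_by_hand(L, P, Q, alpha = 1)
d_zero <- gunifrac_by_hand(L, P, Q, alpha = 0)

# At alpha = 1 the formula collapses to sum(L * |P - Q|) / sum(L * (P + Q)).
normalized_weighted <- sum(L * abs(P - Q)) / sum(L * (P + Q))
print(all.equal(d_one, normalized_weighted))
```

Varying alpha between 0 and 1 in this helper shows how the weight shifts between rare and abundant branches while the per-branch relative difference \(|P_i - Q_i| / (P_i + Q_i)\) stays fixed.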
Input Types
The counts parameter is designed to accept a simple numeric matrix, but
seamlessly supports objects from the following biological data packages:
- phyloseq
- rbiom
- SummarizedExperiment
- TreeSummarizedExperiment
For large datasets, standard matrix operations may be slow. See
vignette('performance') for details on using optimized formats
(e.g. sparse matrices) and parallel processing.
References
Chen, J., Bittinger, K., Charlson, E. S., Hoffmann, C., Lewis, J., Wu, G. D., ... & Li, H. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16), 2106-2113. doi:10.1093/bioinformatics/bts342
See also
beta_div(), vignette('bdiv'), vignette('bdiv_guide')
Other Phylogenetic metrics:
faith(),
normalized_unifrac(),
unweighted_unifrac(),
variance_adjusted_unifrac(),
weighted_unifrac()
