Input Matrix
Here we’ll use the ex_counts feature table included with ecodive. It contains the number of observations of each bacterial genus in each sample. In the text below, you can replace the word ‘genera’ with the feature of interest in your own data.
Beta Diversity
Beta diversity is a measure of how different two samples are.
Looking at the counts matrix above, you can easily see that saliva and gums are similar, while saliva and stool are different. The metrics described below quantify that difference, referred to as the “distance” or “dissimilarity” between a pair of samples. For most of these metrics, the distance ranges from 0 for identical samples to 1 for completely different samples.
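To see where the 0 and 1 come from, here is a minimal base R sketch of the textbook Bray-Curtis formula, sum(|x - y|) / sum(x + y), applied to two samples at a time. The helper bray_curtis_pair() and the toy count vectors are hypothetical; this is not ecodive's implementation, only an illustration of the scale.

```r
# Hypothetical helper (not part of ecodive): Bray-Curtis dissimilarity
# between two count vectors, sum(|x - y|) / sum(x + y).
bray_curtis_pair <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(abs(x - y)) / sum(x + y)
}

# Made-up counts of three genera in three toy samples
s1 <- c(10, 5, 0)
s2 <- c(5, 5, 0)   # same genera as s1, different abundances
s3 <- c(0, 0, 7)   # shares no genera with s1

bray_curtis_pair(s1, s1)  # identical samples
#> [1] 0
bray_curtis_pair(s1, s2)  # partly similar
#> [1] 0.2
bray_curtis_pair(s1, s3)  # completely different
#> [1] 1
```

Samples with no genera in common give the maximum of 1, because every count contributes to the numerator and nothing cancels.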
Weighted vs Unweighted
The classic algorithms all run in weighted mode by default. Specifying weighted = FALSE, e.g. canberra(counts, weighted = FALSE), switches them to unweighted mode.
For the UniFrac algorithms, unweighted_unifrac() is unweighted and all the others are weighted.
- Unweighted: unweighted_unifrac()
- Weighted: weighted_unifrac(), weighted_normalized_unifrac(), generalized_unifrac(), variance_adjusted_unifrac()
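One common way to define an unweighted version of a classic metric is to reduce each sample to presence/absence before computing the distance; for Bray-Curtis this yields the Sørensen dissimilarity. The sketch below assumes that interpretation. We have not confirmed that it matches what weighted = FALSE does internally, and the helper names are hypothetical.

```r
# Hypothetical helpers for illustration only (not ecodive functions).
bray_curtis_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

# ASSUMPTION: "unweighted" here means presence/absence (1 if count > 0).
to_presence <- function(x) as.numeric(x > 0)

# The same two genera are present in both samples, with swapped abundances
s1 <- c(100, 1, 0)
s2 <- c(1, 100, 0)

weighted   <- bray_curtis_pair(s1, s2)
unweighted <- bray_curtis_pair(to_presence(s1), to_presence(s2))
weighted
#> [1] 0.980198
unweighted
#> [1] 0
```

The weighted distance is driven by abundance differences, while the unweighted one only asks which genera are present, so the same pair of samples can look almost completely different in one mode and identical in the other.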
Partial Calculation
By default, pairs = NULL in ecodive’s beta diversity functions, so every pair of samples is compared and the returned all-vs-all distance matrix is completely filled in.
bray_curtis(counts)
#>          Saliva      Gums      Nose
#> Gums  0.4260870
#> Nose  0.9797101 0.9826087
#> Stool 0.9884058 0.9884058 0.9913043
If you are doing a reference-vs-all comparison, you can use the pairs parameter to skip unwanted calculations and save some CPU time. The larger the dataset, the more noticeable the improvement will be.
bray_curtis(counts, pairs = 1:3)
#>          Saliva      Gums      Nose
#> Gums  0.4260870
#> Nose  0.9797101        NA
#> Stool 0.9884058        NA        NA
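To get a rough sense of the savings, count the distances each approach needs: an all-vs-all comparison of n samples fills in choose(n, 2) cells, while a single reference-vs-all comparison needs only n - 1. The sample sizes below are arbitrary, and this sketch counts distances rather than measuring actual run time.

```r
# Number of distances computed for a few arbitrary sample counts
n_samples <- c(4, 100, 1000, 10000)

savings <- data.frame(
  samples          = n_samples,
  all_vs_all       = choose(n_samples, 2),  # every pair of samples
  reference_vs_all = n_samples - 1          # one sample vs each of the others
)
# Fraction of the full matrix a reference-vs-all run needs (equals 2 / n)
savings$fraction_needed <- savings$reference_vs_all / savings$all_vs_all
print(savings)
```

With the four example samples, pairs = 1:3 computes 3 of the 6 distances; the fraction shrinks as 2/n as the dataset grows.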
The pairs argument can be:
- A numeric vector, giving the positions in the result to calculate.
- A logical vector, indicating whether to calculate each position in the result.
- A function(i, j) that returns whether columns i and j should be compared.
Therefore, all of the following are equivalent:
bray_curtis(counts, pairs = 1:3)
bray_curtis(counts, pairs = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
bray_curtis(counts, pairs = function (i, j) i == 1)
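The equivalence can be checked by normalizing each form into one logical vector ordered like combn(n, 2). The helper normalize_pairs() below is hypothetical and is not ecodive's internal code; it is a sketch of how the three forms line up, assuming four samples as in the example.

```r
# Hypothetical helper: convert any accepted form of `pairs` into a
# logical vector with one entry per pair, in combn(n, 2) order.
normalize_pairs <- function(pairs, n) {
  idx <- combn(n, 2)          # 2 x choose(n, 2) matrix of (i, j) pairs
  n_pairs <- ncol(idx)
  if (is.null(pairs)) {
    rep(TRUE, n_pairs)        # default: all-vs-all
  } else if (is.function(pairs)) {
    vapply(seq_len(n_pairs),
           function(k) isTRUE(pairs(idx[1, k], idx[2, k])),
           logical(1))
  } else if (is.logical(pairs)) {
    stopifnot(length(pairs) == n_pairs)
    pairs
  } else if (is.numeric(pairs)) {
    stopifnot(all(pairs %in% seq_len(n_pairs)))
    seq_len(n_pairs) %in% pairs
  } else {
    stop("`pairs` must be NULL, numeric, logical, or a function")
  }
}

n <- 4
by_index    <- normalize_pairs(1:3, n)
by_logical  <- normalize_pairs(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), n)
by_function <- normalize_pairs(function(i, j) i == 1, n)

identical(by_index, by_logical) && identical(by_logical, by_function)
#> [1] TRUE
```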
The ordering of pairs follows the pairings produced by combn().
# Column index pairings
combn(ncol(counts), 2)
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    1    1    2    2    3
#> [2,]    2    3    4    3    4    4
# Sample name pairings
combn(colnames(counts), 2)
#>      [,1]     [,2]     [,3]     [,4]   [,5]    [,6]
#> [1,] "Saliva" "Saliva" "Saliva" "Gums" "Gums"  "Nose"
#> [2,] "Gums"   "Nose"   "Stool"  "Nose" "Stool" "Stool"
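Because this ordering is fixed, the position of any column pair (i, j) with i < j can be computed directly: the n - 1 pairs starting with column 1 come first, then the n - 2 pairs starting with column 2, and so on. The helper pair_position() is a hypothetical sketch of that arithmetic, checked against combn() itself.

```r
# Hypothetical helper: 1-based position of pair (i, j), i < j, in
# combn(n, 2) order. Pairs starting before row i fill
# (i - 1) * n - (i - 1) * i / 2 slots; j - i more reach (i, j).
pair_position <- function(i, j, n) {
  stopifnot(i < j, j <= n)
  (i - 1) * n - (i - 1) * i / 2 + (j - i)
}

n <- 4
idx <- combn(n, 2)
positions <- mapply(pair_position, idx[1, ], idx[2, ], MoreArgs = list(n = n))
positions
#> [1] 1 2 3 4 5 6
pair_position(2, 4, n)   # Gums vs Stool in the sample order above
#> [1] 5
```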
So, for instance, to use gums as the reference sample:
my_combn <- combn(colnames(counts), 2)
my_pairs <- my_combn[1,] == 'Gums' | my_combn[2,] == 'Gums'
my_pairs
#> [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
bray_curtis(counts, pairs = my_pairs)
#>          Saliva      Gums      Nose
#> Gums  0.4260870
#> Nose         NA 0.9826087
#> Stool        NA 0.9884058        NA
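The Gums example generalizes to any reference sample. The helper reference_pairs() below is a hypothetical wrapper around the same combn() logic, not an ecodive function. The sketch also builds the equivalent function(i, j) form described for the pairs argument, using column indices, and confirms that both select the same pairs.

```r
# Hypothetical helper: logical `pairs` vector selecting every pair
# that involves the reference sample `ref`.
reference_pairs <- function(sample_names, ref) {
  stopifnot(ref %in% sample_names)
  pairings <- combn(sample_names, 2)
  pairings[1, ] == ref | pairings[2, ] == ref
}

samples <- c("Saliva", "Gums", "Nose", "Stool")
by_name <- reference_pairs(samples, "Gums")
by_name
#> [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

# The same selection as a function of column indices (Gums is column 2)
gums_col <- match("Gums", samples)
keep <- function(i, j) i == gums_col | j == gums_col

idx <- combn(length(samples), 2)
by_index <- keep(idx[1, ], idx[2, ])
identical(by_name, by_index)
#> [1] TRUE
```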