Distance / dissimilarity between samples.

Usage

bdiv_table(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  tree = NULL,
  md = ".all",
  within = NULL,
  between = NULL,
  delta = ".all",
  transform = "none",
  ties = "random",
  seed = 0
)

bdiv_matrix(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  tree = NULL,
  within = NULL,
  between = NULL,
  transform = "none",
  ties = "random",
  seed = 0
)

bdiv_distmat(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  tree = NULL,
  within = NULL,
  between = NULL,
  transform = "none"
)

Arguments

biom

An rbiom object, such as from as_rbiom(). Any value accepted by as_rbiom() can also be given here.

bdiv

Beta diversity distance algorithm(s) to use. Options are: "Bray-Curtis", "Manhattan", "Euclidean", "Jaccard", and "UniFrac". For "UniFrac", a phylogenetic tree must be present in biom or explicitly provided via tree=. Default: "Bray-Curtis"

Multiple/abbreviated values allowed.

weighted

Take relative abundances into account. When weighted=FALSE, only presence/absence is considered. Default: TRUE

Multiple values allowed.
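
To make the weighted / unweighted distinction concrete, here is a minimal base-R sketch using the textbook Bray-Curtis formula on two made-up abundance vectors. The vectors and variable names are hypothetical, and rbiom's internal computation may differ in detail (for example in how abundances are normalized).

```r
# Two hypothetical samples: abundances for five taxa (made-up numbers).
x <- c(10, 0, 5, 3, 0)
y <- c(4, 6, 0, 3, 0)

# Weighted (textbook Bray-Curtis): sum|x - y| / sum(x + y).
bray_weighted <- sum(abs(x - y)) / sum(x + y)

# Unweighted: reduce abundances to presence/absence, then apply the same formula.
px <- as.numeric(x > 0)
py <- as.numeric(y > 0)
bray_unweighted <- sum(abs(px - py)) / sum(px + py)

bray_weighted    # 17 / 31, about 0.548
bray_unweighted  # 2 / 6, about 0.333
```

The two samples share two of their four observed taxa, so the unweighted value depends only on that overlap, while the weighted value also reflects how different the abundances are.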

tree

A phylo object representing the phylogenetic relationships of the taxa in biom. Only required when computing UniFrac distances. Default: biom$tree

md

Dataset field(s) to include in the output data frame, or '.all' to include all metadata fields. Default: '.all'

within, between

Dataset field(s) for intra-group or inter-group comparisons. Alternatively, dataset field names given elsewhere can be prefixed with '==' or '!=' to assign them to within or between, respectively. Default: NULL

delta

For numeric metadata, report the absolute difference in values for the two samples, for instance 2 instead of "10 vs 12". Default: '.all'

transform

Transformation to apply. Options are: c("none", "rank", "log", "log1p", "sqrt", "percent"). "rank" is useful for correcting for non-normal distributions before applying regression statistics. Default: "none"

ties

When transform="rank", how to rank identical values. Options are: c("average", "first", "last", "random", "max", "min"). See rank() for details. Default: "random"

seed

Random seed for permutations. Default: 0
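
As a rough illustration of the ties options, the sketch below calls base R's rank() directly on a made-up vector containing one tie. That rbiom forwards ties to rank() unchanged, and that seed plays the role of set.seed() for "random" ties, are assumptions here rather than documented behavior.

```r
# Hypothetical distances with one tie (values are made up for illustration).
d <- c(0.15, 0.39, 0.39, 0.72)

rank(d, ties.method = "average")  # 1.0 2.5 2.5 4.0
rank(d, ties.method = "min")      # 1 2 2 4
rank(d, ties.method = "max")      # 1 3 3 4
rank(d, ties.method = "first")    # 1 2 3 4
rank(d, ties.method = "last")     # 1 3 2 4

# "random" breaks ties by chance, so fix the RNG state to get repeatable ranks.
set.seed(0)
r1 <- rank(d, ties.method = "random")
set.seed(0)
r2 <- rank(d, ties.method = "random")
identical(r1, r2)  # TRUE
```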

Value

bdiv_matrix() -

An R matrix of samples x samples.

bdiv_distmat() -

A dist-class distance matrix.

bdiv_table() -

A tibble data.frame with columns named .sample1, .sample2, .weighted, .bdiv, .distance, and any fields requested by md. Numeric metadata fields will be returned as abs(x - y); categorical metadata fields as "x", "y", or "x vs y".
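
The per-pair metadata summary described for bdiv_table() can be sketched in base R as follows. compare_field() is a hypothetical helper written for illustration, not part of rbiom, and the alphabetical ordering of the two labels is an assumption inferred from the example output.

```r
# Hypothetical helper: summarise one metadata field for a pair of samples.
compare_field <- function(x, y) {
  if (is.numeric(x) && is.numeric(y)) {
    return(abs(x - y))                       # numeric: absolute difference
  }
  x <- as.character(x)
  y <- as.character(y)
  if (x == y) return(x)                      # same category: report it once
  paste(sort(c(x, y)), collapse = " vs ")    # different: "x vs y" (sorted; assumed)
}

compare_field(10, 12)             # 2
compare_field("Stool", "Stool")   # "Stool"
compare_field("Stool", "Saliva")  # "Saliva vs Stool"
```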

Metadata Comparisons

Prefix metadata fields with == or != to limit comparisons to within or between groups, respectively. For example, md = '==Sex' will run calculations only for intra-group comparisons, returning "Male" and "Female", but NOT "Female vs Male". Similarly, setting md = '!=Body Site' will only show the inter-group comparisons, such as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.

The same effect can be achieved by using the within and between parameters. md = '==Sex' is equivalent to md = 'Sex', within = 'Sex'.
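
Conceptually, within and between act as row filters on the table of sample pairs. The base-R sketch below mimics that idea on a hand-built data frame; the sex1 / sex2 columns are a hypothetical layout chosen for illustration (rbiom does not return them), and the Sex values are those implied by the example output on this page.

```r
# Hypothetical long-format table: one row per sample pair, with each
# sample's Sex kept in its own column (illustrative layout only).
pairs <- data.frame(
  .sample1 = c("HMP18", "HMP18", "HMP19", "HMP18", "HMP19", "HMP20"),
  .sample2 = c("HMP19", "HMP20", "HMP20", "HMP21", "HMP21", "HMP21"),
  sex1     = c("Male", "Male", "Female", "Male", "Female", "Female"),
  sex2     = c("Female", "Female", "Female", "Male", "Male", "Male")
)

within_sex  <- pairs[pairs$sex1 == pairs$sex2, ]  # analogous to within = "Sex"
between_sex <- pairs[pairs$sex1 != pairs$sex2, ]  # analogous to between = "Sex"

nrow(within_sex)   # 2
nrow(between_sex)  # 4
```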

Examples

    library(rbiom)
    
    # Subset to four samples
    biom <- hmp50$clone()
    biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")]
    
    # Return in long format with metadata
    bdiv_table(biom, 'unifrac', md = ".all")
#> # A tibble: 6 × 9
#>   .sample1 .sample2 .weighted .bdiv   .distance   Age   BMI `Body Site`    Sex  
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <dbl> <dbl> <fct>          <fct>
#> 1 HMP18    HMP19    TRUE      UniFrac     0.665     0     3 Saliva vs Sto… Fema…
#> 2 HMP18    HMP20    TRUE      UniFrac     0.681     1     2 Saliva vs Sto… Fema…
#> 3 HMP19    HMP20    TRUE      UniFrac     0.418     1     5 Stool          Fema…
#> 4 HMP18    HMP21    TRUE      UniFrac     0.717     4     2 Saliva vs Sto… Male 
#> 5 HMP19    HMP21    TRUE      UniFrac     0.390     4     1 Stool          Fema…
#> 6 HMP20    HMP21    TRUE      UniFrac     0.149     5     4 Stool          Fema…
    
    # Only compare samples from the same body site (here, the stool samples)
    bdiv_table(biom, 'unifrac', md = c("==Body Site", "Sex"))
#> # A tibble: 3 × 7
#>   .sample1 .sample2 .weighted .bdiv   .distance `Body Site` Sex           
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <fct>       <fct>         
#> 1 HMP19    HMP20    TRUE      UniFrac     0.418 Stool       Female        
#> 2 HMP19    HMP21    TRUE      UniFrac     0.390 Stool       Female vs Male
#> 3 HMP20    HMP21    TRUE      UniFrac     0.149 Stool       Female vs Male
    
    # Or between males and females
    bdiv_table(biom, 'unifrac', md = c("Body Site", "!=Sex"))
#> # A tibble: 4 × 7
#>   .sample1 .sample2 .weighted .bdiv   .distance `Body Site`     Sex           
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <fct>           <fct>         
#> 1 HMP18    HMP19    TRUE      UniFrac     0.665 Saliva vs Stool Female vs Male
#> 2 HMP18    HMP20    TRUE      UniFrac     0.681 Saliva vs Stool Female vs Male
#> 3 HMP19    HMP21    TRUE      UniFrac     0.390 Stool           Female vs Male
#> 4 HMP20    HMP21    TRUE      UniFrac     0.149 Stool           Female vs Male
    
    # All-vs-all matrix
    bdiv_matrix(biom, 'unifrac')
#>           HMP18     HMP19     HMP20     HMP21
#> HMP18 0.0000000 0.6651627 0.6810017 0.7170374
#> HMP19 0.6651627 0.0000000 0.4183059 0.3896741
#> HMP20 0.6810017 0.4183059 0.0000000 0.1490926
#> HMP21 0.7170374 0.3896741 0.1490926 0.0000000
#> attr(,"cmd")
#> [1] "bdiv_matrix(biom, \"unifrac\")"
    
    # All-vs-all distance matrix
    dm <- bdiv_distmat(biom, 'unifrac')
    dm
#>           HMP18     HMP19     HMP20
#> HMP19 0.6651627                    
#> HMP20 0.6810017 0.4183059          
#> HMP21 0.7170374 0.3896741 0.1490926
    plot(hclust(dm))
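
The square matrix from bdiv_matrix() and the dist object from bdiv_distmat() hold the same numbers in different containers. As a minimal sketch using only base R, the matrix printed above can be converted to a dist object and back; the values are copied by hand from that output, so nothing here depends on rbiom itself.

```r
# Values copied from the bdiv_matrix() output shown above.
ids <- c("HMP18", "HMP19", "HMP20", "HMP21")
m <- matrix(
  c(0.0000000, 0.6651627, 0.6810017, 0.7170374,
    0.6651627, 0.0000000, 0.4183059, 0.3896741,
    0.6810017, 0.4183059, 0.0000000, 0.1490926,
    0.7170374, 0.3896741, 0.1490926, 0.0000000),
  nrow = 4, dimnames = list(ids, ids)
)

d  <- as.dist(m)    # square matrix -> "dist" object (lower triangle only)
m2 <- as.matrix(d)  # and back to a full symmetric matrix

hc <- hclust(d)     # functions expecting a "dist" object accept it directly
```

The dist form stores each pair once, which is what hclust() and similar base-R functions expect; the full matrix is more convenient for indexing by sample name, for example m["HMP20", "HMP21"].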