Distance / dissimilarity between samples.

Usage

bdiv_table(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  normalized = TRUE,
  tree = NULL,
  md = ".all",
  within = NULL,
  between = NULL,
  delta = ".all",
  transform = "none",
  ties = "random",
  seed = 0,
  cpus = NULL
)

bdiv_matrix(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  normalized = TRUE,
  tree = NULL,
  within = NULL,
  between = NULL,
  transform = "none",
  ties = "random",
  seed = 0,
  cpus = NULL
)

bdiv_distmat(
  biom,
  bdiv = "Bray-Curtis",
  weighted = TRUE,
  normalized = TRUE,
  tree = NULL,
  within = NULL,
  between = NULL,
  transform = "none",
  cpus = NULL
)

Arguments

biom

An rbiom object, such as from as_rbiom(). Any value accepted by as_rbiom() can also be given here.

bdiv

Beta diversity distance algorithm(s) to use. Options are: "Bray-Curtis", "Manhattan", "Euclidean", "Jaccard", and "UniFrac". For "UniFrac", a phylogenetic tree must be present in biom or explicitly provided via tree=. Multiple/abbreviated values allowed. Default: "Bray-Curtis"

weighted

Take relative abundances into account. When weighted=FALSE, only presence/absence is considered. Multiple values allowed. Default: TRUE

normalized

Only affects the weighted UniFrac calculation: when TRUE, the result is divided by the total branch weights. Default: TRUE

tree

A phylo object representing the phylogenetic relationships of the taxa in biom. Only required when computing UniFrac distances. Default: biom$tree

md

Dataset field(s) to include in the output data frame, or '.all' to include all metadata fields. Default: '.all'

within, between

Dataset field(s) that restrict comparisons to sample pairs within the same group (within) or between different groups (between). Alternatively, dataset field names given elsewhere (for example in md) can be prefixed with '==' or '!=' to assign them to within or between, respectively. Default: NULL

delta

For numeric metadata, report the absolute difference in values for the two samples, for instance 2 instead of "10 vs 12". Default: '.all'

transform

Transformation to apply. Options are: c("none", "rank", "log", "log1p", "sqrt", "percent"). "rank" is useful for correcting for non-normal distributions before applying regression statistics. Default: "none"

ties

When transform="rank", how to rank identical values. Options are: c("average", "first", "last", "random", "max", "min"). See rank() for details. Default: "random"

seed

Random seed for permutations. Must be a non-negative integer. Default: 0

cpus

The number of CPUs to use. Set to NULL to use all available, or to 1 to disable parallel processing. Default: NULL
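
The weighted and ties arguments can be easier to picture with plain numbers. The sketch below uses only base R and is not rbiom's implementation: bray_curtis() is a hypothetical helper that applies the textbook Bray-Curtis formula, and the vectors a, b, and x are made-up values. The first part contrasts abundance-weighted and presence/absence input; the second part shows how the rank() tie methods named in ties differ.

```r
# Hypothetical helper: textbook Bray-Curtis dissimilarity between two samples.
# This is an illustration only; rbiom's internal calculation may differ.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Made-up abundance vectors for two samples over four taxa.
a <- c(10, 0, 5, 5)
b <- c(5, 5, 0, 10)

# Abundance-aware (analogous to weighted = TRUE).
weighted_bc <- bray_curtis(a, b)

# Presence/absence only (analogous to weighted = FALSE):
# every non-zero count becomes 1 before the same formula is applied.
unweighted_bc <- bray_curtis(as.numeric(a > 0), as.numeric(b > 0))

# Tie handling in base R's rank(); x has one tied pair (the two 20s).
x <- c(10, 20, 20, 30)
r_average <- rank(x, ties.method = "average")  # tied values share the mean rank
r_min     <- rank(x, ties.method = "min")      # tied values share the lowest rank
r_max     <- rank(x, ties.method = "max")      # tied values share the highest rank
r_first   <- rank(x, ties.method = "first")    # ties broken by order of appearance
r_last    <- rank(x, ties.method = "last")     # ties broken in reverse order of appearance

# "random" breaks ties at random, so set a seed when the result must be reproducible.
set.seed(0)
r_random <- rank(x, ties.method = "random")
```

With "random", the two tied values receive ranks 2 and 3 in an arbitrary order, which is why a fixed seed matters for repeatable output.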

Value

bdiv_matrix() -

An R matrix of samples x samples.

bdiv_distmat() -

A dist-class distance matrix.

bdiv_table() -

A tibble data.frame with columns named .sample1, .sample2, .weighted, .bdiv, .distance, and any fields requested by md. Numeric metadata fields will be returned as abs(x - y); categorical metadata fields as "x", "y", or "x vs y".

Metadata Comparisons

Prefix metadata fields with == or != to limit comparisons to within or between groups, respectively. For example, md = '==Sex' will run calculations only for intra-group comparisons, returning "Male" and "Female", but NOT "Female vs Male". Similarly, setting md = '!=Body Site' will only show the inter-group comparisons, such as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.

The same effect can be achieved by using the within and between parameters. md = '==Sex' is equivalent to md = 'Sex', within = 'Sex'.
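
As a minimal sketch of that equivalence for bdiv_table(), where the field list is passed through the md argument, the two calls below should select the same sample pairs. It assumes rbiom is installed and reuses the four-sample subset of hmp50 from the Examples section; the variable names prefixed and explicit are illustrative only.

```r
library(rbiom)

# Four-sample subset, as in the Examples section.
biom <- hmp50$clone()
biom$counts <- biom$counts[, c("HMP18", "HMP19", "HMP20", "HMP21")]

# Prefix form: '==' marks Body Site as a within-group field.
prefixed <- bdiv_table(biom, "unifrac", md = c("==Body Site", "Sex"))

# Parameter form: Body Site is named plainly in md and again in within.
explicit <- bdiv_table(biom, "unifrac",
                       md = c("Body Site", "Sex"),
                       within = "Body Site")

# Compare the sample pairs and distances returned by each form.
same_pairs <- identical(
  paste(prefixed$.sample1, prefixed$.sample2),
  paste(explicit$.sample1, explicit$.sample2)
)
```

The prefix form is compact for one-off calls, while within and between keep the field list in md free of operators when it is built programmatically.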

Examples

    library(rbiom)
    
    # Subset to four samples
    biom <- hmp50$clone()
    biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")]
    
    # Return in long format with metadata
    bdiv_table(biom, 'unifrac', md = ".all")
#> # A tibble: 6 × 9
#>   .sample1 .sample2 .weighted .bdiv   .distance   Age   BMI `Body Site`    Sex  
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <dbl> <dbl> <fct>          <fct>
#> 1 HMP18    HMP19    TRUE      UniFrac     0.735     0     3 Saliva vs Sto… Fema…
#> 2 HMP18    HMP20    TRUE      UniFrac     0.765     1     2 Saliva vs Sto… Fema…
#> 3 HMP18    HMP21    TRUE      UniFrac     0.771     4     2 Saliva vs Sto… Male 
#> 4 HMP19    HMP20    TRUE      UniFrac     0.433     1     5 Stool          Fema…
#> 5 HMP19    HMP21    TRUE      UniFrac     0.387     4     1 Stool          Fema…
#> 6 HMP20    HMP21    TRUE      UniFrac     0.150     5     4 Stool          Fema…
    
    # Only compare samples from the same body site (here, the three stool samples)
    bdiv_table(biom, 'unifrac', md = c("==Body Site", "Sex"))
#> # A tibble: 3 × 7
#>   .sample1 .sample2 .weighted .bdiv   .distance `Body Site` Sex           
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <fct>       <fct>         
#> 1 HMP19    HMP20    TRUE      UniFrac     0.433 Stool       Female        
#> 2 HMP19    HMP21    TRUE      UniFrac     0.387 Stool       Female vs Male
#> 3 HMP20    HMP21    TRUE      UniFrac     0.150 Stool       Female vs Male
    
    # Or between males and females
    bdiv_table(biom, 'unifrac', md = c("Body Site", "!=Sex"))
#> # A tibble: 4 × 7
#>   .sample1 .sample2 .weighted .bdiv   .distance `Body Site`     Sex           
#>   <chr>    <chr>    <lgl>     <fct>       <dbl> <fct>           <fct>         
#> 1 HMP18    HMP19    TRUE      UniFrac     0.735 Saliva vs Stool Female vs Male
#> 2 HMP18    HMP20    TRUE      UniFrac     0.765 Saliva vs Stool Female vs Male
#> 3 HMP19    HMP21    TRUE      UniFrac     0.387 Stool           Female vs Male
#> 4 HMP20    HMP21    TRUE      UniFrac     0.150 Stool           Female vs Male
    
    # All-vs-all matrix
    bdiv_matrix(biom, 'unifrac')
#>           HMP18     HMP19     HMP20     HMP21
#> HMP18 0.0000000 0.7353910 0.7649262 0.7705909
#> HMP19 0.7353910 0.0000000 0.4332007 0.3874131
#> HMP20 0.7649262 0.4332007 0.0000000 0.1503528
#> HMP21 0.7705909 0.3874131 0.1503528 0.0000000
#> attr(,"cmd")
#> [1] "bdiv_matrix(biom, \"unifrac\")"
    
    # All-vs-all distance matrix
    dm <- bdiv_distmat(biom, 'unifrac')
    dm
#>           HMP18     HMP19     HMP20
#> HMP19 0.7353910                    
#> HMP20 0.7649262 0.4332007          
#> HMP21 0.7705909 0.3874131 0.1503528
    plot(hclust(dm))
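
The three return shapes carry the same pairwise distances in different layouts. The base R sketch below does not call rbiom; it retypes the numbers printed by bdiv_matrix() above and shows how a square matrix relates to a dist object and to a long table with one row per sample pair. The object names m, d, and long are illustrative, and the long table only mimics the .sample1, .sample2, and .distance columns.

```r
# Distances copied from the bdiv_matrix() output above.
ids <- c("HMP18", "HMP19", "HMP20", "HMP21")
m <- matrix(
  c(0.0000000, 0.7353910, 0.7649262, 0.7705909,
    0.7353910, 0.0000000, 0.4332007, 0.3874131,
    0.7649262, 0.4332007, 0.0000000, 0.1503528,
    0.7705909, 0.3874131, 0.1503528, 0.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Square matrix -> dist object (stores only the lower triangle).
d <- as.dist(m)

# Square matrix -> long table, one row per unordered sample pair.
idx <- which(lower.tri(m), arr.ind = TRUE)
long <- data.frame(
  .sample1  = colnames(m)[idx[, "col"]],
  .sample2  = rownames(m)[idx[, "row"]],
  .distance = m[idx]
)

# A dist object converts back to the full symmetric matrix.
m_again <- as.matrix(d)
```

Four samples give 4 * 3 / 2 = 6 unordered pairs, which matches the six rows of the first bdiv_table() result.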