Skip to contents

Cluster samples by beta diversity k-means.

Usage

bdiv_clusters(
  biom,
  bdiv = "bray",
  weighted = NULL,
  normalized = NULL,
  tree = NULL,
  k = 5,
  alpha = 0.5,
  cpus = NULL,
  ...
)

Arguments

biom

An rbiom object, or any value accepted by as_rbiom().

bdiv

Beta diversity distance algorithm(s) to use. Options are: c("aitchison", "bhattacharyya", "bray", "canberra", "chebyshev", "chord", "clark", "sorensen", "divergence", "euclidean", "generalized_unifrac", "gower", "hamming", "hellinger", "horn", "jaccard", "jensen", "jsd", "lorentzian", "manhattan", "matusita", "minkowski", "morisita", "motyka", "normalized_unifrac", "ochiai", "psym_chisq", "soergel", "squared_chisq", "squared_chord", "squared_euclidean", "topsoe", "unweighted_unifrac", "variance_adjusted_unifrac", "wave_hedges", "weighted_unifrac"). For the UniFrac family, a phylogenetic tree must be present in biom or explicitly provided via tree=. Supports partial matching. Multiple values are allowed for functions which return a table or plot. Default: "bray"

weighted

(Deprecated - weighting is now inherent in bdiv metric name.) Take relative abundances into account. When weighted=FALSE, only presence/absence is considered. Multiple values allowed. Default: NULL

normalized

(Deprecated - normalization is now inherent in bdiv metric name.) Only changes the "Weighted UniFrac" calculation. Divides result by the total branch weights. Default: NULL

tree

A phylo object representing the phylogenetic relationships of the taxa in biom. Only required when computing UniFrac distances. Default: biom$tree

k

Number of clusters. Default: 5L

alpha

The alpha term to use in Generalized UniFrac. How much weight to give to relative abundances; a value between 0 and 1, inclusive. Setting alpha=1 is equivalent to Normalized UniFrac. Default: 0.5

cpus

The number of CPUs to use. Set to NULL to use all available, or to 1 to disable parallel processing. Default: NULL

...

Passed on to stats::kmeans().

Value

A numeric factor assigning samples to clusters.

Examples

    library(rbiom)
    
    biom <- rarefy(hmp50)
    biom$metadata$bray_cluster <- bdiv_clusters(biom)
    
    pull(biom, 'bray_cluster')[1:10]
#> HMP01 HMP02 HMP03 HMP04 HMP05 HMP06 HMP07 HMP08 HMP09 HMP10 
#>     1     2     2     2     1     2     1     2     2     4 
#> Levels: 1 2 3 4 5
    
    bdiv_ord_plot(biom, stat.by = "bray_cluster")
#> Too few points to calculate an ellipse
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_path()`).