Visualize BIOM data with boxplots.

Usage

taxa_boxplot(
  biom,
  x = NULL,
  rank = -1,
  layers = "x",
  taxa = 6,
  unc = "singly",
  other = FALSE,
  p.top = Inf,
  stat.by = x,
  facet.by = NULL,
  colors = TRUE,
  shapes = TRUE,
  patterns = FALSE,
  flip = FALSE,
  stripe = NULL,
  ci = "ci",
  level = 0.95,
  p.adj = "fdr",
  outliers = NULL,
  xlab.angle = "auto",
  p.label = 0.05,
  transform = "none",
  y.transform = "sqrt",
  caption = TRUE,
  ...
)

Arguments

biom

An rbiom object, or any value accepted by as_rbiom().

x

A categorical metadata column name to use for the x-axis. Or NULL, which puts taxa along the x-axis. Default: NULL

rank

What rank(s) of taxa to display. E.g. "Phylum", "Genus", ".otu", etc. An integer vector can also be given, where 1 is the highest rank, 2 is the second highest, -1 is the lowest rank, -2 is the second lowest, and 0 is the OTU "rank". Run biom$ranks to see all options for a given rbiom object. Default: -1.

layers

One or more of c("bar", "box" ("x"), "violin", "dot", "strip", "crossbar", "errorbar", "linerange", "pointrange"). Single letter abbreviations are also accepted. For instance, c("box", "dot") is equivalent to c("x", "d") and "xd". Default: "x"

taxa

Which taxa to display. An integer value will show the top n most abundant taxa. A value 0 <= n < 1 will show any taxa with that mean abundance or greater (e.g. 0.1 implies >= 10%). A character vector of taxa names will show only those named taxa. Default: 6.

unc

How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:

"singly" -: Replaces them with the OTU name.
"grouped" -: Replaces them with a higher rank's name.
"drop" -: Excludes them from the result.
"asis" -: To not check/modify any taxa names.

Abbreviations are allowed. Default: "singly"

other

Sum all non-itemized taxa into an "Other" taxa. When FALSE, only returns taxa matched by the taxa argument. Specifying TRUE adds "Other" to the returned set. A string can also be given to imply TRUE, but with that value as the name to use instead of "Other". Default: FALSE

p.top

Only display taxa with the most significant differences in abundance. If p.top is >= 1, then the p.top most significant taxa are displayed. If p.top is less than one, all taxa with an adjusted p-value <= p.top are displayed. Recommended to be used in combination with the taxa parameter to set a lower bound on the mean abundance of considered taxa. Default: Inf

stat.by

Dataset field with the statistical groups. Must be categorical. Default: NULL

facet.by

Dataset field(s) to use for faceting. Must be categorical. Default: NULL

colors

How to color the groups. Options are:

TRUE -: Automatically select colorblind-friendly colors.
FALSE or NULL -: Don't use colors.
a palette name -: Auto-select colors from this set. E.g. "okabe"
character vector -: Custom colors to use. E.g. c("red", "#00FF00")
named character vector -: Explicit mapping. E.g. c(Male = "blue", Female = "red")

See "Aesthetics" section below for additional information. Default: TRUE

shapes

Shapes for each group. Options are similar to colors's: TRUE, FALSE, NULL, shape names (typically integers 0 - 17), or a named vector mapping groups to specific shape names. See "Aesthetics" section below for additional information. Default: TRUE

patterns

Patterns for each group. Options are similar to colors's: TRUE, FALSE, NULL, pattern names ("brick", "chevron", "fish", "grid", etc), or a named vector mapping groups to specific pattern names. See "Aesthetics" section below for additional information. Default: FALSE

flip

Transpose the axes, so that taxa are present as rows instead of columns. Default: FALSE

stripe

Shade every other x position. Default: same as flip

ci

How to calculate min/max of the crossbar, errorbar, linerange, and pointrange layers. Options are: "ci" (confidence interval), "range", "sd" (standard deviation), "se" (standard error), and "mad" (median absolute deviation). The center mark of crossbar and pointrange represents the mean, except for "mad" in which case it represents the median. Default: "ci"

level

The confidence level for calculating a confidence interval. Default: 0.95

p.adj

Method to use for multiple comparisons adjustment of p-values. Run p.adjust.methods for a list of available options. Default: "fdr"

outliers

Show boxplot outliers? TRUE to always show. FALSE to always hide. NULL to only hide them when overlaying a dot or strip chart. Default: NULL

xlab.angle

Angle of the labels at the bottom of the plot. Options are "auto", '0', '30', and '90'. Default: "auto".

p.label

Minimum adjusted p-value to display on the plot with a bracket.

p.label = 0.05 -: Show p-values that are <= 0.05.
p.label = 0 -: Don't show any p-values on the plot.
p.label = 1 -: Show all p-values on the plot.

If a numeric vector with more than one value is provided, they will be used as breaks for asterisk notation. Default: 0.05

transform

Transformation to apply to calculated values. Options are: c("none", "rank", "log", "log1p", "sqrt", "percent"). "rank" is useful for correcting for non-normally distributions before applying regression statistics. Default: "none"

y.transform

The transformation to apply to the y-axis. Visualizing differences of both high- and low-abundance taxa is best done with a non-linear axis. Options are:

"sqrt" -: square-root transformation
"log1p" -: log(y + 1) transformation
"none" -: no transformation

These methods allow visualization of both high- and low-abundance taxa simultaneously, without complaint about 'zero' count observations. Default: "sqrt" Use xaxis.transform or yaxis.transform to pass custom values directly to ggplot2's scale_* functions.

caption

Add methodology caption beneath the plot. Default: TRUE

...

Additional parameters to pass along to ggplot2 functions. Prefix a parameter name with a layer name to pass it to only that layer. For instance, d.size = 2 ensures only the points on the dot layer have their size set to 2.

Value

A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.

Aesthetics

All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".

Patterns are added using the fillpattern R package. Options are "brick", "chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw", "shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x. See fillpattern::fill_pattern() for complete documentation of options.

Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.

Examples

    library(rbiom)
    
    biom <- rarefy(hmp50)
    
    taxa_boxplot(biom, stat.by = "Body Site", stripe = TRUE)

    taxa_boxplot(biom, layers = "bed", rank = c("Phylum", "Genus"), flip = TRUE)

    taxa_boxplot(
      biom    = subset(biom, `Body Site` %in% c('Saliva', 'Stool')), 
      taxa    = 3, 
      layers  = "ps", 
      stat.by = "Body Site",
      colors  = c('Saliva' = "blue", 'Stool' = "red") )