Visualize taxa abundance with scatterplots and trendlines.

Usage

taxa_corrplot(
  biom,
  x,
  rank = -1,
  layers = "tc",
  taxa = 6,
  lineage = FALSE,
  unc = "singly",
  other = FALSE,
  stat.by = NULL,
  facet.by = NULL,
  colors = TRUE,
  shapes = TRUE,
  test = "emmeans",
  fit = "gam",
  at = NULL,
  level = 0.95,
  p.adj = "fdr",
  transform = "none",
  ties = "random",
  seed = 0,
  alt = "!=",
  mu = 0,
  caption = TRUE,
  check = FALSE,
  ...
)

Arguments

biom

An rbiom object, such as from as_rbiom(). Any value accepted by as_rbiom() can also be given here.

x

Dataset field with the x-axis values. Equivalent to the regr argument in stats_table(). Required.

rank

What rank(s) of taxa to display. E.g. "Phylum", "Genus", ".otu", etc. An integer vector can also be given, where 1 is the highest rank, 2 is the second highest, -1 is the lowest rank, -2 is the second lowest, and 0 is the OTU "rank". Run biom$ranks to see all options for a given rbiom object. Default: -1.

layers

One or more of c("trend", "confidence", "point", "name", "residual"). Single letter abbreviations are also accepted. For instance, c("trend", "point") is equivalent to c("t", "p") and "tp". Default: "tc"

taxa

Which taxa to display. An integer value will show the top n most abundant taxa. A value 0 <= n < 1 will show any taxa with that mean abundance or greater (e.g. 0.1 implies >= 10%). A character vector of taxa names will show only those named taxa. Default: 6.

lineage

Include all ranks in the name of the taxa. For instance, setting to TRUE will produce Bacteria; Actinobacteria; Coriobacteriia; Coriobacteriales. Otherwise the taxa name will simply be Coriobacteriales. You want to set this to TRUE when unc = "asis" and you have taxa names (such as Incertae_Sedis) that map to multiple higher level ranks. Default: FALSE

unc

How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:

"singly" -: Replaces them with the OTU name.
"grouped" -: Replaces them with a higher rank's name.
"drop" -: Excludes them from the result.
"asis" -: To not check/modify any taxa names.

Abbreviations are allowed. Default: "singly"

other

Sum all non-itemized taxa into an "Other" taxa. When FALSE, only returns taxa matched by the taxa argument. Specifying TRUE adds "Other" to the returned set. A string can also be given to imply TRUE, but with that value as the name to use instead of "Other". Default: FALSE

stat.by

Dataset field with the statistical groups. Must be categorical. Default: NULL

facet.by

Dataset field(s) to use for faceting. Must be categorical. Default: NULL

colors

How to color the groups. Options are:

TRUE -: Automatically select colorblind-friendly colors.
FALSE or NULL -: Don't use colors.
a palette name -: Auto-select colors from this set. E.g. "okabe"
character vector -: Custom colors to use. E.g. c("red", "#00FF00")
named character vector -: Explicit mapping. E.g. c(Male = "blue", Female = "red")

See "Aesthetics" section below for additional information. Default: TRUE

shapes

Shapes for each group. Options are similar to colors's: TRUE, FALSE, NULL, shape names (typically integers 0 - 17), or a named vector mapping groups to specific shape names. See "Aesthetics" section below for additional information. Default: TRUE

test

Method for computing p-values: 'none', 'emmeans', or 'emtrends'. Default: 'emmeans'

fit

How to fit the trendline. 'lm', 'log', or 'gam'. Default: 'gam'

at

Position(s) along the x-axis where the means or slopes should be evaluated. Default: NULL, which samples 100 evenly spaced positions and selects the position where the p-value is most significant.

level

The confidence level for calculating a confidence interval. Default: 0.95

p.adj

Method to use for multiple comparisons adjustment of p-values. Run p.adjust.methods for a list of available options. Default: "fdr"

transform

Transformation to apply. Options are: c("none", "rank", "log", "log1p", "sqrt", "percent"). "rank" is useful for correcting for non-normally distributions before applying regression statistics. Default: "none"

ties

When transform="rank", how to rank identical values. Options are: c("average", "first", "last", "random", "max", "min"). See rank() for details. Default: "random"

seed

Random seed for permutations. Must be a non-negative integer. Default: 0

alt

Alternative hypothesis direction. Options are '!=' (two-sided; not equal to mu), '<' (less than mu), or '>' (greater than mu). Default: '!='

mu

Reference value to test against. Default: 0

caption

Add methodology caption beneath the plot. Default: TRUE

check

Generate additional plots to aid in assessing data normality. Default: FALSE

...

Additional parameters to pass along to ggplot2 functions. Prefix a parameter name with a layer name to pass it to only that layer. For instance, p.size = 2 ensures only the points have their size set to 2.

Value

A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.

Aesthetics

All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".

Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.

Examples

    library(rbiom) 
    
    biom <- rarefy(subset(hmp50, `Body Site` %in% c('Buccal mucosa', 'Saliva')))
    taxa_corrplot(biom, x = "BMI", stat.by = "Body Site", taxa = 'Streptococcus')