Estimates the optimal sequencing depth for each sample in a relative abundance matrix, using the abundance distribution pooled across all samples.
Arguments
- biom
An rbiom object, or any value accepted by
as_rbiom().
- adjust
Numeric. Bandwidth adjustment for the kernel density estimation. Default:
1.5.
The Singleton Peak Heuristic
When depth = NULL, biom_inflate() calls this function to estimate the original
sequencing depth for each sample. The underlying assumption is that in
typical microbiome datasets, the most frequent count value (the mode of the
abundance distribution) is 1 (a singleton).
The algorithm works as follows:
1. Log-Transformation: Non-zero relative abundances are log10-transformed.
2. Global Consensus: To overcome sparsity in individual samples, distributions are centered by their medians and aggregated across all samples.
3. Peak Detection: Kernel Density Estimation (KDE) is used to identify the peak (mode) of this aggregated distribution.
4. Scaling: A scaling factor is calculated for each sample that shifts this peak to correspond to an integer count of 1.
This approach effectively "shoehorns" relative abundance data into the integer format required by count-based methods (such as rarefaction or Chao1) by maximizing the number of singletons in the resulting matrix.
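The four steps can be sketched in base R roughly as follows. This is a minimal illustration and not the package's actual implementation: the function name `estimate_depths()` and its `rel` argument are hypothetical, the input is assumed to be a plain numeric matrix with taxa in rows and samples in columns, and passing `adjust` straight to `stats::density()` is an assumption.

```r
# Hypothetical sketch of the singleton peak heuristic (not rbiom's own code).
# rel:    numeric matrix of relative abundances (taxa x samples).
# adjust: bandwidth multiplier, assumed here to be passed to stats::density().
estimate_depths <- function(rel, adjust = 1.5) {

  # Step 1: log10-transform the non-zero values of each sample and
  # record each sample's median on the log scale.
  sample_medians <- apply(rel, 2, function(x) median(log10(x[x > 0])))

  # Step 2: center each sample on its median and pool all samples.
  centered <- unlist(lapply(seq_len(ncol(rel)), function(j) {
    x <- rel[, j]
    log10(x[x > 0]) - sample_medians[j]
  }))

  # Step 3: locate the mode of the pooled distribution with a KDE.
  dens <- density(centered, adjust = adjust)
  peak <- dens$x[which.max(dens$y)]

  # Step 4: in sample j the peak sits at 10^(median_j + peak); choose the
  # depth that maps that relative abundance to a count of exactly 1.
  depths <- 10^(-(sample_medians + peak))
  names(depths) <- colnames(rel)
  depths
}

# Usage on simulated data: sparse counts converted to relative abundances.
set.seed(1)
counts <- matrix(
  rnbinom(200, size = 0.3, mu = 5), nrow = 50,
  dimnames = list(paste0("otu", 1:50), paste0("s", 1:4))
)
rel <- sweep(counts, 2, colSums(counts), "/")
estimate_depths(rel)
```

Multiplying each column of `rel` by its estimated depth and rounding would then give an integer matrix in which the pooled peak lands on a count of 1. Because the peak is found on a KDE grid, the estimates are approximate and will shift somewhat with `adjust`.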
See also
biom_inflate() which uses this heuristic when depth = NULL.