FAQ • tidyexposomics

What is the structure of the expomicset object and why use MultiAssayExperiment?

tidyexposomics wraps exposures and omics layers into a MultiAssayExperiment, allowing for synchronized sample metadata, flexible omics input, and pipeline tracking.

Exposure metadata lives in colData().
Omics assays live in experiments().
Row-level feature annotations live in the rowData() of each experiment.
Analysis results live in metadata().

How are DEGs defined and filtered before correlation?

You control:

logfc_thresh (log fold change threshold)
pval_thresh (adjusted p-value threshold)
score_col (column holding the stability score)
score_thresh (stability score threshold, used if you ran the sensitivity analysis)

If you skip the sensitivity analysis, all DEGs can be used. Filtering on stability scores, however, makes your associations more robust because it keeps only features that are consistently significant across different pre-processing conditions.
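The filtering logic can be sketched in a few lines. This is a language-agnostic illustration written in Python, not the package's R implementation: the `FeatureResult` record, the `filter_degs` helper, the default threshold values, and the use of the absolute log fold change are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureResult:
    """One row of a differential-abundance table (hypothetical layout)."""
    feature: str
    log_fc: float
    adj_pval: float
    stability: Optional[float] = None  # None if sensitivity analysis was skipped


def filter_degs(
    results: List[FeatureResult],
    logfc_thresh: float = 1.0,      # assumed default, for illustration only
    pval_thresh: float = 0.05,      # assumed default, for illustration only
    score_thresh: Optional[float] = None,
) -> List[str]:
    """Return the names of features that pass every active filter."""
    kept = []
    for r in results:
        if abs(r.log_fc) < logfc_thresh:
            continue  # effect size too small
        if r.adj_pval >= pval_thresh:
            continue  # not significant after adjustment
        if score_thresh is not None:
            # Stability filter is only applied when a threshold is given.
            if r.stability is None or r.stability < score_thresh:
                continue
        kept.append(r.feature)
    return kept


results = [
    FeatureResult("geneA", 2.1, 0.001, 0.90),
    FeatureResult("geneB", 1.5, 0.010, 0.30),
    FeatureResult("geneC", 0.2, 0.001, 0.95),
    FeatureResult("geneD", -1.8, 0.200, 0.80),
]

all_degs = filter_degs(results)                       # no stability filter
stable_degs = filter_degs(results, score_thresh=0.5)  # with stability filter
```

In this toy table, geneB passes the significance filters but is dropped once the stability threshold is applied.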

How is run_sensitivity_analysis() different from just re-running DA?

run_sensitivity_analysis() systematically repeats the analysis under different pre-processing assumptions, covering:

Different scaling methods
Count/proportion thresholds
Covariate inclusion/exclusion
Resampling of samples via bootstrapping

It calculates a stability score per feature that measures how robust that feature's significance is across conditions, rather than its significance in a single model or a single set of pre-processing conditions.
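One simple way to think about a stability score is as the fraction of re-analysis runs in which a feature comes out significant. The sketch below illustrates that idea in Python; it is an assumption about the general concept, and the package's actual score definition may differ. The `stability_scores` helper and the toy p-values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def stability_scores(
    runs: List[Dict[str, float]],
    pval_thresh: float = 0.05,
) -> Dict[str, float]:
    """Fraction of runs in which each feature is significant.

    `runs` holds one dict per pre-processing condition, mapping a
    feature name to its adjusted p-value. A feature missing from a run
    (for example, removed by an abundance filter) counts as not
    significant in that run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    hits = defaultdict(int)
    for run in runs:
        for feature, adj_pval in run.items():
            hits[feature] += int(adj_pval < pval_thresh)
    return {feature: count / len(runs) for feature, count in hits.items()}


runs = [
    {"geneA": 0.001, "geneB": 0.04, "geneC": 0.30},  # e.g. scaling method 1
    {"geneA": 0.002, "geneB": 0.20, "geneC": 0.40},  # e.g. scaling method 2
    {"geneA": 0.010, "geneB": 0.03},                 # geneC filtered out
    {"geneA": 0.004, "geneB": 0.09, "geneC": 0.01},  # e.g. bootstrap resample
]

scores = stability_scores(runs)
```

Here geneA is significant in every run, while geneC is significant in only one of four, so a single-model analysis that happened to use the last condition would overstate its robustness.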

What do exposome scores represent, and when should I use them?

Exposome scores summarize multi-exposure burden into one variable using strategies like:

median: Calculates the median of the exposure variables.
mean: Calculates the mean of the exposure variables.
sum: Calculates the sum of the exposure variables.
pca: Calculates the first principal component of the exposure variables.
irt: Uses Item Response Theory to calculate the exposome score.
quantile: Calculates the quantile of the exposure variables.

Use them when:

You want dimensionality reduction
You believe co-exposure effects are more meaningful than single exposures

You can then associate them with outcomes using run_association().
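The simpler strategies listed above amount to collapsing each sample's exposure values into one number. A minimal Python sketch of the mean, median, and sum strategies follows; it assumes the exposures are already on a comparable scale, and the `exposome_score` helper is hypothetical rather than the package's function. The pca, irt, and quantile strategies are left out because they need model fitting or details not given here.

```python
import math
import statistics
from typing import Dict, List


def exposome_score(exposures: List[float], method: str = "mean") -> float:
    """Collapse one sample's (already scaled) exposures into a single score."""
    if method == "mean":
        return statistics.fmean(exposures)
    if method == "median":
        return statistics.median(exposures)
    if method == "sum":
        return math.fsum(exposures)
    raise ValueError(f"unsupported method: {method!r}")


# Toy data: three scaled exposure values per sample.
samples: Dict[str, List[float]] = {
    "s1": [0.5, 1.0, 3.0],
    "s2": [-1.0, 0.0, 1.0],
}

mean_scores = {s: exposome_score(v, "mean") for s, v in samples.items()}
median_scores = {s: exposome_score(v, "median") for s, v in samples.items()}
sum_scores = {s: exposome_score(v, "sum") for s, v in samples.items()}
```

Note how the choice matters: for s1 the single high exposure pulls the mean (1.5) above the median (1.0), so the median is the more conservative summary when one exposure dominates.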

How do I interpret enrichment results from deg_exp_cor?

Each enriched term represents a biological process or pathway that is:

Affected at the omics level (differential feature)
Correlated with an environmental exposure

This supports mechanistic interpretation of how exposures may impact disease-relevant pathways.
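The two conditions can be pictured as a selection step that happens before enrichment: only features that are both differential and correlated with an exposure are passed on. The Python sketch below illustrates that selection; the Pearson correlation, the 0.7 cutoff, and the `pearson` and `select_for_enrichment` helpers are assumptions for the example, not the package's method.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation coefficient for two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def select_for_enrichment(
    deg_values: Dict[str, List[float]],
    exposure: List[float],
    cor_thresh: float = 0.7,  # assumed cutoff, for illustration only
) -> List[str]:
    """Keep DEGs whose absolute correlation with the exposure passes the cutoff."""
    return [
        feature
        for feature, values in deg_values.items()
        if abs(pearson(values, exposure)) >= cor_thresh
    ]


exposure = [1.0, 2.0, 3.0, 4.0, 5.0]
# Per-sample abundance of features already called differential.
deg_values = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],  # strongly positive
    "geneB": [5.0, 3.0, 4.0, 1.0, 2.0],   # strongly negative
    "geneC": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated
}

enrichment_input = select_for_enrichment(deg_values, exposure)
```

Only geneA and geneB would feed into enrichment here, so any enriched term reflects features that satisfy both conditions.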

I see significant features with low stability — what does that mean?

If features pass significance filters but have low stability scores, they:

Might be highly sensitive to pre-processing choices
Are less reliable in real-world datasets

Use plot_sensitivity_summary() to visualize the trade-off between stability and significance.