Performs missing data imputation on both exposure variables (from colData) and omics datasets (from experiments) within a MultiAssayExperiment object.

Usage

run_impute_missing(
  expomicset,
  exposure_impute_method = "median",
  exposure_cols = NULL,
  omics_impute_method = NULL,
  omics_to_impute = NULL
)

Arguments

expomicset

A MultiAssayExperiment object containing exposures and omics data.

exposure_impute_method

Character. Imputation method to use for exposure variables. Defaults to "median".

exposure_cols

Character vector. Names of columns in colData to impute. If NULL, all numeric columns are used.

omics_impute_method

Character. Imputation method to use for omics data. Defaults to NULL.

omics_to_impute

Character vector. Names of omics datasets to impute. If NULL, all omics datasets are included.

Value

A MultiAssayExperiment object with imputed exposure and/or omics data.

Details

For exposures, numeric columns in colData are imputed using the selected method. For omics data, each selected assay is imputed separately.
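As a rough illustration of the exposure step, the base R sketch below median-imputes the numeric columns of a plain data frame and leaves non-numeric columns untouched. The helper `impute_median_cols()` and the toy `exposures` table are hypothetical and are not part of the package; the package itself documents naniar::impute_median_all for the "median" method, so this only mirrors the idea of "all numeric columns when exposure_cols is NULL".

```r
# Hypothetical helper (not part of the package): median-impute numeric
# columns of a data frame, optionally restricted to the columns in `cols`.
impute_median_cols <- function(df, cols = NULL) {
  # When no columns are given, fall back to every numeric column
  if (is.null(cols)) {
    cols <- names(df)[vapply(df, is.numeric, logical(1))]
  }
  for (col in cols) {
    missing <- is.na(df[[col]])
    if (any(missing)) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Toy exposure table standing in for colData (made-up values)
exposures <- data.frame(
  pm25 = c(10, NA, 14),
  no2  = c(NA, 20, 30),
  site = c("a", "b", NA)
)

imputed <- impute_median_cols(exposures)
# Numeric columns are filled; the character column `site` keeps its NA
```

Passing `cols = c("pm25")` would restrict the imputation to that column, analogous to supplying exposure_cols.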

Supported imputation methods include:

  • "median": Median imputation using naniar::impute_median_all

  • "mean": Mean imputation using naniar::impute_mean_all

  • "knn": k-nearest neighbor imputation using impute::impute.knn

  • "mice": Multiple imputation using chained equations (mice::mice)

  • "dep": MinProb imputation for proteomics using DEP::impute

  • "missforest": Random forest-based imputation using missForest::missForest

  • "lod_sqrt2": Substitution of missing values with LOD/sqrt(2), where LOD is the smallest non-zero value per variable
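The "lod_sqrt2" rule described in the list above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `impute_lod_sqrt2()` is hypothetical, the measurements are assumed to be non-negative, and a variable with no non-zero observations is returned unchanged.

```r
# Hypothetical helper (not part of the package): replace missing values in
# one variable with LOD / sqrt(2), taking LOD as the smallest non-zero
# observed value. Assumes non-negative measurements.
impute_lod_sqrt2 <- function(x) {
  observed <- x[!is.na(x) & x != 0]
  # Nothing to estimate the LOD from: return the vector as-is
  if (length(observed) == 0) {
    return(x)
  }
  lod <- min(observed)
  x[is.na(x)] <- lod / sqrt(2)
  x
}

# Made-up exposure values with two missing entries and one true zero
pm25 <- c(0.4, NA, 0, 1.2, NA)
filled <- impute_lod_sqrt2(pm25)
# Missing entries become 0.4 / sqrt(2); the observed zero is left alone
```

To apply the same rule per variable across a data frame of numeric columns, something like `df[] <- lapply(df, impute_lod_sqrt2)` would work.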

Examples

if (FALSE) { # \dontrun{
imputed_mae <- run_impute_missing(my_mae,
                                  exposure_impute_method = "lod_sqrt2",
                                  exposure_cols = c("pm25", "no2"),
                                  omics_impute_method = "missforest",
                                  omics_to_impute = c("metabolomics", "proteomics"))
} # }