Performs hierarchical clustering of samples using exposure data from colData(expomicset).

Usage

run_cluster_samples(
  expomicset,
  exposure_cols = NULL,
  dist_method = NULL,
  user_k = NULL,
  cluster_method = "ward.D",
  clustering_approach = "diana",
  action = "add"
)

Arguments

expomicset

A MultiAssayExperiment object containing omics and exposure data.

exposure_cols

A character vector of column names in colData(expomicset) to use for clustering.

dist_method

A character string specifying the distance metric ("euclidean", "gower", etc.). If NULL, the metric is chosen automatically: "gower" for mixed data, "euclidean" for numeric data.

user_k

An integer specifying the number of clusters (k). If NULL, k is determined using the method given in clustering_approach.

cluster_method

A character string specifying the agglomeration (linkage) method used for hierarchical clustering. Default is "ward.D".

clustering_approach

A character string specifying the method for determining k ("diana", "gap", "elbow", "dynamic", or "density"). Default is "diana".

action

A character string specifying "add" (store results in metadata) or "get" (return clustering results). Default is "add".

Value

If action="add", returns the updated expomicset. If action="get", returns a list with:

sample_cluster

A hierarchical clustering object (hclust).

sample_groups

A named vector of sample cluster assignments.

heatmap

A ComplexHeatmap object visualizing sample clustering.

Details

This function:

  • Extracts numeric exposure data from colData(expomicset).

  • Computes a distance matrix ("gower" for mixed data, "euclidean" for numeric).

  • Determines the optimal number of clusters (k) using the specified method.

  • Performs hierarchical clustering (hclust) and assigns samples to clusters.

  • Generates a heatmap of scaled exposure values.

  • Stores results in metadata(expomicset)$sample_clustering when action="add".
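
The steps above can be approximated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes an all-numeric exposure table (the toy data frame `exposures` and its column names are made up for illustration), uses Euclidean distance, and fixes k by hand rather than estimating it.

```r
# Toy exposure table: 6 samples, 2 numeric exposures (hypothetical values).
exposures <- data.frame(
  pm25 = c(8.1, 9.0, 8.5, 21.3, 22.8, 20.9),
  no2  = c(12.0, 13.5, 11.8, 35.2, 33.9, 36.4),
  row.names = paste0("S", 1:6)
)

# Scale each exposure to mean 0 and sd 1 so no variable dominates the distance.
scaled <- scale(exposures)

# Euclidean distance between samples (all columns are numeric here).
d <- dist(scaled, method = "euclidean")

# Agglomerative hierarchical clustering with Ward linkage.
hc <- hclust(d, method = "ward.D")

# Cut the tree into k groups; k is fixed by hand in this sketch.
k <- 2
sample_groups <- cutree(hc, k = k)

# Number of samples assigned to each cluster.
print(table(sample_groups))
```

Here `hc` plays the role of sample_cluster and `sample_groups` the role of the named assignment vector described under Value. For mixed numeric and categorical columns, a Gower distance such as `cluster::daisy(x, metric = "gower")` could replace `dist()`; whether run_cluster_samples uses that particular function is an assumption.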

Examples

if (FALSE) { # \dontrun{
expom <- run_cluster_samples(
  expomicset = expom,
  exposure_cols = c("PM2.5", "NO2"),
  dist_method = "gower",
  clustering_approach = "gap"
)
} # }