Performs hierarchical clustering of samples using exposure data from
colData(expomicset).
Usage
run_cluster_samples(
  expomicset,
  exposure_cols = NULL,
  dist_method = NULL,
  user_k = NULL,
  cluster_method = "ward.D",
  clustering_approach = "diana",
  action = "add"
)
Arguments
- expomicset
A MultiAssayExperiment object containing omics and exposure data.
- exposure_cols
A character vector of column names in colData(expomicset) to use for clustering.
- dist_method
A character string specifying the distance metric ("euclidean", "gower", etc.). If NULL, it is automatically determined.
- user_k
An integer specifying the number of clusters. If NULL, an optimal k is determined.
- cluster_method
A character string specifying the hierarchical clustering method. Default is "ward.D".
- clustering_approach
A character string specifying the method for determining k ("diana", "gap", "elbow", "dynamic", or "density"). Default is "diana".
- action
A character string specifying "add" (store results in metadata) or "get" (return clustering results). Default is "add".
Value
If action="add", returns the updated expomicset.
If action="get", returns a list with:
- sample_cluster
A hierarchical clustering object (hclust).
- sample_groups
A named vector of sample cluster assignments.
Details
This function:
- Extracts numeric exposure data from colData(expomicset).
- Computes a distance matrix ("gower" for mixed data, "euclidean" for numeric).
- Determines the optimal number of clusters (k) using the specified method.
- Performs hierarchical clustering (hclust) and assigns samples to clusters.
- Generates a heatmap of scaled exposure values.
- Stores results in metadata(expomicset)$sample_clustering when action="add".
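The distance, clustering, and assignment steps above can be sketched in base R. This is a minimal illustration, not the package's internal code: the `exposures` data frame is a hypothetical stand-in for colData(expomicset), scaling before the distance calculation is an assumption, and k is fixed by hand (as with user_k) rather than estimated by one of the clustering_approach methods.

```r
# Hypothetical exposure table standing in for colData(expomicset):
# one row per sample, two simulated exposure columns.
set.seed(1)
exposures <- data.frame(
  exposure_pm25 = c(rnorm(5, mean = 10), rnorm(5, mean = 30)),
  exposure_no2  = c(rnorm(5, mean = 20), rnorm(5, mean = 60)),
  row.names     = paste0("sample_", 1:10)
)

# Choose the distance metric from the column types:
# Euclidean for all-numeric data, Gower for mixed data.
# Scaling the numeric columns first is an assumption of this sketch.
all_numeric <- all(vapply(exposures, is.numeric, logical(1)))
sample_dist <- if (all_numeric) {
  dist(scale(exposures), method = "euclidean")
} else {
  cluster::daisy(exposures, metric = "gower")
}

# Fix k by hand (comparable to passing user_k), then cluster
# hierarchically and cut the tree into k groups.
k <- 2
sample_cluster <- hclust(sample_dist, method = "ward.D")
sample_groups  <- cutree(sample_cluster, k = k)

# A list with the same element names as described under Value.
result <- list(
  sample_cluster = sample_cluster,
  sample_groups  = sample_groups
)
table(result$sample_groups)
```

Here `result$sample_groups` is a named integer vector keyed by sample name, matching the shape documented for action="get".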
Examples
# create example data
mae <- make_example_data(
  n_samples = 10,
  return_mae = TRUE
)
#> Ensuring all omics datasets are matrices with column names.
#> Creating SummarizedExperiment objects.
#> Creating MultiAssayExperiment object.
#> MultiAssayExperiment created successfully.
# determine sample clusters
mae <- run_cluster_samples(
  expomicset = mae,
  exposure_cols = c("exposure_pm25", "exposure_no2", "age", "bmi"),
  clustering_approach = "diana"
)
#> Starting clustering analysis...
#> Optimal number of clusters for samples: 1