Load example coMMpass datasets

Returns small **synthetic** datasets (no real patient data) for interactive exploration and testing. The data exercises the full pipeline: QC, cleaning, survival analysis, and cytogenetic classification.

Usage

example_data()

Value

A named list with four elements:

rnaseq_se: A [SummarizedExperiment::SummarizedExperiment] with 50 genes x 20 samples. Assay named `"unstranded"` (matches GDC format). rowData contains `gene_id` (Ensembl-format with version suffix) and `gene_type`. colData contains `submitter_id` and `sample_type`.
clinical: A data.frame (20 patients) with columns `submitter_id`, `vital_status`, `days_to_death`, `days_to_last_follow_up`, `age_at_diagnosis` (in days), `gender`, `iss_stage`, `heavy_chain` (myeloma isotype), `light_chain` (Kappa/Lambda), `ecog_status` (0-4), `ldh` (U/L), `b2m` (beta-2 microglobulin, mg/L), `albumin` (g/dL), `flc_kappa` (free kappa light chain, mg/L), `flc_lambda` (free lambda light chain, mg/L), `hemoglobin` (g/dL), `creatinine` (mg/dL), `calcium` (corrected, mg/dL), `platelets` (10^9/L).
cytogenetic: A data.frame (20 patients) with columns `patient_id`, `t_4_14`, `t_11_14`, `t_14_16`, `del_17p`, `gain_1q`, `risk_group`.
treatment: A data.frame of treatment lines (1-3 per patient) with columns `patient_id`, `treatment_line`, `regimen_name`, `regimen_class`, `best_response` (ordered factor), `stem_cell_transplant` (logical), `treatment_start_days` (integer).
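
Since `age_at_diagnosis` is stored in days and the survival columns are split between `days_to_death` and `days_to_last_follow_up`, a little derivation is usually needed before plotting or modelling. The sketch below uses a made-up three-row data.frame with a subset of the documented columns. The `"Dead"`/`"Alive"` coding of `vital_status`, the derived names `age_years`, `event`, and `os_days`, and the rule for choosing the time column are all assumptions for illustration, not necessarily what `prepare_survival_data()` does internally.

```r
# Toy rows mimicking a few documented clinical columns (all values are made up).
clinical <- data.frame(
  submitter_id           = c("P1", "P2", "P3"),
  vital_status           = c("Dead", "Alive", "Alive"),  # assumed coding
  days_to_death          = c(400, NA, NA),
  days_to_last_follow_up = c(NA, 900, 1200),
  age_at_diagnosis       = c(21915, 25567.5, 18262.5)    # in days
)

# Convert age from days to years, using 365.25 days per year.
clinical$age_years <- clinical$age_at_diagnosis / 365.25

# Assumed convention: deaths are events timed by days_to_death;
# everyone else is censored at days_to_last_follow_up.
clinical$event   <- as.integer(clinical$vital_status == "Dead")
clinical$os_days <- ifelse(clinical$event == 1L,
                           clinical$days_to_death,
                           clinical$days_to_last_follow_up)
```

The same derivation should apply unchanged to the `clinical` element returned by `example_data()`, provided its `vital_status` values follow the coding assumed here.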

Details

The SummarizedExperiment is reconstructed at load time from stored plain-R components (matrix + data.frames), so the RDS files have no S4 class dependency.
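
The pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual loader: the toy matrix, the placeholder gene IDs, the list element names (`counts`, `row_data`, `col_data`), and the temporary file are all hypothetical. Only base-R objects are written to disk, and the S4 container is rebuilt after reading, here only if SummarizedExperiment happens to be installed.

```r
# Toy stand-ins for the stored plain-R components (all values are made up).
counts <- matrix(
  c(10L, 0L, 5L, 3L, 8L, 2L),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
row_df <- data.frame(
  gene_id   = c("ENSG00000000001.1", "ENSG00000000002.1", "ENSG00000000003.1"),
  gene_type = "protein_coding",
  row.names = rownames(counts)
)
col_df <- data.frame(
  submitter_id = c("P1", "P2"),
  sample_type  = "Primary",
  row.names    = colnames(counts)
)

# Store only base-R objects, so readRDS() needs no S4 class definitions.
path <- tempfile(fileext = ".rds")
saveRDS(list(counts = counts, row_data = row_df, col_data = col_df), path)
parts <- readRDS(path)

# Rebuild the S4 container at load time, only if the package is available.
se <- NULL
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(unstranded = parts$counts),
    rowData = S4Vectors::DataFrame(parts$row_data),
    colData = S4Vectors::DataFrame(parts$col_data)
  )
}
```

Keeping the serialized file free of S4 objects means it can be read in any R session, with the Bioconductor dependency needed only at the moment the object is assembled.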

Examples

d <- example_data()
# QC metrics
qc <- calculate_qc_metrics(d$rnaseq_se)
#> INFO [2026-03-21 19:01:49] Calculating QC metrics...
#> INFO [2026-03-21 19:01:49] QC metrics calculated for 20 samples
#> INFO [2026-03-21 19:01:49] 2 potential outliers detected
head(qc)
#>                            sample total_counts detected_genes median_count
#> TCGA-AB-0001-01A TCGA-AB-0001-01A         5648             41         23.0
#> TCGA-AB-0002-01A TCGA-AB-0002-01A         5552             42         13.0
#> TCGA-AB-0003-01A TCGA-AB-0003-01A         6140             40         12.0
#> TCGA-AB-0004-01A TCGA-AB-0004-01A         5374             41         20.5
#> TCGA-AB-0005-01A TCGA-AB-0005-01A         4133             42         19.0
#> TCGA-AB-0006-01A TCGA-AB-0006-01A         6451             43         20.0
#>                  mad_count size_factor is_outlier
#> TCGA-AB-0001-01A   31.8759   1.0017737      FALSE
#> TCGA-AB-0002-01A   19.2738   0.9847464      FALSE
#> TCGA-AB-0003-01A   17.7912   1.0890387       TRUE
#> TCGA-AB-0004-01A   27.4281   0.9531749      FALSE
#> TCGA-AB-0005-01A   28.1694   0.7330614       TRUE
#> TCGA-AB-0006-01A   28.9107   1.1442001      FALSE

# Clinical cleaning
clin <- clean_clinical_data(d$clinical)

# Survival data with cytogenetic markers
surv <- prepare_survival_data(d$clinical, cyto_file = d$cytogenetic)
#> INFO [2026-03-21 19:01:49] Preparing survival data from 20 patients
#> INFO [2026-03-21 19:01:49] Merged cytogenetic data for 20 patients
#> INFO [2026-03-21 19:01:49] Survival data prepared: 20 patients, 10 events, 10 censored
