Returns small **synthetic** datasets (no real patient data) for interactive exploration and testing. The data exercises the full pipeline: QC, cleaning, survival analysis, and cytogenetic classification.
Value
A named list with four elements:
- rnaseq_se
A [SummarizedExperiment::SummarizedExperiment] with 50 genes x 20 samples. Assay named `"unstranded"` (matches GDC format). rowData contains `gene_id` (Ensembl-format with version suffix) and `gene_type`. colData contains `submitter_id` and `sample_type`.
- clinical
A data.frame (20 patients) with columns `submitter_id`, `vital_status`, `days_to_death`, `days_to_last_follow_up`, `age_at_diagnosis` (in days), `gender`, `iss_stage`, `heavy_chain` (myeloma isotype), `light_chain` (Kappa/Lambda), `ecog_status` (0-4), `ldh` (U/L), `b2m` (beta-2 microglobulin, mg/L), `albumin` (g/dL), `flc_kappa` (free kappa light chain, mg/L), `flc_lambda` (free lambda light chain, mg/L), `hemoglobin` (g/dL), `creatinine` (mg/dL), `calcium` (corrected, mg/dL), `platelets` (10^9/L).
- cytogenetic
A data.frame (20 patients) with columns `patient_id`, `t_4_14`, `t_11_14`, `t_14_16`, `del_17p`, `gain_1q`, `risk_group`.
- treatment
A data.frame of treatment lines (1-3 per patient) with columns `patient_id`, `treatment_line`, `regimen_name`, `regimen_class`, `best_response` (ordered factor), `stem_cell_transplant` (logical), `treatment_start_days` (integer).
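Because `age_at_diagnosis` is stored in days and follow-up is split across `days_to_death` and `days_to_last_follow_up`, a quick derivation of age in years and a single follow-up time can be useful when exploring the `clinical` element by hand. The following is a minimal base-R sketch on an invented toy data.frame. The `"Dead"`/`"Alive"` labels for `vital_status` and the 365.25 days-per-year divisor are assumptions, and this is not necessarily how `prepare_survival_data()` computes its output.

```r
# Toy stand-in for d$clinical; all values are invented for illustration.
clinical <- data.frame(
  submitter_id           = c("P1", "P2", "P3"),
  vital_status           = c("Dead", "Alive", "Dead"),  # assumed labels
  days_to_death          = c(400, NA, 1200),
  days_to_last_follow_up = c(NA, 900, NA),
  age_at_diagnosis       = c(21915, 25567.5, 18262.5),  # in days
  stringsAsFactors       = FALSE
)

# Convert age from days to years (assumed divisor: 365.25).
clinical$age_years <- clinical$age_at_diagnosis / 365.25

# Event indicator: 1 for death, 0 for censored.
clinical$event <- as.integer(clinical$vital_status == "Dead")

# Follow-up time: days_to_death for events, days_to_last_follow_up otherwise.
clinical$time_days <- ifelse(clinical$event == 1,
                             clinical$days_to_death,
                             clinical$days_to_last_follow_up)

clinical[, c("submitter_id", "age_years", "event", "time_days")]
```

The same expressions should apply to `example_data()$clinical` if its `vital_status` values follow the assumed labels.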
Details
The SummarizedExperiment is reconstructed at load time from stored plain-R components (matrix + data.frames), so the RDS files have no S4 class dependency.
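The reconstruction described above can be sketched as follows. This is an illustrative example under stated assumptions, not the package's actual loader: the object names (`counts`, `row_df`, `col_df`), the gene IDs, and the sample labels are all hypothetical, and only the assay name `"unstranded"` and the column names are taken from this page.

```r
library(SummarizedExperiment)  # also attaches S4Vectors for DataFrame()

# Hypothetical plain-R components, as they might be stored in an RDS file.
set.seed(1)
counts <- matrix(
  rpois(6, lambda = 20), nrow = 3,
  dimnames = list(
    c("ENSG00000000001.1", "ENSG00000000002.1", "ENSG00000000003.1"),  # invented IDs
    c("S1", "S2")
  )
)
row_df <- data.frame(
  gene_id   = rownames(counts),
  gene_type = "protein_coding",
  row.names = rownames(counts)
)
col_df <- data.frame(
  submitter_id = c("P1", "P2"),
  sample_type  = "Primary Tumor",
  row.names    = colnames(counts)
)

# Rebuild the S4 object from the plain components at load time.
se <- SummarizedExperiment(
  assays  = list(unstranded = counts),
  rowData = DataFrame(row_df),
  colData = DataFrame(col_df)
)

assayNames(se)
dim(se)
```

Storing only a matrix and data.frames means `readRDS()` works without SummarizedExperiment attached; the package is needed only when the object is rebuilt.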
See also
Other utilities:
create_summary_table(),
export_h5ad(),
format_file_size(),
format_with_commas(),
gene_report(),
strip_plotly()
Examples
d <- example_data()
# QC metrics
qc <- calculate_qc_metrics(d$rnaseq_se)
#> INFO [2026-03-21 19:01:49] Calculating QC metrics...
#> INFO [2026-03-21 19:01:49] QC metrics calculated for 20 samples
#> INFO [2026-03-21 19:01:49] 2 potential outliers detected
head(qc)
#>                            sample total_counts detected_genes median_count
#> TCGA-AB-0001-01A TCGA-AB-0001-01A         5648             41         23.0
#> TCGA-AB-0002-01A TCGA-AB-0002-01A         5552             42         13.0
#> TCGA-AB-0003-01A TCGA-AB-0003-01A         6140             40         12.0
#> TCGA-AB-0004-01A TCGA-AB-0004-01A         5374             41         20.5
#> TCGA-AB-0005-01A TCGA-AB-0005-01A         4133             42         19.0
#> TCGA-AB-0006-01A TCGA-AB-0006-01A         6451             43         20.0
#>                  mad_count size_factor is_outlier
#> TCGA-AB-0001-01A   31.8759   1.0017737      FALSE
#> TCGA-AB-0002-01A   19.2738   0.9847464      FALSE
#> TCGA-AB-0003-01A   17.7912   1.0890387       TRUE
#> TCGA-AB-0004-01A   27.4281   0.9531749      FALSE
#> TCGA-AB-0005-01A   28.1694   0.7330614       TRUE
#> TCGA-AB-0006-01A   28.9107   1.1442001      FALSE
# Clinical cleaning
clin <- clean_clinical_data(d$clinical)
# Survival data with cytogenetic markers
surv <- prepare_survival_data(d$clinical, cyto_file = d$cytogenetic)
#> INFO [2026-03-21 19:01:49] Preparing survival data from 20 patients
#> INFO [2026-03-21 19:01:49] Merged cytogenetic data for 20 patients
#> INFO [2026-03-21 19:01:49] Survival data prepared: 20 patients, 10 events, 10 censored