Reproducible analysis of the MMRF CoMMpass (Clinical Outcomes in MM to Personal Assessment of Genetic Profile) multiple myeloma study data using R and Nix.
Overview
The CoMMpass study is a landmark longitudinal genomic-clinical study of 1,143 newly diagnosed multiple myeloma patients enrolled between 2011 and 2016, with 8-year follow-up. This project provides:
- Reproducible Nix environment with all required R/Bioconductor packages
- Analysis workflows for RNA-seq, single-cell, and survival analysis
- Direct access to CoMMpass data via GDC API and AWS S3 (open access)
Usage
Quick Start (without Nix)
# Install from GitHub (requires Bioconductor deps)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("TCGAbiolinks", "DESeq2", "edgeR", "limma",
                       "SummarizedExperiment", "fgsea"))
# remotes is needed for install_github()
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("JohnGavin/coMMpass-analysis")
# Clone the repo (targets needs _targets.R in working directory)
# git clone https://github.com/JohnGavin/coMMpass-analysis.git
# setwd("coMMpass-analysis")
# Run the pipeline
library(targets)
tar_make()
Pipeline status
The pipeline has 244 targets (total build time: 421s).
Top 5 targets by build time:
| | name | seconds | MB |
|---|---|---|---|
| 68 | raw_rnaseq | 113.2 | 60.5 |
| 166 | deseq2_results | 50.0 | 2.7 |
| 36 | clinical_data | 38.4 | 1.2 |
| 239 | bayes_cox_basic | 32.0 | 2.3 |
| 165 | deseq2_paired_results | 26.1 | 2.1 |
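A table like the one above can be derived from pipeline metadata. The following is a minimal sketch, not the code this project uses: `top_targets()` is a hypothetical helper, the toy values are made up, and converting bytes to MB by dividing by 1e6 is an assumption.

```r
# Rank targets by build time from a metadata table with the columns
# name, seconds, and bytes (hypothetical helper, for illustration only).
top_targets <- function(meta, n = 5) {
  # Targets that were never built have no recorded runtime; drop them.
  meta <- meta[!is.na(meta$seconds), , drop = FALSE]
  meta <- meta[order(meta$seconds, decreasing = TRUE), , drop = FALSE]
  out <- head(meta, n)
  data.frame(
    name = out$name,
    seconds = round(out$seconds, 1),
    MB = round(out$bytes / 1e6, 1),  # assumption: 1 MB = 1e6 bytes
    stringsAsFactors = FALSE
  )
}

# Toy metadata standing in for a real pipeline store (values are made up).
toy_meta <- data.frame(
  name = c("a", "b", "c"),
  seconds = c(1.5, 10.2, NA),
  bytes = c(2e6, 5e5, 1e3)
)
top <- top_targets(toy_meta, n = 2)
print(top)
```

On a built pipeline, the input would presumably come from `targets::tar_meta(targets_only = TRUE)`, which reports `name`, `seconds`, and `bytes` among its fields.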
Example: read a result
km <- tar_read(km_overall)
str(km, max.level = 1)
#> List of 8
#> $ n_per_group : 994 patients
#> $ median_survival: NA (not reached)
#> $ fit : survfit object
#> $ data : data.frame with 994 obs. of 14 variables
#> $ formula : survival::Surv(time_days, status) ~ 1
Note: The pipeline downloads ~3.6 GB of RNA-seq data from GDC on first run. Subsequent runs skip downloads (`cue = "never"`). The `config` target sets `sample_limit = 200` by default; edit `R/tar_plans/plan_data_acquisition.R` to change this.
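The `median_survival: NA (not reached)` entry in the example output means the Kaplan-Meier curve never falls to 0.5 within the observed follow-up. The sketch below reproduces that situation on a made-up toy cohort with the `survival` package. The column names `time_days` and `status` follow the formula shown above; the data values are illustrative and have nothing to do with CoMMpass.

```r
library(survival)

# Toy cohort (made-up values): 10 patients, 3 deaths (status = 1),
# 7 censored observations (status = 0).
toy <- data.frame(
  time_days = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50),
  status    = c(1,  0,  1,  0,  0,  1,  0,  0,  0,  0)
)

# Overall (unstratified) Kaplan-Meier fit, same formula shape as km_overall.
fit <- survfit(Surv(time_days, status) ~ 1, data = toy)

# Lowest estimated survival probability reached during follow-up.
lowest_survival <- min(fit$surv)

# The median is the first time the curve reaches 0.5; it is NA
# when the curve stays above 0.5 for the whole follow-up.
median_days <- unname(summary(fit)$table["median"])

print(lowest_survival)
print(median_days)
```

With few events relative to censored observations, the estimated curve stays above 0.5, so no median can be reported.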
What Happens
- `default.sh` checks if `default_dev.nix` needs regeneration
- Runs `default.R` which:
  - Extracts package dependencies from `DESCRIPTION` (single source of truth)
  - Calls `rix::rix()` to generate `default_dev.nix`
- Builds the Nix environment with `nix-build`
- Creates a GC root symlink (prevents garbage collection)
- Enters an interactive shell with all packages
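The dependency-extraction step can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not the contents of `default.R`: `description_deps()` is a hypothetical helper, and the demo `DESCRIPTION` file is made up.

```r
# Collect package names from the dependency fields of a DESCRIPTION file
# (hypothetical helper; the real default.R may work differently).
description_deps <- function(path = "DESCRIPTION",
                             fields = c("Depends", "Imports", "Suggests")) {
  dcf <- read.dcf(path, fields = fields)
  raw <- dcf[1, ]
  raw <- raw[!is.na(raw)]                      # skip fields that are absent
  pkgs <- unlist(strsplit(raw, ",", fixed = TRUE), use.names = FALSE)
  pkgs <- gsub("\\([^)]*\\)", "", pkgs)        # drop version constraints
  pkgs <- trimws(pkgs)
  # "R" itself is a version requirement, not a package to install.
  sort(unique(pkgs[nzchar(pkgs) & pkgs != "R"]))
}

# Demo on a throwaway DESCRIPTION file (contents are made up).
demo_path <- tempfile()
writeLines(c(
  "Package: demo",
  "Depends: R (>= 4.1)",
  "Imports:",
  "    targets,",
  "    survival (>= 3.5)",
  "Suggests: testthat"
), demo_path)

demo_pkgs <- description_deps(demo_path)
print(demo_pkgs)
```

The resulting character vector is the kind of input `rix::rix()` accepts through its `r_pkgs` argument, which is one plausible way to keep `DESCRIPTION` as the single source of truth.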
Data Access
| Access Level | Data Type | Requirements | Used by this pipeline |
|---|---|---|---|
| GDC Data Portal | Clinical, RNA-seq (995 cases) | None (open access) | Yes |
| AWS Open Data | RNA-seq gene expression | None (AWS CLI) | Yes |
| MMRF Researcher Gateway | FISH, PFS, treatment response | Free registration (pending) | No (12 targets blocked) |
| dbGaP Controlled | Raw sequences (BAM/FASTQ) | Institutional IRB | No |
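For the open-access GDC route, a query against the public GDC REST API can be sketched as below. This only builds the request URL and does not contact the network. `gdc_cases_url()` is a hypothetical helper, the chosen `fields` are illustrative, and the pipeline itself may use a different client such as TCGAbiolinks.

```r
# Build a GDC API URL that lists cases for one project
# (hypothetical helper, for illustration only).
gdc_cases_url <- function(project_id = "MMRF-COMMPASS",
                          size = 10,
                          fields = c("submitter_id", "case_id")) {
  # GDC filters are JSON objects of the form {op, content: {field, value}}.
  filter_json <- sprintf(
    '{"op":"in","content":{"field":"project.project_id","value":["%s"]}}',
    project_id
  )
  paste0(
    "https://api.gdc.cancer.gov/cases",
    "?filters=", utils::URLencode(filter_json, reserved = TRUE),
    "&fields=", paste(fields, collapse = ","),
    "&format=JSON",
    "&size=", size
  )
}

url <- gdc_cases_url()
print(url)
```

Passing the URL to a JSON reader such as `jsonlite::fromJSON()` would fetch the first page of matching cases, assuming network access is available.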
Project Structure
R/ # Package functions & pipeline plans
R/tar_plans/ # Modular targets plans
R/viz/ # Visualization functions
vignettes/ # Analysis vignettes (pkgdown articles)
inst/extdata/vignettes/ # Pre-computed RDS for CI
tests/testthat/ # Unit + snapshot tests
data/ # Downloaded data (gitignored)
default.sh # Nix environment setup
default.R # rix configuration
README.qmd # README source (generates README.md)
92 R source files, 13 vignettes, 156 pre-computed RDS files, 22 test files.
Key References
- Keats JJ et al. Blood 2013 — CoMMpass interim analysis
- Nature Genetics 2024 — Molecular profiling of 1,143 patients
- Nature Cancer 2025 — Bone marrow immune cell atlas
Glossary
See the Glossary vignette for definitions of terms used throughout this project.