Online documentation

This vignette displays pre-computed results. Run the targets pipeline locally for interactive analysis.

Overview

This vignette documents all data sources used in the coMMpass analysis pipeline, their access tiers, and the distinction between pipeline data (real patient data from GDC) and synthetic test data (generated by example_data()).

MMRF CoMMpass Study

The Multiple Myeloma Research Foundation (MMRF) Relating Clinical Outcomes in Multiple Myeloma to Personal Assessment of Genetic Profile (CoMMpass) study is a longitudinal observational study of ~1,143 newly diagnosed multiple myeloma patients.

Citation

Keats JJ, et al. Interim Analysis Of The MMRF CoMMpass Trial, a Longitudinal Study In Multiple Myeloma Relating Clinical Outcomes to Genomic and Immunophenotypic Profiles. Blood. 2013;122(21):532. doi:10.1182/blood.V122.21.532.532

Data at a Glance

Summary of all datasets used by the coMMpass pipeline. Rows = records (patients/samples/treatment lines). Columns = variables per record. Patient data are sourced from GDC open-access endpoints and gene sets from pre-bundled MSigDB files; MMRF Gateway data (FISH, PFS, response) are not yet available.
| Dataset | Rows | Columns | Completeness | Source |
|---|---|---|---|---|
| Clinical metadata | 995 | 88 | ISS: 100% | GDC API (open) |
| Treatment records | 7184 | 5 | 994 patients | GDC API (open) |
| RNA-seq (STAR counts) | 100 | 60660 | 100% | GDC STAR-Counts |
| MSigDB gene sets | 708 | 2 | Hallmark + KEGG | Pre-bundled parquet |
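The Rows, Columns, and Completeness figures in a summary like this can be derived directly from a data frame. A minimal sketch in base R on a made-up four-patient table; the helper `summarise_dataset()` and the column names `case_id` and `iss_stage` are illustrative assumptions, not the pipeline's own names:

```r
# Hypothetical helper: report the size of a data frame and the
# completeness of one column of interest.
summarise_dataset <- function(df, key_col) {
  data.frame(
    rows = nrow(df),
    columns = ncol(df),
    # Share of non-missing values in key_col, as a percentage.
    completeness_pct = 100 * mean(!is.na(df[[key_col]]))
  )
}

# Toy data only; not real patient records.
toy_clinical <- data.frame(
  case_id = c("P1", "P2", "P3", "P4"),
  iss_stage = c("I", "II", NA, "III")
)

toy_summary <- summarise_dataset(toy_clinical, "iss_stage")
print(toy_summary)
```

Here one of four ISS values is missing, so the completeness column would report 75 rather than 100.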

Data Access Tiers

| Tier | Data | Access | Used by pipeline |
|---|---|---|---|
| Open Access | RNA-seq counts (STAR), clinical metadata, treatment records | GDC Data Portal, no login required | Yes |
| Open Access (S3) | RNA-seq files | s3://gdc-mmrf-commpass-phs000748-2-open/ | Yes |
| MMRF Gateway | FISH/cytogenetics, PFS, treatment response | Free registration (pending access) | No (12 targets blocked) |
| Controlled Access | WGS, WES, protected clinical | dbGaP application (phs000748) | No (requires institutional IRB) |
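To see what the open-access tier exposes, one option is to query the public GDC REST API for files flagged `access = open` in the `MMRF-COMMPASS` project. The sketch below only builds and prints the request URL, using base R; the chosen `fields` and `size` values are illustrative, and this is not necessarily how the pipeline itself issues its requests:

```r
# Filter for open-access files in the MMRF-COMMPASS project.
# The JSON is written by hand so the sketch needs only base R.
gdc_filter <- paste0(
  '{"op":"and","content":[',
  '{"op":"=","content":{"field":"cases.project.project_id","value":"MMRF-COMMPASS"}},',
  '{"op":"=","content":{"field":"access","value":"open"}}',
  ']}'
)

# Assemble a request against the GDC /files endpoint.
gdc_url <- paste0(
  "https://api.gdc.cancer.gov/files",
  "?filters=", utils::URLencode(gdc_filter, reserved = TRUE),
  "&fields=file_id,file_name,data_type,access",
  "&size=5"
)

# Printing only; fetching the URL needs network access.
cat(gdc_url, "\n")
```

Changing the `access` value to `controlled` in the same filter would list the files that require a dbGaP application instead.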

Pipeline Data Sources (Real)

The targets pipeline downloads real patient data from GDC:

| Data | Source | Function |
|---|---|---|
| RNA-seq SummarizedExperiment | GDC STAR-Counts | download_rnaseq_data() via TCGAbiolinks::GDCdownload() |
| Clinical metadata (demographics, ISS, vital status) | GDC clinical endpoint | download_clinical_data() via TCGAbiolinks::GDCquery_clinic() |
| Treatment records (7,184 records, 994 patients) | GDC API | download_clinical_data() via GDC REST API |
| Biospecimen metadata | GDC biospecimen endpoint | download_clinical_data() via TCGAbiolinks::GDCquery_clinic() |
| MSigDB gene sets (Hallmark + KEGG) | MSigDB | Pre-bundled parquet in inst/extdata/msigdb/ |
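For orientation, the TCGAbiolinks calls named in the table are typically chained as query, download, prepare. The wrappers below are a hedged sketch of that pattern: `fetch_star_counts_sketch()` and `fetch_clinical_sketch()` are hypothetical names, and the package's own `download_rnaseq_data()` and `download_clinical_data()` may be structured differently. Nothing is downloaded until the functions are called.

```r
# Hypothetical wrapper around the usual TCGAbiolinks query/download/prepare chain.
fetch_star_counts_sketch <- function(directory = "GDCdata") {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is required for this sketch.")
  }
  # Describe the open-access STAR count files for the CoMMpass project.
  query <- TCGAbiolinks::GDCquery(
    project = "MMRF-COMMPASS",
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    workflow.type = "STAR - Counts"
  )
  # Download the files, then assemble them into a SummarizedExperiment.
  TCGAbiolinks::GDCdownload(query, directory = directory)
  TCGAbiolinks::GDCprepare(query, directory = directory)
}

# Hypothetical wrapper for the clinical endpoint.
fetch_clinical_sketch <- function() {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is required for this sketch.")
  }
  TCGAbiolinks::GDCquery_clinic(project = "MMRF-COMMPASS", type = "clinical")
}
```

Calling `fetch_star_counts_sketch()` without any sample restriction would request every matching file, so a real run would normally subset the query first.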

The pipeline runs with a configurable sample_limit parameter (default 200 in local builds; CI uses 20). All vignettes load pre-computed results from the targets store.
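One way a `sample_limit` setting could be applied is to truncate the list of sample identifiers before download. This is a sketch under assumptions: the helper `apply_sample_limit()` is hypothetical, and detecting CI through a `CI` environment variable is a common convention rather than something the pipeline is known to use.

```r
# Hypothetical helper: keep at most `sample_limit` sample IDs, in original order.
apply_sample_limit <- function(sample_ids, sample_limit = 200L) {
  stopifnot(is.numeric(sample_limit), length(sample_limit) == 1, sample_limit >= 1)
  head(sample_ids, n = sample_limit)
}

# Assumed convention: CI systems set the CI environment variable to "true".
default_limit <- if (identical(Sys.getenv("CI"), "true")) 20L else 200L

# Toy identifiers only.
toy_ids <- sprintf("SAMPLE_%03d", 1:250)
limited <- apply_sample_limit(toy_ids, default_limit)
length(limited)
```

When fewer samples are available than the limit allows, `head()` simply returns all of them, so the limit acts as an upper bound.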

GDC Documentation

Synthetic Test Data (example_data())

The example_data() function returns small synthetic datasets for testing and documentation examples. These are randomly generated with set.seed(42) and contain no real patient data.

| Dataset | Size | Description |
|---|---|---|
| rnaseq_se | 50 genes x 20 samples | SummarizedExperiment with simulated counts |
| clinical | 20 patients | Simulated demographics and outcomes |
| cytogenetic | 20 patients | Simulated FISH markers and risk groups |
| treatment | ~40 rows | Simulated treatment lines (1-3 per patient) |

Stored in inst/extdata/example/*.rds and created by create_all_example_data().
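To give a feel for what seeded synthetic data of these shapes looks like, here is a base R sketch. It is not the code behind `create_all_example_data()`: the distributions, column names, and the use of a plain matrix instead of a SummarizedExperiment are all assumptions made for illustration.

```r
set.seed(42)  # seed taken from the text; everything generated below is assumed

n_genes <- 50
n_samples <- 20

# Simulated count matrix (genes x samples) drawn from a negative binomial.
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 2),
  nrow = n_genes,
  dimnames = list(
    sprintf("GENE%03d", seq_len(n_genes)),
    sprintf("PT%02d", seq_len(n_samples))
  )
)

# Simulated clinical table, one row per patient.
clinical <- data.frame(
  patient_id = colnames(counts),
  age = round(rnorm(n_samples, mean = 65, sd = 8)),
  iss_stage = sample(c("I", "II", "III"), n_samples, replace = TRUE)
)

# One to three treatment lines per patient, in long format.
n_lines <- sample(1:3, n_samples, replace = TRUE)
treatment <- data.frame(
  patient_id = rep(clinical$patient_id, times = n_lines),
  line = sequence(n_lines)
)

dim(counts)
nrow(treatment)
```

With one to three lines per patient, the treatment table lands somewhere between 20 and 60 rows, which is in line with the "~40 rows" listed above.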

The two data tracks do not mix: pipeline vignettes use real GDC data via tar_read(); unit tests and ?example_data examples use synthetic data only.
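Reading a pre-computed result with a fallback can be sketched as follows. The helper `read_target_or_rds()` is hypothetical and is not the package's own reader; it tries `targets::tar_read_raw()` first and falls back to a plain RDS file when the targets package or the store is unavailable.

```r
# Hypothetical reader: try the targets store first, then fall back to an RDS file.
read_target_or_rds <- function(name, rds_path) {
  if (requireNamespace("targets", quietly = TRUE)) {
    # tar_read_raw() takes the target name as a character string.
    value <- tryCatch(targets::tar_read_raw(name), error = function(e) NULL)
    if (!is.null(value)) return(value)
  }
  # Fall back to a pre-computed RDS file; return NULL if that is missing too.
  if (file.exists(rds_path)) readRDS(rds_path) else NULL
}

# Demonstration with a temporary RDS file and a target name that does not exist.
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), tmp)
result <- read_target_or_rds("no_such_target", tmp)
```

Returning `NULL` instead of raising an error lets a vignette chunk skip its output when neither source is present.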

MSigDB Gene Sets

Pre-bundled MSigDB Hallmark and KEGG gene sets are stored as parquet files in inst/extdata/msigdb/. These are public reference gene sets used for pathway enrichment analysis.

Bundled Files

MSigDB gene set collections bundled in inst/extdata/msigdb/ (2 parquet files). Total: 708 gene sets across 4,393 unique Ensembl gene IDs. Gene_Sets = distinct pathways per collection; Unique_Genes = genes annotated in that collection. Source: MSigDB Hallmark + KEGG via msigdbr. See differential-expression GSEA/ORA sections for enrichment results.
| File | Size | Rows | Columns | Gene_Sets | Unique_Genes | ID_Type |
|---|---|---|---|---|---|---|
| hallmark_ensembl.parquet | 41.8 KB | 7333 | 2 | 50 | 4393 | Ensembl |
| kegg_ensembl.parquet | 43 KB | 9688 | 2 | 658 | 2795 | Ensembl |
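The Gene_Sets and Unique_Genes columns follow from a two-column long table with one row per gene set and gene pair. The sketch below computes them on toy data; the column names `gs_name` and `ensembl_gene` are assumptions about the parquet schema, and `summarise_gene_sets()` is a hypothetical helper.

```r
# Toy long-format table: one row per (gene set, gene) pair.
# Column names are assumed, not read from the bundled files.
toy_sets <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  ensembl_gene = c("ENSG01", "ENSG02", "ENSG03", "ENSG02", "ENSG04")
)

# Hypothetical helper producing one summary row per collection.
summarise_gene_sets <- function(df) {
  data.frame(
    rows = nrow(df),
    columns = ncol(df),
    gene_sets = length(unique(df$gs_name)),
    unique_genes = length(unique(df$ensembl_gene))
  )
}

toy_set_summary <- summarise_gene_sets(toy_sets)
print(toy_set_summary)

# With the arrow package installed, a bundled file could be read the same way:
# hallmark <- arrow::read_parquet("inst/extdata/msigdb/hallmark_ensembl.parquet")
```

A gene shared by two sets, such as `ENSG02` here, adds a row each time but counts once towards the unique genes, which is why Rows exceeds Unique_Genes in the table above.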

Citation

Liberzon A, et al. The Molecular Signatures Database (MSigDB) Hallmark Gene Set Collection. Cell Systems. 2015;1(6):417-425. doi:10.1016/j.cels.2015.12.004

Recent Changes

Recent project commits with lines added, files changed, and change categories.

Last 20 project commits with change statistics. Date = commit date; Type = conventional-commit prefix (feat/fix/docs/ci/refactor/test/chore). Files = number of files modified; +Lines/-Lines = lines added/removed. Source: git log --numstat. See changes-by-type table for aggregate breakdown.
| date | type | summary | n_files | lines_added | lines_removed | file_categories |
|---|---|---|---|---|---|---|
| 2026-03-14 | Bug Fix | fix(pipeline): Fix 11 NULL targets — DE condition, ID matching, consensus type | 41 | 146 | 47 | Other, R Source |
| 2026-03-14 | Bug Fix | fix(cachix): Remove --watch-mode auto flag (already default) | 1 | 1 | 1 | Other |
| 2026-03-14 | Bug Fix | fix(pipeline): Fix 3 NULL-target bugs, auto-generate package.nix (#93) | 87 | 235 | 80 | Config, Docs, Other, R Source |
| 2026-03-14 | Bug Fix | fix(nix): Fix cachix signing key, rebuild Bioconductor-dependent targets | 2 | 0 | 0 | Other |
| 2026-03-14 | New Feature | feat(captions): Add dynamic captions to 34 table/plot targets | 22 | 579 | 89 | Other, R Source |
| 2026-03-14 | Bug Fix | fix(vignettes): Enforce zero-computation rule — 22 violations → 0 | 32 | 360 | 764 | Other, R Source, Vignettes |
| 2026-03-13 | Bug Fix | fix(vignettes): Convert kable RDS to data.frames, fix telemetry eval guards | 18 | 8 | 2 | Other, Vignettes |
| 2026-03-13 | Bug Fix | fix(ci): Save data frames (not DT widgets) to RDS for Nix portability | 3 | 0 | 0 | Other |
| 2026-03-13 | Bug Fix | fix(vignettes): Use Quarto #\| eval syntax for pkgdown-banner chunks | 11 | 44 | 11 | Vignettes |
| 2026-03-13 | Refactoring | refactor(targets): Move Bioconductor packages to per-target declarations | 11 | 35 | 17 | Other, R Source, Vignettes |
| 2026-03-13 | New Feature | feat(vignettes): Add code provenance, kable→DT conversion, caption compliance | 35 | 1004 | 437 | CI/CD, Other, R Source, Vignettes |
| 2026-03-13 | Bug Fix | fix(vignettes): Skip NULL RDS in safe_tar_read, return invisible(NULL) | 11 | 22 | 22 | Vignettes |
| 2026-03-13 | Bug Fix | fix(glossary): Prevent double DT::datatable() wrapping in glossary-table chunk | 1 | 3 | 1 | Vignettes |
| 2026-03-13 | CI/CD | ci: Show quarto errors with quiet=FALSE, render individual vignettes in diagnostic | 1 | 20 | 6 | CI/CD |
| 2026-03-13 | CI/CD | ci: Add verbose quarto error diagnostics on build failure | 1 | 14 | 1 | CI/CD |
| 2026-03-13 | Bug Fix | fix(vignettes): Strip Nix paths from DT widgets, auto-wrap data frames | 25 | 66 | 28 | CI/CD, Other, Vignettes |
| 2026-03-13 | CI/CD | ci: Add diagnostic quarto render step to debug build failure | 1 | 17 | 0 | CI/CD |
| 2026-03-13 | Bug Fix | fix(vignettes): Revert safe_tar_read placeholder, guard gene-report | 11 | 12 | 56 | Vignettes |
| 2026-03-13 | Maintenance | chore: Export vig_count_distribution_plot as ggplot RDS (513KB) | 1 | 0 | 0 | Other |
| 2026-03-13 | Bug Fix | fix(vignettes): Enable code eval in CI with RDS fallback | 80 | 113 | 74 | CI/CD, Other, R Source, Vignettes |
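Per-commit statistics of this kind can be aggregated from `git log --numstat` output, which prints one tab-separated line per file: lines added, lines removed, path. The sketch below parses a toy excerpt in base R; the file paths are invented and `parse_numstat()` is a hypothetical helper, not the function that built the table. Counting binary files as zero changed lines is an assumption of this sketch.

```r
# Toy numstat body for one commit: added, removed, path (tab-separated).
numstat_lines <- c(
  "12\t3\tR/download.R",
  "5\t0\tvignettes/data-sources.qmd",
  "-\t-\tinst/extdata/plot.rds"  # git reports binary files with dashes
)

# Hypothetical helper: total files and changed lines for one commit.
parse_numstat <- function(lines) {
  parts <- do.call(rbind, strsplit(lines, "\t", fixed = TRUE))
  # Dashes (binary files) become NA and are counted as zero changed lines.
  added <- suppressWarnings(as.integer(parts[, 1]))
  removed <- suppressWarnings(as.integer(parts[, 2]))
  data.frame(
    n_files = nrow(parts),
    lines_added = sum(added, na.rm = TRUE),
    lines_removed = sum(removed, na.rm = TRUE)
  )
}

commit_stats <- parse_numstat(numstat_lines)
print(commit_stats)
```

Under that assumption, a commit that touches only binary files would show a non-zero file count with zero lines added and removed, which would be consistent with rows such as the RDS export above.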

Reproducibility

Session Info
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.7.2         cli_3.6.5           knitr_1.51         
#>  [4] rlang_1.1.7         xfun_0.57           otel_0.2.0         
#>  [7] processx_3.8.6      targets_1.12.0      jsonlite_2.0.0     
#> [10] data.table_1.18.2.1 glue_1.8.0          prettyunits_1.2.0  
#> [13] backports_1.5.0     htmltools_0.5.9     ps_1.9.1           
#> [16] rmarkdown_2.30      tibble_3.3.1        evaluate_1.0.5     
#> [19] base64url_1.4       fastmap_1.2.0       yaml_2.3.12        
#> [22] lifecycle_1.0.5     compiler_4.5.3      codetools_0.2-20   
#> [25] igraph_2.2.2        pkgconfig_2.0.3     digest_0.6.39      
#> [28] R6_2.6.1            tidyselect_1.2.1    pillar_1.11.1      
#> [31] callr_3.7.6         magrittr_2.0.4      withr_3.0.2        
#> [34] tools_4.5.3         secretbase_1.2.0