Online documentation

This vignette displays pre-computed results. Run the targets pipeline locally for interactive analysis.
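
As a rough sketch of what a local run might look like, the snippet below rebuilds the pipeline and then reads the glossary target shown on this page. It assumes the working directory is the project root containing a `_targets.R` file and that the targets package is installed; the `safe_tar_read()` helper used in this vignette is project-specific, so the standard `targets::tar_read()` is substituted here.

```r
# Sketch only: assumes the project root (with _targets.R) is the working
# directory and that the targets package is installed.
if (requireNamespace("targets", quietly = TRUE) && file.exists("_targets.R")) {
  # Build or refresh any outdated targets in the pipeline.
  targets::tar_make()

  # Load the cached glossary target by name and preview it.
  glossary <- targets::tar_read(glossary_table)
  print(head(glossary))
} else {
  message("No targets pipeline found here; showing pre-computed results only.")
}
```

The guard keeps the snippet harmless outside the project: it only prints a message when no pipeline is present.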

Definitions

Terms used throughout the CoMMpass analysis vignettes, grouped by category. See individual vignettes for detailed usage.

safe_tar_read("glossary_table")
#> <table>
#> <caption>CoMMpass glossary: 65 terms across 8 categories (Disease, Cytogenetics, Staging, RNA-seq, DE, Survival, Pathway, Infrastructure). Definition = concise explanation; Appears_In = vignettes where the term is used; See_Also = external references and cross-links to other vignettes. Searchable — use the filter boxes to find terms by category or keyword. Source: curated from GDC documentation, Bioconductor, and domain literature.</caption>
#>  <thead>
#>   <tr>
#>    <th style="text-align:left;"> Term </th>
#>    <th style="text-align:left;"> Category </th>
#>    <th style="text-align:left;"> Definition </th>
#>    <th style="text-align:left;"> Appears_In </th>
#>    <th style="text-align:left;"> See_Also </th>
#>   </tr>
#>  </thead>
#> <tbody>
#>   <tr>
#>    <td style="text-align:left;"> Multiple Myeloma </td>
#>    <td style="text-align:left;"> Disease &amp; Study </td>
#>    <td style="text-align:left;"> Cancer of plasma cells in the bone marrow. The most common indication for autologous stem cell transplant in adults </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2), data-sources (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Multiple_myeloma), [NCI](https://www.cancer.gov/types/myeloma) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> CoMMpass </td>
#>    <td style="text-align:left;"> Disease &amp; Study </td>
#>    <td style="text-align:left;"> Clinical Outcomes in Multiple Myeloma to Personal Assessment of Genetic Profile -- MMRF longitudinal study of ~1,143 newly diagnosed MM patients </td>
#>    <td style="text-align:left;"> data-sources (5), exploratory (3), survival (2), gene-report (2) </td>
#>    <td style="text-align:left;"> [MMRF](https://themmrf.org/finding-a-cure/personalized-treatment-approaches/), [GDC Portal](https://portal.gdc.cancer.gov/projects/MMRF-COMMPASS) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> GDC </td>
#>    <td style="text-align:left;"> Disease &amp; Study </td>
#>    <td style="text-align:left;"> Genomic Data Commons -- NCI repository hosting CoMMpass RNA-seq and clinical data </td>
#>    <td style="text-align:left;"> data-acquisition (8), data-sources (3), data-dictionary (2) </td>
#>    <td style="text-align:left;"> [GDC Portal](https://portal.gdc.cancer.gov/), [GDC Docs](https://docs.gdc.cancer.gov/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> MMRF </td>
#>    <td style="text-align:left;"> Disease &amp; Study </td>
#>    <td style="text-align:left;"> Multiple Myeloma Research Foundation -- sponsor of the CoMMpass study </td>
#>    <td style="text-align:left;"> data-sources (3), data-acquisition (2) </td>
#>    <td style="text-align:left;"> [MMRF](https://themmrf.org/), [Data Sources](data-sources.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> TCGAbiolinks </td>
#>    <td style="text-align:left;"> Disease &amp; Study </td>
#>    <td style="text-align:left;"> Bioconductor R package for querying and downloading GDC data programmatically </td>
#>    <td style="text-align:left;"> data-acquisition (4), data-sources (2) </td>
#>    <td style="text-align:left;"> [Bioconductor](https://bioconductor.org/packages/TCGAbiolinks/), [Data Acquisition](data-acquisition.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> FISH </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> Fluorescence in situ hybridization -- detects cytogenetic abnormalities like t(4;14), t(14;16), del(17p) </td>
#>    <td style="text-align:left;"> exploratory (6), survival (4), gene-report (3) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Fluorescence_in_situ_hybridization), [EDA vignette](exploratory-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Cytogenetic Risk </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> Classification of patients into high-risk or standard-risk based on FISH-detected chromosomal abnormalities per IMWG criteria </td>
#>    <td style="text-align:left;"> survival (5), exploratory (4), gene-report (3) </td>
#>    <td style="text-align:left;"> [Survival vignette](survival-analysis.html), [IMWG](https://doi.org/10.1200/JCO.2014.55.1519) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> t(4;14) </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> Translocation between chromosomes 4 and 14 -- associated with poor prognosis in MM. Detected by FISH </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Chromosomal_translocation), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> t(14;16) </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> Translocation between chromosomes 14 and 16 -- high-risk marker associated with aggressive disease </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> del(17p) </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> Deletion of the short arm of chromosome 17 -- loss of TP53 tumor suppressor, high-risk marker </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Chromosome_17_(human)#Deletions), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> High-Risk / Standard-Risk </td>
#>    <td style="text-align:left;"> Cytogenetics &amp; Markers </td>
#>    <td style="text-align:left;"> High-risk: presence of t(4;14), t(14;16), or del(17p). Standard-risk: absence of all three. Per IMWG 2014 criteria </td>
#>    <td style="text-align:left;"> survival (4), exploratory (3) </td>
#>    <td style="text-align:left;"> [IMWG criteria](https://doi.org/10.1200/JCO.2014.55.1519), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> ISS </td>
#>    <td style="text-align:left;"> Staging &amp; Risk </td>
#>    <td style="text-align:left;"> International Staging System -- classifies myeloma severity (I-III) by serum albumin and beta-2 microglobulin (Greipp et al. 2005, doi:10.1200/JCO.2005.04.242) </td>
#>    <td style="text-align:left;"> survival (6), exploratory (5), data-dictionary (2) </td>
#>    <td style="text-align:left;"> [Greipp et al. 2005](https://doi.org/10.1200/JCO.2005.04.242), [EDA vignette](exploratory-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> IMWG </td>
#>    <td style="text-align:left;"> Staging &amp; Risk </td>
#>    <td style="text-align:left;"> International Myeloma Working Group -- defines cytogenetic risk criteria (Sonneveld et al. 2016, doi:10.1200/JCO.2014.55.1519) </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [IMWG](https://doi.org/10.1200/JCO.2014.55.1519), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> B2M / Serum Albumin </td>
#>    <td style="text-align:left;"> Staging &amp; Risk </td>
#>    <td style="text-align:left;"> Beta-2 microglobulin and serum albumin -- the two biomarkers used to determine ISS stage. B2M &gt;= 5.5 mg/L = Stage III </td>
#>    <td style="text-align:left;"> exploratory (2), data-dictionary (2) </td>
#>    <td style="text-align:left;"> [ISS definition](https://doi.org/10.1200/JCO.2005.04.242), [Data Dictionary](data-dictionary.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> B2M </td>
#>    <td style="text-align:left;"> Staging &amp; Risk </td>
#>    <td style="text-align:left;"> Beta-2 microglobulin -- serum protein used in ISS staging. B2M &gt;= 5.5 mg/L indicates Stage III myeloma </td>
#>    <td style="text-align:left;"> survival (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Beta-2_microglobulin), [ISS staging](https://doi.org/10.1200/JCO.2005.04.242) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> RNA-seq </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> RNA sequencing -- high-throughput method to quantify gene expression. CoMMpass uses bulk RNA-seq from bone marrow aspirates </td>
#>    <td style="text-align:left;"> data-acquisition (6), exploratory (3), differential-expression (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/RNA-Seq), [Data Acquisition](data-acquisition.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Read count </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> The number of sequencing reads aligned to a gene in one sample. Raw integer counts are the input for DE methods (DESeq2, edgeR) </td>
#>    <td style="text-align:left;"> data-acquisition (4), differential-expression (3) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/RNA-Seq), [Data Acquisition](data-acquisition.html#rnaseq-data) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Library size </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Total number of sequencing reads mapped to genes in one sample (= sum of all gene counts). A proxy for sequencing depth. Synonym: Total Counts </td>
#>    <td style="text-align:left;"> data-acquisition (5), exploratory (3) </td>
#>    <td style="text-align:left;"> [Data Acquisition](data-acquisition.html#rnaseq-data), [Wikipedia](https://en.wikipedia.org/wiki/RNA-Seq#Analysis) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> TPM </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Transcripts per million -- normalized expression measure for cross-sample comparison. Accounts for gene length and sequencing depth </td>
#>    <td style="text-align:left;"> data-acquisition (3), data-dictionary (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Transcripts_per_million), [Data Dictionary](data-dictionary.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Gene detection </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> A gene is 'detected' in a sample if it has &gt;= 1 mapped read (count &gt; 0). The number of detected genes per sample is a QC metric; low detection suggests poor depth or degradation </td>
#>    <td style="text-align:left;"> data-acquisition (4), exploratory (2) </td>
#>    <td style="text-align:left;"> [Data Acquisition QC](data-acquisition.html#quality-control) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> MAD </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Median absolute deviation -- robust measure of spread. In QC context, MAD of gene counts within a single sample measures expression variability </td>
#>    <td style="text-align:left;"> data-acquisition (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Median_absolute_deviation), [Data Acquisition QC](data-acquisition.html#quality-control) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Outlier (QC) </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> A sample is flagged as an outlier if it falls below the 5th percentile of library size OR of genes detected. The flag is binary (Yes/No) </td>
#>    <td style="text-align:left;"> data-acquisition (3), exploratory (2) </td>
#>    <td style="text-align:left;"> [Data Acquisition QC](data-acquisition.html#quality-control) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> VST </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Variance-stabilizing transformation -- normalizes count data for visualization and clustering, reducing mean-variance dependence </td>
#>    <td style="text-align:left;"> differential-expression (4), exploratory (3), survival (2) </td>
#>    <td style="text-align:left;"> [DESeq2 docs](https://bioconductor.org/packages/DESeq2/), [DE vignette](differential-expression.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> STAR </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Spliced Transcripts Alignment to a Reference -- RNA-seq aligner used in GDC pipeline </td>
#>    <td style="text-align:left;"> data-acquisition (3), data-sources (2) </td>
#>    <td style="text-align:left;"> [GitHub](https://github.com/alexdobin/STAR), [GDC Pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> GENCODE </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Comprehensive gene annotation project providing reference gene models for genome analysis </td>
#>    <td style="text-align:left;"> data-acquisition (2), data-sources (1) </td>
#>    <td style="text-align:left;"> [gencodegenes.org](https://www.gencodegenes.org/), [Data Acquisition](data-acquisition.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Ensembl </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> Genome database providing gene IDs (ENSG*) used in RNA-seq quantification </td>
#>    <td style="text-align:left;"> data-acquisition (3), gene-report (2) </td>
#>    <td style="text-align:left;"> [ensembl.org](https://www.ensembl.org/), [Data Acquisition](data-acquisition.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> HGNC </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> HUGO Gene Nomenclature Committee -- authority for standardized human gene symbols </td>
#>    <td style="text-align:left;"> gene-report (3), differential-expression (2) </td>
#>    <td style="text-align:left;"> [genenames.org](https://www.genenames.org/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Entrez </td>
#>    <td style="text-align:left;"> RNA-seq &amp; QC </td>
#>    <td style="text-align:left;"> NCBI gene identifier system used for cross-database gene referencing </td>
#>    <td style="text-align:left;"> gene-report (2), data-acquisition (1) </td>
#>    <td style="text-align:left;"> [NCBI Gene](https://www.ncbi.nlm.nih.gov/gene/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> DE </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Differential expression -- genes with significantly different expression between conditions (e.g. high-risk vs standard-risk) </td>
#>    <td style="text-align:left;"> differential-expression (8), gene-report (5), survival (2) </td>
#>    <td style="text-align:left;"> [DE vignette](differential-expression.html), [Wikipedia](https://en.wikipedia.org/wiki/Differential_gene_expression) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> DESeq2 </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Bioconductor package for DE analysis using negative binomial GLMs with shrinkage estimation. The primary DE method in this pipeline </td>
#>    <td style="text-align:left;"> differential-expression (6), gene-report (3) </td>
#>    <td style="text-align:left;"> [Bioconductor](https://bioconductor.org/packages/DESeq2/), [PMID:25516281](https://pubmed.ncbi.nlm.nih.gov/25516281/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> edgeR </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Bioconductor package for DE analysis using empirical Bayes moderation of tagwise dispersions. Used as consensus method alongside DESeq2 </td>
#>    <td style="text-align:left;"> differential-expression (4), gene-report (2) </td>
#>    <td style="text-align:left;"> [Bioconductor](https://bioconductor.org/packages/edgeR/), [PMID:19910308](https://pubmed.ncbi.nlm.nih.gov/19910308/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> limma-voom </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Linear modelling framework (limma) with voom precision weights for RNA-seq. Third consensus DE method in pipeline </td>
#>    <td style="text-align:left;"> differential-expression (4), gene-report (2) </td>
#>    <td style="text-align:left;"> [Bioconductor](https://bioconductor.org/packages/limma/), [PMID:25605792](https://pubmed.ncbi.nlm.nih.gov/25605792/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Log2 Fold Change </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Log-base-2 ratio of expression between conditions. LFC &gt; 0 = upregulated; LFC &lt; 0 = downregulated. Shrinkage-corrected LFC used for ranking </td>
#>    <td style="text-align:left;"> differential-expression (5), gene-report (3) </td>
#>    <td style="text-align:left;"> [DE vignette](differential-expression.html), [Wikipedia](https://en.wikipedia.org/wiki/Fold_change) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> FDR / Adjusted p-value </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> False discovery rate -- expected proportion of false positives among genes called significant. Adjusted p-values (Benjamini-Hochberg) control it across multiple tests. FDR &lt; 0.05 is the standard significance threshold for DE </td>
#>    <td style="text-align:left;"> differential-expression (4), gene-report (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/False_discovery_rate), [DE vignette](differential-expression.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Volcano Plot </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Scatter plot of log2 fold change (x) vs -log10 adjusted p-value (y). Highlights significantly DE genes in the upper-left and upper-right quadrants </td>
#>    <td style="text-align:left;"> differential-expression (3), gene-report (2) </td>
#>    <td style="text-align:left;"> [DE vignette](differential-expression.html), [Wikipedia](https://en.wikipedia.org/wiki/Volcano_plot_(statistics)) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> PCA </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Principal component analysis -- dimensionality reduction technique for visualizing sample clustering and batch effects </td>
#>    <td style="text-align:left;"> exploratory (4), differential-expression (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Principal_component_analysis), [EDA vignette](exploratory-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> LFC </td>
#>    <td style="text-align:left;"> Differential Expression </td>
#>    <td style="text-align:left;"> Log2 fold change -- effect size measure for differential expression. Positive = upregulated; negative = downregulated </td>
#>    <td style="text-align:left;"> differential-expression (5), gene-report (3) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Fold_change), [DE vignette](differential-expression.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Overall Survival (OS) </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Time from diagnosis to death from any cause. The primary survival endpoint in CoMMpass </td>
#>    <td style="text-align:left;"> survival (8), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Overall_survival), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Progression-Free Survival (PFS) </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Time from diagnosis to disease progression or death, whichever occurs first. A secondary endpoint capturing earlier clinical events </td>
#>    <td style="text-align:left;"> survival (4) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Progression-free_survival), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> KM </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Kaplan-Meier -- non-parametric survival curve estimator. Handles right-censored data (patients still alive at last follow-up) </td>
#>    <td style="text-align:left;"> survival (6), exploratory (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Log-Rank Test </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Non-parametric test comparing survival distributions between groups. Tests whether KM curves differ significantly (p &lt; 0.05) </td>
#>    <td style="text-align:left;"> survival (4) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Logrank_test), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Cox PH </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Cox proportional hazards -- semi-parametric regression for survival data. Estimates hazard ratios adjusting for covariates </td>
#>    <td style="text-align:left;"> survival (5), exploratory (1) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Proportional_hazards_model), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> HR </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Hazard ratio -- ratio of the event hazard rates between two groups. HR &gt; 1 = increased risk (worse survival); HR &lt; 1 = protective effect </td>
#>    <td style="text-align:left;"> survival (6), gene-report (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Hazard_ratio), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Forest Plot </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Visual display of hazard ratios with confidence intervals for multiple covariates from a Cox model. Each row is a covariate; dashed line at HR = 1 </td>
#>    <td style="text-align:left;"> survival (3) </td>
#>    <td style="text-align:left;"> [Survival vignette](survival-analysis.html#forest-plot) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Confidence Interval </td>
#>    <td style="text-align:left;"> Survival Analysis </td>
#>    <td style="text-align:left;"> Range of plausible values for an estimate (typically 95%). For HRs, a CI crossing 1.0 indicates non-significance </td>
#>    <td style="text-align:left;"> survival (4) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Confidence_interval), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> GSEA </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Gene Set Enrichment Analysis -- tests whether predefined gene sets are enriched at the top or bottom of a ranked gene list. Uses all genes, not just significant ones </td>
#>    <td style="text-align:left;"> gene-report (6), differential-expression (3) </td>
#>    <td style="text-align:left;"> [GSEA](https://www.gsea-msigdb.org/gsea/), [PMID:16199517](https://pubmed.ncbi.nlm.nih.gov/16199517/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> ORA </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Over-Representation Analysis -- tests whether a set of DE genes contains more members of a pathway than expected by chance (Fisher's exact test) </td>
#>    <td style="text-align:left;"> gene-report (4), differential-expression (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Gene_set_enrichment_analysis#Over-representation_analysis), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> MSigDB </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Molecular Signatures Database -- curated collection of gene sets for GSEA/ORA. Categories include Hallmark, GO, KEGG, Reactome </td>
#>    <td style="text-align:left;"> gene-report (5), differential-expression (2) </td>
#>    <td style="text-align:left;"> [MSigDB](https://www.gsea-msigdb.org/gsea/msigdb/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Hallmark Gene Sets </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> MSigDB Hallmark collection -- 50 curated gene sets representing well-defined biological states and processes (e.g. EMT, p53 pathway, hypoxia) </td>
#>    <td style="text-align:left;"> gene-report (4) </td>
#>    <td style="text-align:left;"> [MSigDB Hallmarks](https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp#H), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Pathway </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> A set of genes involved in a common biological process (e.g. cell cycle, apoptosis). Annotated in databases like KEGG, Reactome, GO </td>
#>    <td style="text-align:left;"> gene-report (5), differential-expression (3) </td>
#>    <td style="text-align:left;"> [KEGG](https://www.genome.jp/kegg/pathway.html), [Reactome](https://reactome.org/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Gene Set </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Any defined collection of genes tested together in enrichment analysis (broader than 'pathway' -- includes GO terms, TF targets, etc.) </td>
#>    <td style="text-align:left;"> gene-report (4), differential-expression (2) </td>
#>    <td style="text-align:left;"> [Gene Report](gene-report.html), [MSigDB](https://www.gsea-msigdb.org/gsea/msigdb/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> GO </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Gene Ontology -- structured vocabulary of gene/protein functions (Biological Process, Molecular Function, Cellular Component) </td>
#>    <td style="text-align:left;"> gene-report (3), differential-expression (2) </td>
#>    <td style="text-align:left;"> [geneontology.org](http://geneontology.org/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> KEGG </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Kyoto Encyclopedia of Genes and Genomes -- pathway and molecular interaction database </td>
#>    <td style="text-align:left;"> gene-report (3), differential-expression (2) </td>
#>    <td style="text-align:left;"> [genome.jp/kegg](https://www.genome.jp/kegg/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Reactome </td>
#>    <td style="text-align:left;"> Pathway &amp; Enrichment </td>
#>    <td style="text-align:left;"> Open-source curated pathway database of biological reactions and processes </td>
#>    <td style="text-align:left;"> gene-report (2), differential-expression (1) </td>
#>    <td style="text-align:left;"> [reactome.org](https://reactome.org/), [Gene Report](gene-report.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Targets </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> R-based pipeline tool (targets package) for reproducible, cached computation. Each analysis step is a 'target' with dependency tracking </td>
#>    <td style="text-align:left;"> pipeline-dag (6), telemetry (4), data-sources (2) </td>
#>    <td style="text-align:left;"> [targets package](https://docs.ropensci.org/targets/), [Pipeline DAG](pipeline-dag.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> sample_limit </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Pipeline parameter controlling the number of patient samples included. Default: local=200, CI=20. Lower values speed up development; higher values improve statistical power </td>
#>    <td style="text-align:left;"> data-sources (3), survival (2), exploratory (2) </td>
#>    <td style="text-align:left;"> [Data Sources](data-sources.html), [Survival vignette](survival-analysis.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Nix </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Reproducible build system providing isolated, version-pinned R environments via nixpkgs. Ensures all collaborators use identical package versions </td>
#>    <td style="text-align:left;"> data-sources (2) </td>
#>    <td style="text-align:left;"> [Nix](https://nixos.org/), [rix package](https://docs.ropensci.org/rix/) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> pkgdown </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> R package for building package documentation websites. Renders vignettes, function reference, and news into a static site hosted on GitHub Pages </td>
#>    <td style="text-align:left;"> data-sources (2) </td>
#>    <td style="text-align:left;"> [pkgdown](https://pkgdown.r-lib.org/), [Data Sources](data-sources.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> API </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Application Programming Interface -- structured endpoints for programmatic data access </td>
#>    <td style="text-align:left;"> api-usage (6), data-sources (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/API), [API Usage](api-usage.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> DAG </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Directed acyclic graph -- dependency structure used by targets pipeline for reproducible execution </td>
#>    <td style="text-align:left;"> pipeline-dag (5), telemetry (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Directed_acyclic_graph), [Pipeline DAG](pipeline-dag.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> DuckDB </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> In-process analytical database used for efficient Parquet querying in the pipeline </td>
#>    <td style="text-align:left;"> data-sources (3), data-dictionary (2) </td>
#>    <td style="text-align:left;"> [duckdb.org](https://duckdb.org/), [Data Sources](data-sources.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Parquet </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Columnar storage format used for efficient data storage and querying in the pipeline </td>
#>    <td style="text-align:left;"> data-sources (3), api-usage (2) </td>
#>    <td style="text-align:left;"> [parquet.apache.org](https://parquet.apache.org/), [Data Sources](data-sources.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> CI </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Continuous integration -- automated testing and deployment via GitHub Actions </td>
#>    <td style="text-align:left;"> telemetry (3), pipeline-dag (2) </td>
#>    <td style="text-align:left;"> [Wikipedia](https://en.wikipedia.org/wiki/Continuous_integration), [Telemetry](telemetry.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> CRAN </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Comprehensive R Archive Network -- primary repository for R packages </td>
#>    <td style="text-align:left;"> data-sources (1) </td>
#>    <td style="text-align:left;"> [cran.r-project.org](https://cran.r-project.org/), [Data Sources](data-sources.html) </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> Bioconductor </td>
#>    <td style="text-align:left;"> Data Infrastructure </td>
#>    <td style="text-align:left;"> Open-source software project for genomics and bioinformatics R packages </td>
#>    <td style="text-align:left;"> data-acquisition (4), differential-expression (3) </td>
#>    <td style="text-align:left;"> [bioconductor.org](https://www.bioconductor.org/), [Data Acquisition](data-acquisition.html) </td>
#>   </tr>
#> </tbody>
#> </table>

Units Reference

GDC stores age and time variables in days, not years. Treating these values as years is a common source of errors in GDC analyses. See the data dictionary for all variable definitions.

Unit reference for 7 key variables. All time and age variables are in days (GDC convention); divide by 365.25 for years. Expression units: raw counts for DE analysis (DESeq2/edgeR) and TPM for cross-sample comparison; FPKM is largely deprecated in favor of TPM. Source: GDC data model. See data-dictionary for full variable definitions.
| Variable | Unit | Conversion_or_Notes |
|---|---|---|
| age_at_diagnosis | days | age_years = age_at_diagnosis / 365.25 |
| days_to_death | days | Used directly in survival analysis |
| days_to_last_follow_up | days | Censoring time for survival |
| days_to_last_known_disease_status | days | Disease assessment time |
| unstranded / stranded_* | raw integer counts | Input for DESeq2/edgeR |
| tpm_unstranded | TPM | Cross-sample comparison (transcripts per million) |
| fpkm_unstranded | FPKM | Largely deprecated, use TPM |
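
The conversions above can be sketched in base R. This is a minimal illustration, not the pipeline's own code: the data values are invented, and the derived column names `os_event` and `os_days` are hypothetical. Only `age_at_diagnosis`, `days_to_death`, and `days_to_last_follow_up` come from the unit reference.

```r
# Toy clinical records; column names follow the unit reference, values are invented.
clinical <- data.frame(
  age_at_diagnosis       = c(23376, 21915, 18993),  # days, not years
  days_to_death          = c(730, NA, NA),          # NA = no death recorded
  days_to_last_follow_up = c(NA, 1461, 365)
)

# Convert age from days to years.
clinical$age_years <- clinical$age_at_diagnosis / 365.25

# Hypothetical overall-survival columns: use days_to_death when a death was
# observed, otherwise fall back to the censoring time days_to_last_follow_up.
clinical$os_event <- !is.na(clinical$days_to_death)
clinical$os_days  <- ifelse(clinical$os_event,
                            clinical$days_to_death,
                            clinical$days_to_last_follow_up)

# Sanity check against a days/years mix-up: ages in years should be plausible.
stopifnot(all(clinical$age_years > 0 & clinical$age_years < 120))
```

A range check like the final `stopifnot()` is a cheap way to catch a column that was read as years when it holds days, or the reverse.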

Recent Changes

Recent project commits with lines added, files changed, and change categories.

Last 20 project commits with change statistics. date = commit date; type = change category derived from the conventional-commit prefix (feat/fix/docs/ci/refactor/test/chore); n_files = number of files modified; lines_added/lines_removed = lines added/removed. Source: git log --numstat. See changes-by-type table for aggregate breakdown.
date type summary n_files lines_added lines_removed file_categories
2026-03-14 Bug Fix fix(pipeline): Fix 11 NULL targets — DE condition, ID matching, consensus type 41 146 47 Other, R Source
2026-03-14 Bug Fix fix(cachix): Remove --watch-mode auto flag (already default) 1 1 1 Other
2026-03-14 Bug Fix fix(pipeline): Fix 3 NULL-target bugs, auto-generate package.nix (#93) 87 235 80 Config, Docs, Other, R Source
2026-03-14 Bug Fix fix(nix): Fix cachix signing key, rebuild Bioconductor-dependent targets 2 0 0 Other
2026-03-14 New Feature feat(captions): Add dynamic captions to 34 table/plot targets 22 579 89 Other, R Source
2026-03-14 Bug Fix fix(vignettes): Enforce zero-computation rule — 22 violations → 0 32 360 764 Other, R Source, Vignettes
2026-03-13 Bug Fix fix(vignettes): Convert kable RDS to data.frames, fix telemetry eval guards 18 8 2 Other, Vignettes
2026-03-13 Bug Fix fix(ci): Save data frames (not DT widgets) to RDS for Nix portability 3 0 0 Other
2026-03-13 Bug Fix fix(vignettes): Use Quarto #| eval syntax for pkgdown-banner chunks 11 44 11 Vignettes
2026-03-13 Refactoring refactor(targets): Move Bioconductor packages to per-target declarations 11 35 17 Other, R Source, Vignettes
2026-03-13 New Feature feat(vignettes): Add code provenance, kable→DT conversion, caption compliance 35 1004 437 CI/CD, Other, R Source, Vignettes
2026-03-13 Bug Fix fix(vignettes): Skip NULL RDS in safe_tar_read, return invisible(NULL) 11 22 22 Vignettes
2026-03-13 Bug Fix fix(glossary): Prevent double DT::datatable() wrapping in glossary-table chunk 1 3 1 Vignettes
2026-03-13 CI/CD ci: Show quarto errors with quiet=FALSE, render individual vignettes in diagnostic 1 20 6 CI/CD
2026-03-13 CI/CD ci: Add verbose quarto error diagnostics on build failure 1 14 1 CI/CD
2026-03-13 Bug Fix fix(vignettes): Strip Nix paths from DT widgets, auto-wrap data frames 25 66 28 CI/CD, Other, Vignettes
2026-03-13 CI/CD ci: Add diagnostic quarto render step to debug build failure 1 17 0 CI/CD
2026-03-13 Bug Fix fix(vignettes): Revert safe_tar_read placeholder, guard gene-report 11 12 56 Vignettes
2026-03-13 Maintenance chore: Export vig_count_distribution_plot as ggplot RDS (513KB) 1 0 0 Other
2026-03-13 Bug Fix fix(vignettes): Enable code eval in CI with RDS fallback 80 113 74 CI/CD, Other, R Source, Vignettes
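
The relationship between a commit summary and its type label can be sketched as below. This is an assumption about how the labels might be derived, not the project's actual code: `type_labels` and `classify_commit()` are hypothetical names, and only the prefix-to-label pairs visible in the table are mapped. Prefixes without a label shown here (docs, test) are left unmapped rather than guessed.

```r
# Prefix-to-label pairs as they appear in the commit table.
type_labels <- c(
  fix      = "Bug Fix",
  feat     = "New Feature",
  refactor = "Refactoring",
  ci       = "CI/CD",
  chore    = "Maintenance"
)

# Hypothetical helper: extract the conventional-commit prefix from each
# summary and look up its display label. Unmapped or malformed input gives NA.
classify_commit <- function(summary) {
  # Capture the leading word before an optional "(scope)" and the colon.
  prefix <- sub("^([a-z]+)(\\([^)]*\\))?:.*$", "\\1", summary)
  # sub() returns its input unchanged when the pattern does not match.
  prefix[prefix == summary] <- NA_character_
  unname(type_labels[prefix])
}

classify_commit(c(
  "fix(cachix): Remove --watch-mode auto flag (already default)",
  "ci: Add verbose quarto error diagnostics on build failure",
  "Merge branch 'main'"
))
```

Returning `NA` for unrecognized summaries keeps unmapped commits visible instead of silently assigning them to a category.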

Reproducibility

Session Info
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.7.2         cli_3.6.5           knitr_1.51         
#>  [4] rlang_1.1.7         xfun_0.57           otel_0.2.0         
#>  [7] processx_3.8.6      targets_1.12.0      jsonlite_2.0.0     
#> [10] data.table_1.18.2.1 glue_1.8.0          prettyunits_1.2.0  
#> [13] backports_1.5.0     htmltools_0.5.9     ps_1.9.1           
#> [16] rmarkdown_2.30      tibble_3.3.1        evaluate_1.0.5     
#> [19] base64url_1.4       fastmap_1.2.0       yaml_2.3.12        
#> [22] lifecycle_1.0.5     compiler_4.5.3      codetools_0.2-20   
#> [25] igraph_2.2.2        pkgconfig_2.0.3     digest_0.6.39      
#> [28] R6_2.6.1            tidyselect_1.2.1    pillar_1.11.1      
#> [31] callr_3.7.6         magrittr_2.0.4      withr_3.0.2        
#> [34] tools_4.5.3         secretbase_1.2.0
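
When running the pipeline locally, one way to use this listing is to compare installed package versions against it. The sketch below is illustrative only: `version_drift()` is a hypothetical helper, the `installed` vector is invented, and just three of the versions above are copied into `expected`.

```r
# Versions copied from the session info above.
expected <- c(targets = "1.12.0", knitr = "1.51", rlang = "1.1.7")

# Hypothetical helper: return the names of packages whose installed version
# differs from the expected one. Both arguments are named character vectors.
version_drift <- function(installed, expected) {
  shared <- intersect(names(installed), names(expected))
  differs <- vapply(
    shared,
    function(p) utils::compareVersion(installed[[p]], expected[[p]]) != 0L,
    logical(1)
  )
  shared[differs]
}

# Invented example input. In practice this vector could be built from
# as.character(utils::packageVersion(pkg)) for each package of interest.
installed <- c(targets = "1.12.0", knitr = "1.50", rlang = "1.1.7")
version_drift(installed, expected)
```

`utils::compareVersion()` compares version strings component by component, so "1.9.0" correctly sorts before "1.12.0", which plain string comparison would get wrong.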