Data Acquisition

Download RNA-seq and clinical data from GDC and AWS S3

acquire_commpass_data()
Main data acquisition function
download_clinical_data()
Download clinical data from GDC
download_gdc_rnaseq()
Download RNA-seq data from GDC
download_s3_subset()
Download a Sample of RNA-seq Files from S3
get_commpass_clinical()
Query GDC for CoMMpass Clinical Data
list_s3_commpass()
List AWS S3 CoMMpass Bucket Contents
query_commpass_rna()
Query GDC for CoMMpass RNA-seq Metadata

Data Cleaning

Clean and integrate clinical and expression data

clean_clinical_data()
Clean Clinical Data
clean_expression_data()
Clean Expression Data
clean_treatment_data()
Clean and standardize treatment data
integrate_clinical_expression()
Create Integrated Dataset
summarize_treatment()
Summarize treatment lines per patient

Data Dictionary

Variable documentation and metadata

get_commpass_data_dictionary()
Get CoMMpass Data Dictionary
get_variable_docs()
Get Extended Documentation for a Variable

Quality Control

QC metrics, filtering, and normalization

calculate_qc_metrics()
Calculate QC metrics for RNA-seq data

Differential Expression

DESeq2, edgeR, limma-voom, consensus analysis, and DE visualization

compute_pca()
Compute PCA from transformed expression data
correlate_genes()
Correlate two genes across samples
correlate_genes_batch()
Batch correlation: one gene vs many
plot_gene_correlation()
Scatter plot of gene-gene correlation
plot_heatmap_de()
Heatmap of top DE genes
plot_ma()
MA plot of DE results
plot_pca()
PCA plot
plot_volcano()
Volcano plot of DE results
run_deseq2()
Run DESeq2 differential expression analysis
run_vst()
Run variance stabilizing transformation
summarize_de_methods()
Summarize DE results across methods

Survival Analysis

Kaplan-Meier, Cox PH, forest plots, marker-stratified survival

extract_risk_table()
Extract risk table from Kaplan-Meier results
plot_forest()
Plot forest plot of Cox hazard ratios
plot_km()
Plot Kaplan-Meier curve
prepare_survival_data()
Prepare survival data from clinical and cytogenetic data
run_cox_regression()
Run Cox proportional hazards regression
run_kaplan_meier()
Run Kaplan-Meier analysis
run_km_by_expression()
Run KM analysis stratified by gene expression level
run_km_by_markers()
Run KM analysis for each individual cytogenetic marker

Pathway Analysis

GSEA, ORA, MSigDB gene sets, gene annotation, enrichment visualization

annotate_de_results()
Add gene symbol column to DE results table
annotate_genes()
Annotate Ensembl gene IDs with symbols and descriptions
plot_enrichment_barplot()
Bar plot of enrichment results
plot_enrichment_dotplot()
Dot plot of enrichment results
plot_gsea_running_score()
GSEA running enrichment score plot
run_gsea()
Run Gene Set Enrichment Analysis (GSEA)
run_ora()
Run Over-Representation Analysis (ORA)
run_pathway_analysis()
Run pathway enrichment analysis

Cytogenetics

FISH/cytogenetic extraction, risk classification, oncoprint, co-occurrence

calculate_cooccurrence()
Calculate pairwise co-occurrence of cytogenetic alterations
compute_riss()
Compute Revised International Staging System (R-ISS)
extract_cytogenetic_data()
Extract cytogenetic markers from clinical data
plot_cooccurrence_heatmap()
Plot co-occurrence heatmap of cytogenetic alterations
plot_cytogenetic_oncoprint()
Plot cytogenetic oncoprint
plot_expression_by_subtype()
Plot gene expression by cytogenetic subtype
summarize_cytogenetics()
Summarize cytogenetic alteration frequencies

Causal Analysis

Causal DAGs, adjustment sets, and model adequacy checks

commpass_dag()
Define the CoMMpass causal DAG
get_adjustment_sets()
Get adjustment sets for a given analysis
plot_dag()
Plot the CoMMpass causal DAG
check_adjustment()
Compare model covariates against DAG-implied adjustments

Storage

DuckDB-backed parquet querying

get_commpass_tbl()
Get Lazy DuckDB Table for CoMMpass Data
query_commpass_parquet()
Query CoMMpass Parquet Files

API

Plumber API endpoints for programmatic data access

api_get_clinical()
Get clinical data for API response
api_get_de_results()
Get DE results for API response
api_get_pathways()
Get pathway analysis results for API response
api_get_survival()
Get survival data for API response
api_list_datasets()
List available API datasets
api_serve()
Launch the plumber API
generate_api_endpoint()
Generate a single API endpoint JSON string
generate_api_index()
Generate API index metadata

Utilities

Shared helpers for formatting, summaries, example data, reporting, and export

create_summary_table()
Create Summary Statistics Table
example_data()
Load example CoMMpass datasets
export_h5ad()
Export SummarizedExperiment to H5AD (AnnData) format
format_file_size()
Format File Size in Human-Readable Format
format_with_commas()
Format Number with Thousands Separator
gene_report()
Render a single-gene characterization report
get_counts_assay()
Get counts assay from SummarizedExperiment
strip_plotly()
Strip plotly closure bloat for compact serialization

Internal

Internal helper functions (not exported)

check_dependencies()
Check package dependencies
create_project_dirs()
Create project directories
download_aws_data()
Download data from AWS S3 open access bucket
filter_low_quality()
Filter low-quality samples and genes
find_consensus_genes()
Find consensus DE genes across methods
generate_summary_report()
Generate summary report
normalize_rnaseq()
Normalize RNA-seq data
render_de_report()
Render DE analysis report
run_edger()
Run edgeR differential expression analysis
run_limma()
Run limma differential expression analysis
save_timestamped()
Save results with timestamp
setup_logging()
Setup logging
summarize_data()
Generate summary statistics